Peterson E, May FP, Kachikian O, Soroudi C, Naini B, Kang Y, Myint A, Guyant G, Elmore J, Bastani R, Maehara C, Hsu W. Automated identification and assignment of colonoscopy surveillance recommendations for individuals with colorectal polyps. Gastrointest Endosc 2021; 94:978-987. [PMID: 34087201 DOI: 10.1016/j.gie.2021.05.036] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/04/2021] [Accepted: 05/24/2021] [Indexed: 02/08/2023]
Abstract
BACKGROUND AND AIMS: Determining surveillance intervals for patients with colorectal polyps is critical but time-consuming and challenging to do reliably. We present the development and assessment of a pipeline that leverages natural language processing techniques to automatically extract and analyze relevant polyp findings from free-text colonoscopy and pathology reports. Using this information, we categorized individual patients into 6 postcolonoscopy surveillance intervals defined by the U.S. Multi-Society Task Force on Colorectal Cancer.
METHODS: Using a set of 546 randomly selected colonoscopy and pathology reports from 324 patients in a single health system, we used a combination of statistical classifiers and rule-based methods to extract polyp properties from each report type, associate properties with unique polyps, and classify a patient into 1 of 6 risk categories by integrating information from both report types. We then assessed the pipeline's performance by determining the positive predictive value (PPV), sensitivity, and F-score of the algorithm, compared with the determination of surveillance intervals by a gastroenterologist.
RESULTS: The pipeline was developed using 346 reports (224 colonoscopy and 122 pathology) from 224 patients and evaluated on an independent test set of 200 reports (100 colonoscopy and 100 pathology) from 100 patients. We achieved an average PPV, sensitivity, and F-score of .92, .95, and .93, respectively, across targeted entities for colonoscopy. Pathology extraction achieved a PPV, sensitivity, and F-score of .95, .97, and .96. The system achieved an overall accuracy of 92% in assigning the recommended interval for surveillance colonoscopy.
CONCLUSIONS: This study demonstrates the feasibility of using machine learning to automatically extract findings and classify patients to appropriate risk categories and corresponding surveillance intervals. Incorporating this system can facilitate proactive and timely follow-up after screening colonoscopy and enable real-time quality assessment of prevention programs and providers.
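For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how PPV, sensitivity, and F-score relate to one another. It assumes the F-score is the standard F1 (harmonic mean of PPV and sensitivity), which the abstract does not state explicitly; the function names and the example counts are hypothetical and are not taken from the study.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of extracted items that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity: share of true items that the pipeline found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(ppv_value: float, sens_value: float) -> float:
    """F1: harmonic mean of PPV and sensitivity (assumed definition)."""
    total = ppv_value + sens_value
    return 2 * ppv_value * sens_value / total if total else 0.0


# Hypothetical counts for one extracted entity (not from the paper).
tp, fp, fn = 90, 10, 5
p = ppv(tp, fp)
s = sensitivity(tp, fn)
print(f"PPV={p:.2f} sensitivity={s:.2f} F={f_score(p, s):.2f}")

# Applying the assumed F1 formula to the reported averages.
print(round(f_score(0.92, 0.95), 2))  # colonoscopy extraction
print(round(f_score(0.95, 0.97), 2))  # pathology extraction
```

Under this assumption the reported PPV and sensitivity pairs give .93 and .96, matching the F-scores in the abstract. The reported figures are averages across entities, so this agreement is a consistency check and not a reconstruction of the authors' computation.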
Affiliation(s)
- Emma Peterson
  - Department of Radiological Sciences, Data Integration, Architecture, and Analytics Group, University of California Los Angeles, Los Angeles, California, USA
- Folasade P May
  - Department of Medicine, Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA; UCLA Center for Cancer Prevention and Control Research, UCLA Kaiser Permanente Center for Health Equity and Department of Health Policy and Management, Fielding School of Public Health and Jonsson Comprehensive Cancer Center, Los Angeles, California, USA; Division of Gastroenterology, Department of Medicine, Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA
- Odet Kachikian
  - Department of Radiological Sciences, Data Integration, Architecture, and Analytics Group, University of California Los Angeles, Los Angeles, California, USA
- Camille Soroudi
  - Department of Medicine, Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Bita Naini
  - Department of Pathology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Yuna Kang
  - Department of Pathology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Anthony Myint
  - Department of Medicine, Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Gordon Guyant
  - Department of Radiological Sciences, Data Integration, Architecture, and Analytics Group, University of California Los Angeles, Los Angeles, California, USA
- Joann Elmore
  - Department of Medicine, Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Roshan Bastani
  - UCLA Center for Cancer Prevention and Control Research, UCLA Kaiser Permanente Center for Health Equity and Department of Health Policy and Management, Fielding School of Public Health and Jonsson Comprehensive Cancer Center, Los Angeles, California, USA
- Cleo Maehara
  - Department of Radiological Sciences, Data Integration, Architecture, and Analytics Group, University of California Los Angeles, Los Angeles, California, USA
- William Hsu
  - Department of Radiological Sciences, Data Integration, Architecture, and Analytics Group, University of California Los Angeles, Los Angeles, California, USA