Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing.
J Biomed Inform 2014;
52:386-93. [PMID:
25117751 DOI:
10.1016/j.jbi.2014.08.001]
[Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2014] [Revised: 07/01/2014] [Accepted: 08/01/2014] [Indexed: 11/17/2022]
Abstract
In this paper we describe an efficient tool based on natural language processing for classifying the detail state of pulmonary embolism (PE) recorded in CT pulmonary angiography reports. The classification tasks include: PE present vs. absent, acute PE vs. others, central PE vs. others, and subsegmental PE vs. others. Statistical learning algorithms were trained with features extracted using the NLP tool and gold standard labels obtained via chart review from two radiologists. The areas under the receiver operating characteristic curves (AUC) for the four tasks were 0.998, 0.945, 0.987, and 0.986, respectively. We compared our classifiers with bag-of-words Naive Bayes classifiers, a standard text mining technology, which gave AUC 0.942, 0.765, 0.766, and 0.712, respectively.
Collapse