1
Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Jacobson RS. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Res 2017; 77:e115-e118. [PMID: 29092954 DOI: 10.1158/0008-5472.can-17-0615] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 03/07/2017] [Revised: 07/20/2017] [Accepted: 10/02/2017] [Indexed: 11/16/2022]
Abstract
Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently mostly performed manually, making it difficult to correlate phenotypic data to genomic data. In addition, genomic data are being produced at an increasingly faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from electronic medical records of cancer patients. The system implements advanced Natural Language Processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of the University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical and missing computational methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment. Cancer Res; 77(21); e115-8. ©2017 AACR.
Affiliation(s)
- Guergana K Savova
- Boston Children's Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Eugene Tseytlin
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Sean Finan
- Boston Children's Hospital, Boston, Massachusetts
- Melissa Castine
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Timothy Miller
- Boston Children's Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Olga Medvedeva
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- David Harris
- Boston Children's Hospital, Boston, Massachusetts
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Chen Lin
- Boston Children's Hospital, Boston, Massachusetts
- Girish Chavan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Rebecca S Jacobson
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
2
Tseytlin E, Mitchell K, Legowski E, Corrigan J, Chavan G, Jacobson RS. NOBLE - Flexible concept recognition for large-scale biomedical natural language processing. BMC Bioinformatics 2016; 17:32. [PMID: 26763894 PMCID: PMC4712516 DOI: 10.1186/s12859-015-0871-y] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 07/31/2015] [Accepted: 12/22/2015] [Indexed: 11/24/2022] Open
Abstract
Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system’s matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE’s performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
Affiliation(s)
- Eugene Tseytlin
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA.
- Kevin Mitchell
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA.
- Elizabeth Legowski
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA.
- Julia Corrigan
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA.
- Girish Chavan
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA.
- Rebecca S Jacobson
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA.
3
Jacobson RS, Becich MJ, Bollag RJ, Chavan G, Corrigan J, Dhir R, Feldman MD, Gaudioso C, Legowski E, Maihle NJ, Mitchell K, Murphy M, Sakthivel M, Tseytlin E, Weaver J. A Federated Network for Translational Cancer Research Using Clinical Data and Biospecimens. Cancer Res 2015; 75:5194-201. [PMID: 26670560 PMCID: PMC4683415 DOI: 10.1158/0008-5472.can-15-1973] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
Abstract
Advances in cancer research and personalized medicine will require significant new bridging infrastructures, including more robust biorepositories that link human tissue to clinical phenotypes and outcomes. In order to meet that challenge, four cancer centers formed the Text Information Extraction System (TIES) Cancer Research Network, a federated network that facilitates data and biospecimen sharing among member institutions. Member sites can access pathology data that are de-identified and processed with the TIES natural language processing system, which creates a repository of rich phenotype data linked to clinical biospecimens. TIES incorporates multiple security and privacy best practices that, combined with legal agreements, network policies, and procedures, enable regulatory compliance. The TIES Cancer Research Network now provides integrated access to investigators at all member institutions, where multiple investigator-driven pilot projects are underway. Examples of federated search across the network illustrate the potential impact on translational research, particularly for studies involving rare cancers, rare phenotypes, and specific biologic behaviors. The network satisfies several key desiderata including local control of data and credentialing, inclusion of rich phenotype information, and applicability to diverse research objectives. The TIES Cancer Research Network presents a model for a national data and biospecimen network.
Affiliation(s)
- Michael J Becich
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Roni J Bollag
- Georgia Regents University Cancer Center, Augusta, Georgia
- Girish Chavan
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Julia Corrigan
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Rajiv Dhir
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Michael D Feldman
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Nita J Maihle
- Georgia Regents University Cancer Center, Augusta, Georgia
- Kevin Mitchell
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Eugene Tseytlin
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- JoEllen Weaver
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, Pennsylvania
4
Crowley RS, Castine M, Mitchell K, Chavan G, McSherry T, Feldman M. caTIES: a grid based system for coding and retrieval of surgical pathology reports and tissue specimens in support of translational research. J Am Med Inform Assoc 2010; 17:253-64. [PMID: 20442142 DOI: 10.1136/jamia.2009.002295] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Indexed: 11/04/2022] Open
Abstract
The authors report on the development of the Cancer Tissue Information Extraction System (caTIES), an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research such as tissue-banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several Cancer Centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.
Affiliation(s)
- Rebecca S Crowley
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15232, USA.
5
Saadawi GM, Legowski E, Medvedeva O, Chavan G, Crowley RS. A method for automated detection of usability problems from client user interface events. AMIA Annu Symp Proc 2005; 2005:654-8. [PMID: 16779121 PMCID: PMC1560804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Think-aloud usability (TAU) analysis provides extremely useful data but is very time-consuming and expensive to perform because of the extensive manual video analysis that is required. We describe a simple method for automated detection of usability problems from client user interface events for a developing medical intelligent tutoring system. The method incorporates (1) an agent-based method for communication that funnels all interface events and system responses to a centralized database, (2) a simple schema for representing interface events and higher order subgoals, and (3) an algorithm that reproduces the criteria used for manual coding of usability problems. A correction factor was empirically determined to account for the slower task performance of users when thinking aloud. We tested the validity of the method by simultaneously identifying usability problems using TAU and manually computing them from stored interface event data using the proposed algorithm. All usability problems that did not rely on verbal utterances were detectable with the proposed method.
Affiliation(s)
- Gilan M Saadawi
- Center for Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
6
Asrani A, Chavan G, Jain J. A triad of radiologic signs. Tuberculosis of the 1st metatarsophalangeal joint. J Postgrad Med 2002; 48:279, 289. [PMID: 12613475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023] Open
Affiliation(s)
- A Asrani
- Department of Radiology, Seth G.S. Medical College and K.E.M. Hospital, Parel, Mumbai, India.