1
|
Suominen H, Johnson M, Zhou L, Sanchez P, Sirel R, Basilakis J, Hanlen L, Estival D, Dawson L, Kelly B. Capturing patient information at nursing shift changes: methodological evaluation of speech recognition and information extraction. J Am Med Inform Assoc 2015; 22:e48-66. [PMID: 25336589 PMCID: PMC5901121 DOI: 10.1136/amiajnl-2014-002868] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/10/2014] [Revised: 08/21/2014] [Accepted: 10/02/2014] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE We study the use of speech recognition and information extraction to generate drafts of Australian nursing-handover documents. METHODS Speech recognition correctness and clinicians' preferences were evaluated using 15 recorder-microphone combinations, six documents, three speakers, Dragon Medical 11, and five survey/interview participants. Information extraction correctness evaluation used 260 documents, six-class classification for each word, two annotators, and the CRF++ conditional random field toolkit. RESULTS A noise-cancelling lapel-microphone with a digital voice recorder gave the best correctness (79%). This microphone was also the most preferred option by all but one participant. Although the participants liked the small size of this recorder, their preference was for tablets that can also be used for document proofing and sign-off, among other tasks. Accented speech was harder to recognize than native language and a male speaker was detected better than a female speaker. Information extraction was excellent in filtering out irrelevant text (85% F1) and identifying text relevant to two classes (87% and 70% F1). Similarly to the annotators' disagreements, there was confusion between the remaining three classes, which explains the modest 62% macro-averaged F1. DISCUSSION We present evidence for the feasibility of speech recognition and information extraction to support clinicians in entering text and unlock its content for computerized decision-making and surveillance in healthcare. CONCLUSIONS The benefits of this automation include storing all information; making the drafts available and accessible almost instantly to everyone with authorized access; and avoiding information loss, delays, and misinterpretations inherent to using a ward clerk or transcription services.
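The abstract above reports per-class F1 alongside a macro-averaged F1, which weights every class equally regardless of how many words it covers. The following is a minimal sketch of that metric for word-level labels, not the authors' evaluation code. The function names and the toy labels in the usage note are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for word-level gold and predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct label for this word
        else:
            fp[p] += 1          # predicted class gains a false positive
            fn[g] += 1          # gold class gains a false negative
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)
```

For example, `macro_f1(["A", "A", "B", "B", "O", "O"], ["A", "B", "B", "B", "O", "A"])` averages the three class scores with equal weight, so a few poorly separated classes can pull the macro average well below the best per-class values, as the abstract describes.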
Affiliation(s)
- Hanna Suominen
- Machine Learning Research Group, NICTA, College of Engineering and Computer Science, The Australian National University, Faculty of Health, University of Canberra, and Department of Information Technology, University of Turku, Canberra, Australian Capital Territory, Australia
- Maree Johnson
- Research Faculty of Health Sciences, Australian Catholic University, Sydney, New South Wales, Australia
- Liyuan Zhou
- Machine Learning Research Group, NICTA, Canberra, Australian Capital Territory, Australia
- Paula Sanchez
- Centre for Applied Nursing Research (University of Western Sydney and South Western Sydney Local Health District), Sydney, New South Wales, Australia
- Raul Sirel
- Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
- Jim Basilakis
- School of Computing, Engineering and Mathematics, University of Western Sydney, Sydney, New South Wales, Australia
- Leif Hanlen
- Machine Learning Research Group, NICTA, College of Engineering and Computer Science, The Australian National University, Faculty of Health, University of Canberra, Canberra, Australian Capital Territory, Australia
- Dominique Estival
- The MARCS Institute, University of Western Sydney and Department of Linguistics, University of Sydney, Sydney, New South Wales, Australia
- Linda Dawson
- Faculty of Social Sciences, University of Wollongong, Wollongong, New South Wales, Australia
- Barbara Kelly
- School of Languages and Linguistics, The University of Melbourne, Melbourne, Victoria, Australia
|
2
|
Söhngen C, Chang A, Schomburg D. Development of a classification scheme for disease-related enzyme information. BMC Bioinformatics 2011; 12:329. [PMID: 21827651 PMCID: PMC3166944 DOI: 10.1186/1471-2105-12-329] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 02/01/2011] [Accepted: 08/09/2011] [Indexed: 11/24/2022] Open
Abstract
Background BRENDA (BRaunschweig ENzyme DAtabase, http://www.brenda-enzymes.org) is a major resource for enzyme related information. First and foremost, it provides data which are manually curated from the primary literature. DRENDA (Disease RElated ENzyme information DAtabase) complements BRENDA with a focus on the automatic search and categorization of enzyme and disease related information from title and abstracts of primary publications. In a two-step procedure DRENDA makes use of text mining and machine learning methods. Results Currently enzyme and disease related references are biannually updated as part of the standard BRENDA update. 910,897 relations of EC-numbers and diseases were extracted from titles or abstracts and are included in the second release in 2010. The enzyme and disease entity recognition has been successfully enhanced by a further relation classification via machine learning. The classification step has been evaluated by a 5-fold cross validation and achieves an F1 score between 0.802 ± 0.032 and 0.738 ± 0.033 depending on the categories and pre-processing procedures. In the eventual DRENDA content every category reaches a classification specificity of at least 96.7% and a precision that ranges from 86-98% in the highest confidence level, and 64-83% for the smallest confidence level associated with higher recall. Conclusions The DRENDA processing chain analyses PubMed, locates references with disease-related information on enzymes and categorises their focus according to the categories causal interaction, therapeutic application, diagnostic usage and ongoing research. The categorisation gives an impression on the focus of the located references. Thus, the relation categorisation can facilitate orientation within the rapidly growing number of references with impact on diseases and enzymes. The DRENDA information is available as additional information in BRENDA.
Affiliation(s)
- Carola Söhngen
- Technische Universität Braunschweig, Department of Bioinformatics and Biochemistry Langer Kamp 19 B, 38106 Braunschweig, Germany
|
3
|
Zhang L, Berleant D, Ding J, Cao T, Syrkin Wurtele E. PathBinder--text empirics and automatic extraction of biomolecular interactions. BMC Bioinformatics 2009; 10 Suppl 11:S18. [PMID: 19811683 PMCID: PMC3226189 DOI: 10.1186/1471-2105-10-s11-s18] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022] Open
Abstract
Motivation The increasingly large amount of free, online biological text makes automatic interaction extraction correspondingly attractive. Machine learning is one strategy that works by uncovering and using useful properties that are implicit in the text. However these properties are usually not reported in the literature explicitly. By investigating specific properties of biological text passages in this paper, we aim to facilitate an alternative strategy, the use of text empirics, to support mining of biomedical texts for biomolecular interactions. We report on our application of this approach, and also report some empirical findings about an important class of passages. These may be useful to others who may also wish to use the empirical properties we describe. Results We manually analyzed syntactic and semantic properties of sentences likely to describe interactions between biomolecules. The resulting empirical data were used to design an algorithm for the PathBinder system to extract biomolecular interactions from texts. PathBinder searches PubMed for sentences describing interactions between two given biomolecules. PathBinder then uses probabilistic methods to combine evidence from multiple relevant sentences in PubMed to assess the relative likelihood of interaction between two arbitrary biomolecules. A biomolecular interaction network was constructed based on those likelihoods. Conclusion The text empirics approach used here supports computationally friendly, performance competitive, automatic extraction of biomolecular interactions from texts. Availability http://www.metnetdb.org/pathbinder.
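The abstract above says that evidence from multiple relevant sentences is combined probabilistically but does not name the method. As an illustration only, the sketch below uses a noisy-OR combination, which is an assumption and not necessarily what PathBinder implements. The function name and the per-sentence probabilities in the usage note are hypothetical.

```python
def combine_noisy_or(sentence_probs):
    """Combine independent per-sentence probabilities that a pair interacts.

    Noisy-OR (an assumed rule, not taken from the paper):
        P(interaction) = 1 - prod_i (1 - p_i)
    Each p_i is the probability that sentence i truly describes the interaction.
    """
    prob_no_evidence_holds = 1.0
    for p in sentence_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # The interaction is unsupported only if every sentence is a miss.
        prob_no_evidence_holds *= 1.0 - p
    return 1.0 - prob_no_evidence_holds
```

Under this rule, `combine_noisy_or([0.5, 0.5])` is higher than either sentence alone, so pairs mentioned together in many sentences score higher than pairs seen once. The rule assumes the sentences are independent evidence, which is a simplification.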
|
4
|
Abstract
Summary: Accurate semantic classification is valuable for text mining and knowledge-based tasks that perform inference based on semantic classes. To benefit applications using the semantic classification of the Unified Medical Language System (UMLS) concepts, we automatically reclassified the concepts based on their lexical and contextual features. The new classification is useful for auditing the original UMLS semantic classification and for building biomedical text mining applications. Availability: http://www.dbmi.columbia.edu/~juf7002/reclassify_production Contact: fan@dbmi.columbia.edu Supplementary information: Supplementary data is available at http://www.dbmi.columbia.edu/~juf7002/reclassify_production.
Affiliation(s)
- Jung-Wei Fan
- Department of Biomedical Informatics, Columbia University, 622 W 168th St, VC5, New York, NY10032, USA.
|
5
|
Kotera M, McDonald AG, Boyce S, Tipton KF. Functional group and substructure searching as a tool in metabolomics. PLoS One 2008; 3:e1537. [PMID: 18253485 PMCID: PMC2212108 DOI: 10.1371/journal.pone.0001537] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 10/01/2007] [Accepted: 01/06/2008] [Indexed: 01/31/2023] Open
Abstract
Background A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases. Methodology We have developed a database named “Biochemical Substructure Search Catalogue” (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest. Conclusions This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed.
Affiliation(s)
- Masaaki Kotera
- School of Biochemistry and Immunology, Trinity College, Dublin, Ireland.
|
6
|
Osborne JD, Lin S, Zhu L, Kibbe WA. Mining biomedical data using MetaMap Transfer (MMtx) and the Unified Medical Language System (UMLS). Methods Mol Biol 2007; 408:153-169. [PMID: 18314582 DOI: 10.1007/978-1-59745-547-3_9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 05/26/2023]
Abstract
Detailed instructions are given for mapping unstructured, free-text data to common biomedical concepts (drugs, diseases, anatomy, and so on) found in the Unified Medical Language System using MetaMap Transfer (MMTx). MMTx can be used in applications including mining and inferring relationships between concepts in MEDLINE publications by transforming free text into computable concepts. MMTx is in general not designed to be an end-user program; therefore, a simple analysis using MMTx is described for users without any programming knowledge. In addition, two Java template files are provided for automated processing of the output from MMTx, which users can adapt with minimal Java programming experience.
Affiliation(s)
- John D Osborne
- Robert H. Lurie Cancer Center, Northwestern University, Chicago, IL, USA
|
7
|
Karopka T, Fluck J, Mevissen HT, Glass Ä. The Autoimmune Disease Database: a dynamically compiled literature-derived database. BMC Bioinformatics 2006; 7:325. [PMID: 16803617 PMCID: PMC1525205 DOI: 10.1186/1471-2105-7-325] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 12/21/2005] [Accepted: 06/27/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Autoimmune diseases are disorders caused by an immune response directed against the body's own organs, tissues and cells. In practice more than 80 clinically distinct diseases, among them systemic lupus erythematosus and rheumatoid arthritis, are classified as autoimmune diseases. Although their etiology is unclear, these diseases share certain similarities at the molecular level, such as susceptibility regions on the chromosomes or the involvement of common genes. Gaining an overview of these related diseases through a manual literature review is not feasible; it requires automated analysis of the more than 500,000 Medline documents related to autoimmune disorders. RESULTS In this paper we present the first version of the Autoimmune Disease Database, which to our knowledge is the first comprehensive literature-based database covering all known or suspected autoimmune diseases. This dynamically compiled database allows researchers to link autoimmune diseases to candidate genes or proteins through named entity recognition, which identifies genes/proteins in the corresponding Medline abstracts. The Autoimmune Disease Database covers 103 autoimmune disease concepts. This list was expanded to include synonyms and spelling variants, yielding a list of over 1,200 disease names. The current version of the database provides links to 541,690 abstracts and over 5,000 unique genes/proteins. CONCLUSION The Autoimmune Disease Database provides the researcher with a tool to navigate potential gene-disease relationships in Medline abstracts in the context of autoimmune diseases.
Affiliation(s)
- Thomas Karopka
- Institute for Medical Informatics and Biometry, University of Rostock, Rembrandt-Str. 16/17, 18055 Rostock, Germany
- Juliane Fluck
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany
- Heinz-Theodor Mevissen
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany
- Änne Glass
- Institute for Medical Informatics and Biometry, University of Rostock, Rembrandt-Str. 16/17, 18055 Rostock, Germany
|
8
|
Fluck J, Zimmermann M, Kurapkat G, Hofmann M. Information extraction technologies for the life science industry. Drug Discovery Today: Technologies 2005; 2:217-224. [PMID: 24981939 DOI: 10.1016/j.ddtec.2005.08.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Access to relevant information and knowledge is essential for all steps of the drug discovery process. However, keeping track of relevant information in publications and patents has become a real challenge for scientists and managers in industrial research. Computer-aided information extraction (IE) systems have been developed to support the work of scientists by extracting relevant information from scientific publications and presenting it in an aggregated, condensed form. In this review, we give an overview of current information extraction strategies in the life sciences, with a special focus on biological entity recognition and more recent developments towards the identification and extraction of chemical compound names and structures.
Affiliation(s)
- Juliane Fluck
- Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany
- Marc Zimmermann
- Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany
- Günther Kurapkat
- TEMIS Deutschland GmbH, Kurfürstenanlage 3, 69115 Heidelberg, Germany
- Martin Hofmann
- Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany
|