1
Abstract
Over the past two decades, the construction of models for medical concept representation and for understanding the deep meaning of medical narrative texts has been a challenging area of medical informatics research. This review highlights how these two inter-related domains have evolved, emphasizing aspects of medical modeling as a tool for medical language understanding. A representation schema that balances partial but accurate representations against complete but complex representations of domain-specific knowledge must be developed to facilitate language understanding. Representative examples are drawn from two major independent efforts undertaken by the authors: the elaboration and subsequent adjustment of the RECIT multilingual analyzer to include a robust medical concept model, and the recasting of a frame-based interlingua system, originally developed to map equivalent concepts between controlled clinical vocabularies, to invoke a similar concept model.
2
Abstract
Definitions are provided of the key entities in knowledge representation for Natural Language Processing (NLP). Starting from words, which are the natural components of any sentence, both the role of expressions and the decomposition of words into their parts are emphasized. This leads to the notion of concepts, which are either primitive or composite depending on the model in which they are created. The problem of finding the most adequate degree of granularity for a concept is studied. From this reflection on basic NLP components, four categories of linguistic knowledge are recognized, which are considered to be the building blocks of a Medical Linguistic Knowledge Base (MLKB). Drawing on recent experience in building a natural language-based patient encoding browser, a robust method for conceptual indexing and querying of medical texts is presented, with particular attention to the scheme of knowledge representation.
3
Abstract
For medical records, the challenge for the present decade is Natural Language Processing (NLP) of texts and the construction of an adequate Knowledge Representation. This article describes the components of an NLP system that is currently being developed at the Geneva Hospital and within the European Community's AIM programme. They are: a Natural Language Analyser, a Conceptual Graphs Builder, a Data Base Storage component, a Query Processor, a Natural Language Generator and, in addition, a Translator, a Diagnosis Encoding System and a Literature Indexing System. Taking advantage of a closed domain of knowledge, defined around a medical specialty, a method called proximity processing has been developed. In this setting no parser of the initial text is needed, and the system relies on the semantic information carried by neighbouring words in sentences. The benefits are: easy implementation, portability between languages, robustness towards badly formed sentences, and a sound representation using conceptual graphs.
4
Baud RH, Nyström M, Borin L, Evans R, Schulz S, Zweigenbaum P. Interchanging lexical information for a multilingual dictionary. AMIA Annu Symp Proc 2005; 2005:31-5. [PMID: 16778996 PMCID: PMC1560452]
Abstract
OBJECTIVE: To facilitate the interchange of lexical information for multiple languages in the medical domain. To pave the way for the emergence of a generally available truly multilingual electronic dictionary in the medical domain.
METHODS: An interchange format has to be neutral relative to the target languages. It has to be consistent with current needs of lexicon authors, present and future. An active interaction between six potential authors aimed to determine a common denominator striking the right balance between richness of content and ease of use for lexicon providers.
RESULTS: A simple list of relevant attributes has been established and published. The format has the potential for collecting relevant parts of a future multilingual dictionary. An XML version is available.
CONCLUSION: This effort makes feasible the exchange of lexical information between research groups. Interchange files are made available in a public repository. This procedure opens the door to a true multilingual dictionary, in the awareness that the exchange of lexical information is (only) a necessary first step, before structuring the corresponding entries in different languages.
Affiliation(s)
- R H Baud
- Service of Medical Informatics, University Hospitals of Geneva, Switzerland
5
Lovis C, Baud RH, Revillard C, Pult L, Borst F, Geissbuhler A. Paragraph-oriented structure for narratives in medical documentation. Stud Health Technol Inform 2002; 84:638-42. [PMID: 11604815]
Abstract
The authors present a six-year experiment using a document-centered electronic patient record, based on a central document repository. The document management system is paragraph oriented, and all documents are built automatically before editing using predefined ordered sets of paragraphs. Paragraphs can be preloaded with templates, text or images. Once edited, signed and printed, documents are again decomposed into paragraphs and permanently stored. Although the compositional aspect of paragraphs is limited and their semantic content wide, this system offers numerous advantages. The typology is easy to build and to maintain; it has been implemented widely in our hospitals without the need for any natural language processing techniques and is used daily within commercially available text editors. The current state of the system is discussed, emphasizing the structure of the documents and the various attributes and properties that have been needed in order to meet users' needs.
Affiliation(s)
- C Lovis
- Division of Medical Informatics, University Hospital of Geneva, 1211 Geneva, Switzerland.
6
Baud RH, Lovis C, Ruch P, Rassinoux AM. Conceptual search in electronic patient record. Stud Health Technol Inform 2002; 84:156-60. [PMID: 11604724]
Abstract
Search by content in a large corpus of free texts in the medical domain is, today, only partially solved. The so-called GREP approach (Get Regular Expression and Print), based on highly efficient string matching techniques, is subject to inherent limitations, especially its inability to recognize domain specific knowledge. Such methods oblige the user to formulate his or her query in a logical Boolean style; if this constraint is not fulfilled, the results are poor. The authors present an enhancement to string matching search by the addition of a light conceptual model behind the word lexicon. The new system accepts any sentence as a query and radically improves the quality of results. Efficiency regarding execution time is obtained at the expense of implementing advanced indexing algorithms in a pre-processing phase. The method is described and commented and a brief account of the results illustrates this paper.
Affiliation(s)
- R H Baud
- Medical Informatics Division, University Hospital of Geneva, Switzerland.
7
Baud RH, Lovis C, Rassinoux AM, Ruch P, Geissbuhler A. Controlling the vocabulary for anatomy. Proc AMIA Symp 2002:26-30. [PMID: 12463780 PMCID: PMC2244507]
Abstract
When confronted with the representation of human anatomy, natural language processing (NLP) system designers face a frequent and unsolved problem: the lack of a suitable global reference. The available sources in electronic format are numerous, but none adequately fits all the constraints and needs of language analysis. These sources are usually incomplete, difficult to use or tailored to specific needs. The anatomist's or ontologist's view does not necessarily match that of the linguist. The purpose of this paper is to review the most widely recognized sources of knowledge in anatomy that are usable for linguistic analysis. Their potential and limits are emphasized from this point of view. Particular attention is given to the role of the consensus work of the International Federation of Associations of Anatomists (IFAA), which produced the Terminologia Anatomica.
Affiliation(s)
- R H Baud
- Division d'Informatique Medicale, University Hospital of Geneva, CH - 1211 Geneva 14, Switzerland
8
Baud RH, Lovis C, Ruch P, Rassinoux AM. A toolset for medical text processing. Stud Health Technol Inform 2001; 77:456-61. [PMID: 11187593]
Abstract
The processing of medical texts is a burden in the absence of a toolset designed for simple operations such as recognizing morphological variants, updating and accessing a word dictionary of the domain and segmenting words with multiple morpho-semantems. The apparent simplicity of these basic operations is an illusion because it soon becomes clear that quality implementation is a long-term task. Coherency between subtasks may be lacking unless strict rules are enforced. In fact, good tools are rarely available or have not been tailored for the medical profession. This paper aims at defining a complete toolset for medical word processing. In addition, it provides relevant examples of the inherent difficulties of this task. It reports on typical results that can be expected from an industry-standard implementation.
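As an illustration of one of the basic operations named in this abstract, segmenting a word into its morpho-semantic parts, the following is a minimal sketch only. The morpheme table, its rough glosses, and the function name `segment` are hypothetical and are not taken from the paper; the authors' toolset is not described in enough detail here to reproduce it.

```python
from functools import lru_cache

# Hypothetical toy lexicon of morphemes with rough glosses (illustration only).
MORPHEMES = {
    "gastr": "stomach",
    "enter": "intestine",
    "hepat": "liver",
    "append": "appendix",
    "o": "(linking vowel)",
    "itis": "inflammation",
    "ectomy": "surgical removal",
}


def segment(word):
    """Split a word into known morphemes, or return None if no full split exists.

    Longest pieces are tried first, with backtracking when a choice leads
    to a dead end.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(word):
            return ()  # reached the end: nothing left to segment
        for end in range(len(word), start, -1):  # longest candidate first
            piece = word[start:end]
            if piece in MORPHEMES:
                rest = solve(end)
                if rest is not None:
                    return (piece,) + rest
        return None  # no segmentation from this position

    result = solve(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    for w in ("gastroenteritis", "hepatectomy", "appendectomy", "xyz"):
        print(w, "->", segment(w))
```

Even this toy version hints at the difficulty the abstract mentions: results depend entirely on lexicon coverage, and ambiguous splits need rules that a simple longest-match strategy does not provide.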
Affiliation(s)
- R H Baud
- Medical Informatics Division, University Hospital of Geneva, Switzerland
9
Baud RH, Lovis C, Ruch P, Rassinoux AM. A light knowledge model for linguistic applications. Proc AMIA Symp 2001:37-41. [PMID: 11833480 PMCID: PMC2243409]
Abstract
Content extraction from medical texts is achievable today by linguistic applications, insofar as sufficient domain knowledge is available. Such knowledge represents a model of the domain and is hard to collect with sufficient depth and good coverage, despite numerous attempts. Easing this task is a priority if the awaited linguistic tools are to deliver their benefits. The light model is designed with this goal in mind. Syntactic and lexical information is generally available in large lexicons. A domain model should add the necessary semantic information. The authors have designed a light knowledge model for the collection of semantic information on the basis of the recognized syntactic and lexical attributes. It has been tailored to acquire enough semantic information to retrieve terms of a controlled vocabulary from free texts, for example to retrieve MeSH terms from patient records.
Affiliation(s)
- R H Baud
- Medical Informatics Division, University Hospital of Geneva, Switzerland
10
Ruch P, Baud RH, Geissbühler A, Lovis C, Rassinoux AM, Rivière A. Looking back or looking all around: comparing two spell checking strategies for documents edition in an electronic patient record. Proc AMIA Symp 2001:568-72. [PMID: 11837217 PMCID: PMC2243278]
Abstract
We report on the comparison of two systems for correcting spelling errors that result in non-existent words (i.e. words not listed in any lexicon). Both systems aim at improving the editing of medical reports. Unlike traditional systems, which are based on word language models, both semantic and syntactic contexts are considered here. Both systems share the same string-to-string edit distance module and the same contextual disambiguation principles. The differences between the two systems lie at the user interaction level: the first system uses the left context exclusively, simulating the underlining of each misspelling as soon as a word has been typed, whereas the second system uses the left as well as the right context and simulates a post-editing correction, performed when requested by the author. Our conclusion shows the improvements brought by the second approach.
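The abstract does not say which edit distance the shared module computes. As a sketch under that caveat, the classic Levenshtein distance (an assumption, not the authors' confirmed choice) can be used to rank lexicon entries as candidate corrections for a non-existent word. The function names and the small lexicon are hypothetical, and the contextual disambiguation step described in the abstract is not modelled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance of word, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


if __name__ == "__main__":
    lexicon = ["diabetes", "dialysis", "hepatitis"]  # hypothetical entries
    print(suggest("diabetis", lexicon))
```

In a left-context-only setting such a ranking would be filtered using the words already typed; in a post-editing setting the words on both sides are available to choose among the candidates.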
Affiliation(s)
- P Ruch
- Medical Informatics Division, University Hospital of Geneva, Switzerland.
11
Lovis C, Chapko MK, Martin DP, Payne TH, Baud RH, Hoey PJ, Fihn SD. Evaluation of a command-line parser-based order entry pathway for the Department of Veterans Affairs electronic patient record. J Am Med Inform Assoc 2001; 8:486-98. [PMID: 11522769 PMCID: PMC131046 DOI: 10.1136/jamia.2001.0080486]
Abstract
OBJECTIVE: To improve and simplify electronic order entry in an existing electronic patient record, the authors developed an alternative system for entering orders, which is based on a command-interface using robust and simple natural-language techniques.
DESIGN: The authors conducted a randomized evaluation of the new entry pathway, measuring the time to complete a standard set of orders and users' satisfaction as measured by questionnaire. A group of 16 physician volunteers from the staff of the Department of Veterans Affairs Puget Sound Health Care System-Seattle Division participated in the evaluation.
RESULTS: Thirteen of the 16 physicians (81%) were able to enter medical orders more quickly using the natural-language-based entry system than with the standard graphical user interface that uses menus and dialogs (mean time spared, 16.06 ± 4.52 minutes; P=0.029). Compared with the graphical user interface, the command-based pathway was perceived as easier to learn (P<0.01), was considered easier to use and faster (P<0.01), and was rated better overall (P<0.05).
CONCLUSION: Physicians found the command-interface easier to learn and faster to use than the usual menu-driven system. The major advantage of the system is that it combines an intuitive graphical user interface with the power and speed of a natural-language analyzer.
Affiliation(s)
- C Lovis
- Veterans Affairs Puget Sound Health Care System, Seattle, Washington, USA.
12
Abstract
This paper presents the authors' experience with the development and use of a document-centered electronic patient record (EPR) in a large teaching hospital. The development of the document-centered EPR began with the formulation of a set of critical hypotheses to facilitate both the continuation of the best medical practice and the implementation and use of the EPR. An alternative and more conventional approach, the data-centered EPR, is compared with the document-centered EPR. Various benefits and pitfalls are discussed. In the end, the choice was to offer both solutions in a tightly linked system. The need for an EPR that combines the document-centered and data-centered approaches reflects the more general discussion of what the medical record will be in the future. All too often, the need for structured data conflicts with the need for free text and the power of expression. It is not easy to evaluate the consequences of this initial decision. However, changing the foundations of the EPR after its implementation is difficult and expensive. Therefore, the selection of the correct orientation in a given hospital requires a broad-based discussion.
Affiliation(s)
- C Lovis
- Health Services Research and Development Veterans Affairs, Puget Sound Health Care System, Seattle, WA, USA
13
Ruch P, Baud RH, Rassinoux AM, Bouillon P, Robert G. Medical document anonymization with a semantic lexicon. Proc AMIA Symp 2000:729-33. [PMID: 11079980 PMCID: PMC2244050]
Abstract
We present an original system for locating and removing personally-identifying information in patient records. In this experiment, anonymization is seen as a particular case of knowledge extraction. We use natural language processing tools provided by the MEDTAG framework: a semantic lexicon specialized in medicine, and a toolkit for word-sense and morpho-syntactic tagging. The system finds 98-99% of all personally-identifying information.
Affiliation(s)
- P Ruch
- Medical Informatics Division, University Hospital of Geneva, ISSCO, University of Geneva
14
Abstract
OBJECTIVE: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided.
DESIGN: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given.
MEASUREMENTS: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts.
RESULTS: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested.
CONCLUSION: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.
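For readers unfamiliar with the algorithm named in the results, the following is a minimal sketch of Boyer-Moore-Horspool search. It is not the source code published in the paper; the function name `horspool_search` is hypothetical. The only preparation is a small shift table built from the pattern; the text itself is scanned as-is, without any index.

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Shift table: for each character of the pattern except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    # Characters absent from the table shift the window by the full length m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0  # left edge of the current window in text
    while pos <= n - m:
        # Compare the window with the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        # Slide the window according to the text character under its last cell.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    report = "history of acute myocardial infarction in 1998"
    print(horspool_search(report, "infarction"))
```

The speed of the method comes from the shift step: when the character under the end of the window does not occur in the pattern, the window jumps by the whole pattern length, so long search terms skip over most of the text.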
Affiliation(s)
- C Lovis
- Puget Sound Health Care System, Seattle, Washington, USA.
15
Rassinoux AM, Ruch P, Baud RH, Lovis C. Semantic handling of medical compound words through sound analysis and generation processes. Proc AMIA Symp 2000:675-9. [PMID: 11079969 PMCID: PMC2243906]
Abstract
Compound words are frequently encountered in the medical domain. Their conciseness suits the telegraphic style usually adopted by clinicians in daily practice. This amplifies the need to clarify their semantic interpretation and representation through the analysis and generation processes, respectively. While highlighting the peculiarities of medical compound words, this paper shows how model-driven linguistic tools accurately deal with the compositionality of medical language. These statements are illustrated by means of examples stemming from the handling of surgical procedures as part of the GALEN-IN-USE project.
Affiliation(s)
- A M Rassinoux
- Medical Informatics Division, University Hospital of Geneva, Switzerland
16
Wagner JC, Rogers JE, Baud RH, Scherrer JR. Natural language generation of surgical procedures. Stud Health Technol Inform 1999; 52 Pt 1:591-5. [PMID: 10384523]
Abstract
The GALEN-IN-USE project has developed a compositional scheme for the conceptual representation of surgical operative procedure rubrics. The complex representations which result are translated back to surface language by a tool for multilingual natural language generation. This generator can be adapted to the specific characteristics of the scheme by introducing particular definitions of concepts and relationships. We discuss how the generator uses such definitions to bridge between the modelling 'style' of the GALEN scheme and natural language.
Affiliation(s)
- J C Wagner
- Medical Informatics Division, University Hospital of Geneva, Switzerland
17
Abstract
A number of compositional Medical Concept Representation systems are being developed. Although these provide for a detailed conceptual representation of the underlying information, they have to be translated back to natural language for use by end users and applications. The GALEN programme has been developing one such representation, and we report here on a tool developed to generate natural language phrases from the GALEN conceptual representations. This tool can be adapted to different source modelling schemes and to different destination languages or sublanguages of a domain. It is based on a multilingual approach to natural language generation, realised through a clean separation of the domain model from the linguistic model and their link by well-defined structures. Specific knowledge structures and operations have been developed for bridging between the modelling 'style' of the conceptual representation and natural language. Using the example of the scheme developed for modelling surgical operative procedures within the GALEN-IN-USE project, we show how the generator is adapted to such a scheme. The basic characteristics of the surgical procedures scheme are presented together with the basic principles of the generation tool. Using worked examples, we discuss the transformation operations which change the initial source representation into a form which can more directly be translated to a given natural language. In particular, the linguistic knowledge which has to be introduced, such as definitions of concepts and relationships, is described. We explain the overall generator strategy and how particular transformation operations are triggered by language-dependent and conceptual parameters. Results are shown for generated French phrases corresponding to surgical procedures from the urology domain.
Affiliation(s)
- J C Wagner
- Medical Informatics Division, University Hospital of Geneva, Switzerland.
18
Ruch P, Wagner J, Bouillon P, Baud RH, Rassinoux AM, Scherrer JR. MEDTAG: tag-like semantics for medical document indexing. Proc AMIA Symp 1999:137-41. [PMID: 10566336 PMCID: PMC2232685]
Abstract
Medical documentation is central in health care, as it constitutes the main means of communication between care providers. However, there is a gap to bridge between storing information and extracting the relevant underlying knowledge. We believe natural language processing (NLP) is the best solution to handle such a large amount of textual information. In this paper we describe the construction of a semantic tagset for medical document indexing purposes. Rather than attempting to produce a home-made tagset, we decided to use, as far as possible, standard medical resources. This step has led us to choose UMLS hierarchical classes as a basis for our tagset. We also show that semantic tagging not only provides a basis for disambiguation between senses, but is also useful in the query expansion process of the retrieval system. We finally focus on assessing the results of the semantic tagger.
Affiliation(s)
- P Ruch
- Medical Informatics Division, University Hospital of Geneva, Switzerland
19
Baud RH, Rassinoux AM, Ruch P, Lovis C, Scherrer JR. The power and limits of a rule-based morpho-semantic parser. Proc AMIA Symp 1999:22-6. [PMID: 10566313 PMCID: PMC2232809]
Abstract
The advent of the Electronic Patient Record (EPR) implies an increasing amount of medical text readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, which in turn can drive other tasks such as indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. The prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing a semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article presents an easy-to-use parser ready to be adapted to different settings. It describes the parser's power together with its practical limitations as experienced by the authors.
Affiliation(s)
- R H Baud
- Medical Informatics Division, University Hospital of Geneva, Switzerland
20
Rassinoux AM, Baud RH, Ruch P, Trombert-Paviot B, Rodrigues JM. Model-based semantic dictionaries for medical language understanding. Proc AMIA Symp 1999:122-6. [PMID: 10566333 PMCID: PMC2232654]
Abstract
Semantic dictionaries are emerging as a major cornerstone towards achieving sound natural language understanding. Indeed, they constitute the main bridge between words and conceptual entities that reflect their meanings. Nowadays, more and more wide-coverage lexical dictionaries are electronically available in the public domain. However, associating a semantic content with lexical entries is not a straightforward task as it is subordinate to the existence of a fine-grained concept model of the treated domain. This paper presents the benefits and pitfalls in building and maintaining multilingual dictionaries, the semantics of which is directly established on an existing concept model. Concrete cases, handled through the GALEN-IN-USE project, illustrate the use of such semantic dictionaries for the analysis and generation of multilingual surgical procedures.
Affiliation(s)
- A M Rassinoux
- Medical Informatics Division, University Hospital of Geneva, Switzerland
21
Baud RH, Lovis C, Rassinoux AM, Scherrer JR. Alternative ways for knowledge collection, indexing and robust language retrieval. Methods Inf Med 1998; 37:315-26. [PMID: 9865029]
Abstract
Definitions are provided of the key entities in knowledge representation for Natural Language Processing (NLP). Starting from words, which are the natural components of any sentence, both the role of expressions and the decomposition of words into their parts are emphasized. This leads to the notion of concepts, which are either primitive or composite depending on the model in which they are created. The problem of finding the most adequate degree of granularity for a concept is studied. From this reflection on basic NLP components, four categories of linguistic knowledge are recognized, which are considered to be the building blocks of a Medical Linguistic Knowledge Base (MLKB). Drawing on recent experience in building a natural language-based patient encoding browser, a robust method for conceptual indexing and querying of medical texts is presented, with particular attention to the scheme of knowledge representation.
Affiliation(s)
- R H Baud
- Division of Medical Informatics, Geneva University Hospital, Switzerland.
22
Rassinoux AM, Miller RA, Baud RH, Scherrer JR. Modeling concepts in medicine for medical language understanding. Methods Inf Med 1998; 37:361-72. [PMID: 9865034]
Abstract
Over the past two decades, the construction of models for medical concept representation and for understanding the deep meaning of medical narrative texts has been a challenging area of medical informatics research. This review highlights how these two inter-related domains have evolved, emphasizing aspects of medical modeling as a tool for medical language understanding. A representation schema that balances partial but accurate representations against complete but complex representations of domain-specific knowledge must be developed to facilitate language understanding. Representative examples are drawn from two major independent efforts undertaken by the authors: the elaboration and subsequent adjustment of the RECIT multilingual analyzer to include a robust medical concept model, and the recasting of a frame-based interlingua system, originally developed to map equivalent concepts between controlled clinical vocabularies, to invoke a similar concept model.
Affiliation(s)
- A M Rassinoux
- Division of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
23
Abstract
Healthcare is entering the information age, and professionals are finding an ever-growing role for computers in the daily practice of medicine. However, a number of problematic issues are associated with electronic publications, especially on the Internet. Whilst access to information in general has improved, access to specific information has become more and more difficult [1], owing to the lack of general meta-knowledge that would allow Internet resources to be structured. Physicians have to learn and adapt to computers and the Internet, but the Internet also has to meet the specific requirements of healthcare. Important issues must therefore be addressed to allow real, daily use of the Internet in medical practice. The paper discusses most of these issues and proposes a solution developed at the University Hospital of Geneva that integrates an Electronic Patient Record with the Internet, without compromising on security or performance, and that runs on standard PCs.
Affiliation(s)
- C Lovis
- Department of Internal Medicine, University Hospital of Geneva, Switzerland.
24
Rassinoux AM, Lovis C, Baud RH, Scherrer JR. Versatility of a multilingual and bi-directional approach for medical language processing. Proc AMIA Symp 1998:668-72. [PMID: 9929303 PMCID: PMC2232097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
At the dawn of the 21st century, we are experiencing an exponential growth of online information that is mostly textual, and that benefits from new electronic media, such as the World Wide Web (WWW), to be broadly diffused across borders. However, there is a gap to bridge between holding information and accessing the deep underlying knowledge in a relevant way. Multilingual natural language processing (NLP), once tuned, is certainly the best solution to cope with this era of textual information. This paper focuses on the lessons learned through the joint development of an analyzer and a generator of medical language, within a multilingual context. Concrete examples, derived from the efforts under way in the European GALEN-IN-USE project, illustrate the use of these linguistic tools for the handling of surgical procedures.
Affiliation(s)
- A M Rassinoux
- Medical Informatics Division, University Hospital of Geneva, Switzerland
25
Baud RH, Lovis C, Rassinoux AM, Scherrer JR. Morpho-semantic parsing of medical expressions. Proc AMIA Symp 1998:760-4. [PMID: 9929321 PMCID: PMC2232116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
The task of editing, indexing, storing, and retrieving medical expressions within medical records remains the main objective for the years to come. Therefore, the need for a parser with semantic capabilities, able to robustly extract an essential part of the knowledge embedded in the medical record, is paramount. The minimal requirements before considering clinical trials are that such a system has to be in a position to handle any source of medical information and to conveniently grasp the main key concepts with low silence, good recognition of modalities and acceptable noise. This paper shows that morpho-semantic parsing has high potential to meet these conditions. This technique is an important complement to the traditional lexical approach and to expression-oriented systems like controlled vocabularies.
Affiliation(s)
- R H Baud
- Medical Informatics Division, University Hospital of Geneva, Switzerland
26
Rassinoux AM, Miller RA, Baud RH, Scherrer JR. Compositional and enumerative designs for medical language representation. Proc AMIA Annu Fall Symp 1997:620-4. [PMID: 9357700 PMCID: PMC2233357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Medical language is in essence highly compositional, allowing complex information to be expressed from more elementary pieces. Embedding the expressive power of medical language into formal systems of representation is recognized in the medical informatics community as a key step towards sharing such information among medical record, decision support, and information retrieval systems. Accordingly, such representation requires managing both the expressiveness of the formalism and its computational tractability, while coping with the level of detail expected by clinical applications. These desiderata can be supported by enumerative as well as compositional approaches, as argued in this paper. These principles have been applied in recasting a frame-based system for general medical findings developed during the 1980s. The new system captures the precise meaning of a subset of over 1500 medical terms for general internal medicine identified from the Quick Medical Reference (QMR) lexicon. In order to evaluate the adequacy of this formal structure in reflecting the deep meaning of the QMR findings, a validation process was implemented. It consists of automatically rebuilding the semantic representation of the QMR findings by analyzing them through the RECIT natural language analyzer, whose semantic components have been adjusted to this frame-based model for the understanding task.
Affiliation(s)
- A M Rassinoux
- Division of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
27
Baud RH, Rassinoux AM, Lovis C, Wagner J, Griesser V, Michel PA, Scherrer JR. Knowledge sources for Natural Language Processing. Proc AMIA Annu Fall Symp 1996:70-4. [PMID: 8947630 PMCID: PMC2233211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
This paper aims at reviewing the problem of feeding Natural Language Processing (NLP) tools with convenient linguistic knowledge in the medical domain. A syntactic approach lacks the potential to solve a number of typical situations with ambiguities and is clearly insufficient for quality treatment of natural language. On the other hand, a conceptual approach relies on some modelling of the domain, the elaboration of which is a long-term process and where the ultimate solutions are far from being recognised and universally accepted. In-between is the beauty of the compromise. How can we significantly improve the coverage of linguistic knowledge in the years to come?
Affiliation(s)
- R H Baud
- Division d'Informatique Médicale, University Hospital of Geneva, Switzerland
28
Rassinoux AM, Miller RA, Baud RH, Scherrer JR. Modeling principles for QMR medical findings. Proc AMIA Annu Fall Symp 1996:264-8. [PMID: 8947669 PMCID: PMC2233214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Structured representation of medical information is essential for ensuring the accuracy and reliability of computerized decision support applications. Such systems require input that is error-free and clinically pertinent. This paper reviews existing medical models, particularly those exploited for natural language understanding, and highlights modeling features important to future indexing of medical texts with controlled vocabularies. A hybrid representation derived from existing frame-based and conceptual-graph-based systems is proposed to represent relevant medical terms as used by experts.
Affiliation(s)
- A M Rassinoux
- Division of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
29
Baud RH, Rassinoux AM, Wagner JC, Lovis C, Juge C, Alpay LL, Michel PA, Degoulet P, Scherrer JR. Representing clinical narratives using conceptual graphs. Methods Inf Med 1995; 34:176-86. [PMID: 9082129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
The analysis of medical narratives and the generation of natural language expressions are strongly dependent on the existence of an adequate representation language. Such a language has to be expressive enough to handle the complexity of human reasoning in the domain. Sowa's Conceptual Graphs (CG) are an answer, and this paper presents a multilingual implementation, using French, English and German. Current developments demonstrate the feasibility of an approach to Natural Language Understanding where semantic aspects are dominant, in contrast to syntax-driven methods. The basic idea is to aggregate blocks of words according to semantic compatibility rules, following a method called Proximity Processing. The CG representation is gradually built, starting from single words in a semantic lexicon, to finally give a complete representation of the sentence in the form of a single CG. The process is dependent on specific rules of the medical domain, and for this reason is largely controlled by the declarative knowledge of the Medical Linguistic Knowledge Base.
Affiliation(s)
- R H Baud
- Faculty of Medicine, University of Geneva, Switzerland
30
Baud RH, Rassinoux AM, Wagner JC, Lovis C, Juge C, Alpay LL, Michel PA, Degoulet P, Scherrer JR. Representing Clinical Narratives Using Conceptual Graphs. Methods Inf Med 1995. [DOI: 10.1055/s-0038-1634586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
The analysis of medical narratives and the generation of natural language expressions are strongly dependent on the existence of an adequate representation language. Such a language has to be expressive enough to handle the complexity of human reasoning in the domain. Sowa’s Conceptual Graphs (CG) are an answer, and this paper presents a multilingual implementation, using French, English and German. Current developments demonstrate the feasibility of an approach to Natural Language Understanding where semantic aspects are dominant, in contrast to syntax-driven methods. The basic idea is to aggregate blocks of words according to semantic compatibility rules, following a method called Proximity Processing. The CG representation is gradually built, starting from single words in a semantic lexicon, to finally give a complete representation of the sentence in the form of a single CG. The process is dependent on specific rules of the medical domain, and for this reason is largely controlled by the declarative knowledge of the Medical Linguistic Knowledge Base.
31
Rassinoux AM, Wagner JC, Lovis C, Baud RH, Rector A, Scherrer JR. Analysis of medical texts based on a sound medical model. Proc Annu Symp Comput Appl Med Care 1995:27-31. [PMID: 8563282 PMCID: PMC2579049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
Automatic understanding of natural language is a complex task due to the presence of ambiguities. In particular, semantic ambiguities, which are often immediately and unconsciously resolved by human beings, arise when natural language sentences are analyzed by computer. The computer has to know the implicit and contextual information in order to resolve these difficulties. Nowadays in medicine, considerable effort is devoted to modeling the semantic content of the medical domain. Such a task is usually performed separately from linguistic considerations. The goal of this paper is to highlight the key issues of basing a medical language processing system on a sound semantic model. To illustrate the requirements and advantages of such a conceptual approach to the analysis process, the experiment conducted to adjust the RECIT analyzer to the GALEN model is shown.
Affiliation(s)
- A M Rassinoux
- Medical Informatics Division, University Hospital of Geneva, Switzerland
32
Baud RH, Rassinoux AM, Scherrer JR. Natural language processing and semantical representation of medical texts. Methods Inf Med 1992; 31:117-25. [PMID: 1635463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
Abstract
For medical records, the challenge for the present decade is Natural Language Processing (NLP) of texts, and the construction of an adequate Knowledge Representation. This article describes the components of an NLP system, which is currently being developed at the Geneva Hospital, and within the European Community's AIM programme. They are: a Natural Language Analyser, a Conceptual Graphs Builder, a Data Base Storage component, a Query Processor, a Natural Language Generator and, in addition, a Translator, a Diagnosis Encoding System and a Literature Indexing System. Taking advantage of a closed domain of knowledge, defined around a medical specialty, a method called proximity processing has been developed. In this situation, no parser of the initial text is needed, and the system is based on semantical information about nearby words in sentences. The benefits are: easy implementation, portability between languages, robustness towards badly-formed sentences, and a sound representation using conceptual graphs.
Affiliation(s)
- R H Baud
- Centre d'Informatique Hospitalière, University State Hospital of Geneva, Switzerland
33
Scherrer JR, Baud RH, Hochstrasser D, Ratib O. An integrated hospital information system in Geneva. MD Comput 1990; 7:81-9. [PMID: 2336022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
Abstract
Since the initial design phase from 1971 to 1973, the DIOGENE hospital information system at the University Hospital of Geneva has been treated as a whole and has retained its architectural unity, despite the need for modification and extension over the years. In addition to having a centralized patient database with the mechanisms for data protection and recovery of a transaction-oriented system, the DIOGENE system has a centralized pool of operators who provide support and training to the users; a separate network of remote printers that provides a telex service between the hospital buildings, offices, medical departments, and wards; and a three-component structure that avoids barriers between administrative and medical applications. In 1973, after a 2-year design period, the project was approved and funded. The DIOGENE system has led to more efficient sharing of costly resources, more rapid performance of administrative tasks, and more comprehensive collection of information about the institution and its patients.
Affiliation(s)
- J R Scherrer
- Center for Informatics, University Cantonal Hospital, Geneva