76
|
Hogan WR, Cooper GF, Wallstrom GL, Wagner MM, Depinay JM. The Bayesian aerosol release detector: An algorithm for detecting and characterizing outbreaks caused by an atmospheric release of Bacillus anthracis. Stat Med 2007; 26:5225-52. [DOI: 10.1002/sim.3093] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 01/12/2023]
|
77
|
Auble TE, Hsieh M, Gardner W, Cooper GF, Stone RA, McCausland JB, Yealy DM. A prediction rule to identify low-risk patients with heart failure. Acad Emerg Med 2006. [PMID: 15930402 DOI: 10.1111/j.1553-2712.2005.tb00891.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022]
Abstract
OBJECTIVES To derive a prediction rule using data available in the emergency department (ED) to identify a group of patients hospitalized for the treatment of heart failure who are at low risk of death and serious complications. METHODS The authors analyzed data for all 33,533 patients with a primary hospital discharge diagnosis of heart failure in 1999 who were admitted from EDs in Pennsylvania. Candidate predictors were demographic and medical history variables and the most abnormal examination or diagnostic test values measured in the ED (vital signs only) or on the first day of hospitalization. The authors constructed classification trees to identify a subgroup of patients with an observed rate of death or serious medical complications before discharge < 2%; the tree that identified the subgroup with the lowest rate of this outcome and an inpatient mortality rate < 1% was chosen. RESULTS Within the entire cohort, 4.5% of patients died and 6.8% survived to hospital discharge after experiencing a serious medical complication. The prediction rule used 21 prognostic factors to classify 17.2% of patients as low risk; 19 (0.3%) died and 59 (1.0%) survived to hospital discharge after experiencing a serious medical complication. CONCLUSIONS This clinical prediction rule identified a group of patients hospitalized from the ED for the treatment of heart failure who were at low risk of adverse inpatient outcomes. Model performance needs to be examined in a cohort of patients with an ED diagnosis of heart failure and treated as outpatients or hospitalized.
|
78
|
Yoo C, Cooper GF, Schmidt M. A control study to evaluate a computer-based microarray experiment design recommendation system for gene-regulation pathways discovery. J Biomed Inform 2005; 39:126-46. [PMID: 16203178 DOI: 10.1016/j.jbi.2005.05.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 01/12/2005] [Revised: 04/22/2005] [Accepted: 05/27/2005] [Indexed: 11/22/2022]
Abstract
The main topic of this paper is evaluating a system that uses the expected value of experimentation for discovering causal pathways in gene expression data. By experimentation we mean both interventions (e.g., a gene knock-out experiment) and observations (e.g., passively observing the expression level of a "wild-type" gene). We introduce a system called GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation), which implements expected value of experimentation in discovering causal pathways using gene expression data. GEEVE provides the following assistance, which is intended to help biologists in their quest to discover gene-regulation pathways: Recommending which experiments to perform (with a focus on "knock-out" experiments) using an expected value of experimentation (EVE) method. Recommending the number of measurements (observational and experimental) to include in the experimental design, again using an EVE method. Providing a Bayesian analysis that combines prior knowledge with the results of recent microarray experimental results to derive posterior probabilities of gene regulation relationships. In recommending which experiments to perform (and how many times to repeat them) the EVE approach considers the biologist's preferences for which genes to focus the discovery process. Also, since exact EVE calculations are exponential in time, GEEVE incorporates approximation methods. GEEVE is able to combine data from knock-out experiments with data from wild-type experiments to suggest additional experiments to perform and then to analyze the results of those microarray experimental results. It models the possibility that unmeasured (latent) variables may be responsible for some of the statistical associations among the expression levels of the genes under study. To evaluate the GEEVE system, we used a gene expression simulator to generate data from specified models of gene regulation. 
Using the simulator, we evaluated the GEEVE system using a randomized control study that involved 10 biologists, some of whom used GEEVE and some of whom did not. The results show that biologists who used GEEVE reached correct causal assessments about gene regulation more often than did those biologists who did not use GEEVE. The GEEVE users also reached their assessments in a more cost-effective manner.
|
79
|
Auble TE, Hsieh M, Gardner W, Cooper GF, Stone RA, McCausland JB, Yealy DM. A prediction rule to identify low-risk patients with heart failure. Acad Emerg Med 2005; 12:514-21. [PMID: 15930402 DOI: 10.1197/j.aem.2004.11.026] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
OBJECTIVES To derive a prediction rule using data available in the emergency department (ED) to identify a group of patients hospitalized for the treatment of heart failure who are at low risk of death and serious complications. METHODS The authors analyzed data for all 33,533 patients with a primary hospital discharge diagnosis of heart failure in 1999 who were admitted from EDs in Pennsylvania. Candidate predictors were demographic and medical history variables and the most abnormal examination or diagnostic test values measured in the ED (vital signs only) or on the first day of hospitalization. The authors constructed classification trees to identify a subgroup of patients with an observed rate of death or serious medical complications before discharge < 2%; the tree that identified the subgroup with the lowest rate of this outcome and an inpatient mortality rate < 1% was chosen. RESULTS Within the entire cohort, 4.5% of patients died and 6.8% survived to hospital discharge after experiencing a serious medical complication. The prediction rule used 21 prognostic factors to classify 17.2% of patients as low risk; 19 (0.3%) died and 59 (1.0%) survived to hospital discharge after experiencing a serious medical complication. CONCLUSIONS This clinical prediction rule identified a group of patients hospitalized from the ED for the treatment of heart failure who were at low risk of adverse inpatient outcomes. Model performance needs to be examined in a cohort of patients with an ED diagnosis of heart failure and treated as outpatients or hospitalized.
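The low-risk figures in this abstract can be cross-checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers; because the 17.2% figure is rounded, the size of the low-risk group computed here is an approximation and not a count reported by the authors.

```python
# Figures quoted in the abstract.
cohort_size = 33_533
low_risk_fraction = 0.172        # reported as 17.2%, so the count below is approximate
deaths_low_risk = 19
complications_low_risk = 59

# Approximate size of the low-risk subgroup.
low_risk_n = round(cohort_size * low_risk_fraction)

# Outcome rates within the low-risk subgroup.
death_rate = deaths_low_risk / low_risk_n
complication_rate = complications_low_risk / low_risk_n
combined_rate = (deaths_low_risk + complications_low_risk) / low_risk_n

print(low_risk_n, f"{death_rate:.1%}", f"{complication_rate:.1%}", f"{combined_rate:.1%}")
# prints: 5768 0.3% 1.0% 1.4%
```

The recomputed rates agree with the reported 0.3% and 1.0%, and the combined rate of roughly 1.4% is consistent with the stated selection thresholds (death or serious complication < 2%, inpatient mortality < 1%).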
|
80
|
Cooper GF, Abraham V, Aliferis CF, Aronis JM, Buchanan BG, Caruana R, Fine MJ, Janosky JE, Livingston G, Mitchell T, Monti S, Spirtes P. Predicting dire outcomes of patients with community acquired pneumonia. J Biomed Inform 2005; 38:347-66. [PMID: 16198995 DOI: 10.1016/j.jbi.2005.02.005] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 02/22/2005] [Revised: 02/22/2005] [Accepted: 02/24/2005] [Indexed: 12/11/2022]
Abstract
Community-acquired pneumonia (CAP) is an important clinical condition with regard to patient mortality, patient morbidity, and healthcare resource utilization. The assessment of the likely clinical course of a CAP patient can significantly influence decision making about whether to treat the patient as an inpatient or as an outpatient. That decision can in turn influence resource utilization, as well as patient well being. Predicting dire outcomes, such as mortality or severe clinical complications, is a particularly important component in assessing the clinical course of patients. We used a training set of 1601 CAP patient cases to construct 11 statistical and machine-learning models that predict dire outcomes. We evaluated the resulting models on 686 additional CAP-patient cases. The primary goal was not to compare these learning algorithms as a study end point; rather, it was to develop the best model possible to predict dire outcomes. A special version of an artificial neural network (NN) model predicted dire outcomes the best. Using the 686 test cases, we estimated the expected healthcare quality and cost impact of applying the NN model in practice. The particular, quantitative results of this analysis are based on a number of assumptions that we make explicit; they will require further study and validation. Nonetheless, the general implication of the analysis seems robust, namely, that even small improvements in predictive performance for prevalent and costly diseases, such as CAP, are likely to result in significant improvements in the quality and efficiency of healthcare delivery. Therefore, seeking models with the highest possible level of predictive performance is important. Consequently, seeking ever better machine-learning and statistical modeling methods is of great practical significance.
|
81
|
Visweswaran S, Cooper GF. Patient-specific models for predicting the outcomes of patients with community acquired pneumonia. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2005; 2005:759-63. [PMID: 16779142 PMCID: PMC1560580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
We investigated two patient-specific and four population-wide machine learning methods for predicting dire outcomes in community acquired pneumonia (CAP) patients. Predicting dire outcomes in CAP patients can significantly influence the decision about whether to admit the patient to the hospital or to treat the patient at home. Population-wide methods induce models that are trained to perform well on average on all future cases. In contrast, patient-specific methods specifically induce a model for a particular patient case. We trained the models on a set of 1601 patient cases and evaluated them on a separate set of 686 cases. One patient-specific method performed better than the population-wide methods when evaluated within a clinically relevant range of the ROC curve. Our study provides support for patient-specific methods being a promising approach for making clinical predictions.
|
82
|
Cooper GF, Visweswaran S. Deriving the expected utility of a predictive model when the utilities are uncertain. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2005; 2005:161-5. [PMID: 16779022 PMCID: PMC1560537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Predictive models are often constructed from clinical databases with the goal of eventually helping make better clinical decisions. Evaluating models using decision theory is therefore natural. When constructing a model using statistical and machine learning methods, however, we are often uncertain about precisely how the model will be used. Thus, decision-independent measures of classification performance, such as the area under an ROC curve, are popular. As a complementary method of evaluation, we investigate techniques for deriving the expected utility of a model under uncertainty about the model's utilities. We demonstrate an example of the application of this approach to the evaluation of two models that diagnose coronary artery disease.
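As a rough illustration of the general idea of evaluating a model when its utilities are uncertain, the sketch below samples utilities from assumed ranges and computes the resulting distribution of expected utility for one classifier. It is not the derivation used in the paper: the cell probabilities, the utility ranges, and the uniform distributions are all hypothetical values chosen for the example.

```python
import random

# Hypothetical joint probabilities of (prediction, truth) cells for one model.
CELL_PROBS = {"TP": 0.15, "FP": 0.10, "TN": 0.70, "FN": 0.05}

# Hypothetical (low, high) utility ranges on a 0-1 scale; assumed, not from the paper.
UTILITY_RANGES = {"TP": (0.8, 1.0), "FP": (0.6, 0.9), "TN": (0.95, 1.0), "FN": (0.0, 0.4)}


def expected_utility(cell_probs, utilities):
    """Expected utility of a model for one fixed assignment of utilities."""
    return sum(cell_probs[cell] * utilities[cell] for cell in cell_probs)


def sample_expected_utilities(n_samples, seed=0):
    """Draw utilities uniformly from their ranges and return the expected utility of each draw."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        utilities = {cell: rng.uniform(lo, hi) for cell, (lo, hi) in UTILITY_RANGES.items()}
        samples.append(expected_utility(CELL_PROBS, utilities))
    return samples


samples = sample_expected_utilities(20_000)
mean_eu = sum(samples) / len(samples)

# Because expected utility is linear in the utilities, averaging over the
# utility uncertainty gives the same value as plugging in the mean utilities.
mean_utilities = {cell: (lo + hi) / 2 for cell, (lo, hi) in UTILITY_RANGES.items()}
plug_in_eu = expected_utility(CELL_PROBS, mean_utilities)

print(f"Monte Carlo mean: {mean_eu:.3f}, plug-in value: {plug_in_eu:.4f}")
print(f"Range of sampled expected utilities: {min(samples):.3f} to {max(samples):.3f}")
```

The sampled range shows how much the evaluation of a single model can move as the utilities vary, which is the kind of information a decision-independent measure such as ROC area does not convey.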
|
83
|
Yoo C, Cooper GF. An evaluation of a system that recommends microarray experiments to perform to discover gene-regulation pathways. Artif Intell Med 2004; 31:169-82. [PMID: 15219293 DOI: 10.1016/j.artmed.2004.01.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/10/2003] [Revised: 04/14/2003] [Accepted: 01/16/2004] [Indexed: 11/23/2022]
Abstract
The main topic of this paper is modeling the expected value of experimentation (EVE) for discovering causal pathways in gene expression data. By experimentation we mean both interventions (e.g., a gene knockout experiment) and observations (e.g., passively observing the expression level of a "wild-type" gene). We introduce a system called GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation), which implements expected value of experimentation in discovering causal pathways using gene expression data. GEEVE provides the following assistance, which is intended to help biologists in their quest to discover gene-regulation pathways: Recommending which experiments to perform (with a focus on "knockout" experiments) using an expected value of experimentation method. Recommending the number of measurements (observational and experimental) to include in the experimental design, again using an EVE method. Providing a Bayesian analysis that combines prior knowledge with the results of recent microarray experimental results to derive posterior probabilities of gene regulation relationships. In recommending which experiments to perform (and how many times to repeat them) the EVE approach considers the biologist's preferences for which genes to focus the discovery process. Also, since exact EVE calculations are exponential in time, GEEVE incorporates approximation methods. GEEVE is able to combine data from knockout experiments with data from wild-type experiments to suggest additional experiments to perform and then to analyze the results of those microarray experimental results. It models the possibility that unmeasured (latent) variables may be responsible for some of the statistical associations among the expression levels of the genes under study. To evaluate the GEEVE system, we used a gene expression simulator to generate data from specified models of gene regulation. 
The results show that the GEEVE system gives better results than two recently published approaches (1) in learning the generating models of gene regulation and (2) in recommending experiments to perform.
|
84
|
Middleton B, Hammond WE, Brennan PF, Cooper GF. Accelerating U.S. EHR adoption: how to get there from here. recommendations based on the 2004 ACMI retreat. J Am Med Inform Assoc 2004; 12:13-9. [PMID: 15492028 PMCID: PMC543821 DOI: 10.1197/jamia.m1669] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Despite growing support for the adoption of electronic health records (EHR) to improve U.S. healthcare delivery, EHR adoption in the United States is slow to date due to a fundamental failure of the healthcare information technology marketplace. Reasons for the slow adoption of healthcare information technology include a misalignment of incentives, limited purchasing power among providers, variability in the viability of EHR products and companies, and limited demonstrated value of EHRs in practice. At the 2004 American College of Medical Informatics (ACMI) Retreat, attendees discussed the current state of EHR adoption in this country and identified steps that could be taken to stimulate adoption. In this paper, based upon the ACMI retreat, and building upon the experiences of the authors developing EHR in academic and commercial settings we identify a set of recommendations to stimulate adoption of EHR, including financial incentives, promotion of EHR standards, enabling policy, and educational, marketing, and supporting activities for both the provider community and healthcare consumers.
|
85
|
Mani S, Cooper GF. Causal discovery using a Bayesian local causal discovery algorithm. Stud Health Technol Inform 2004; 107:731-5. [PMID: 15360909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
This study focused on the development and application of an efficient algorithm to induce causal relationships from observational data. The algorithm, called BLCD, is based on a causal Bayesian network framework. BLCD initially uses heuristic greedy search to derive the Markov Blanket (MB) of a node that serves as the "locality" for the identification of pair-wise causal relationships. BLCD takes as input a dataset and outputs potential causes of the form variable X causally influences variable Y. Identification of the causal factors of diseases and outcomes can help formulate better management, prevention and control strategies for the improvement of health care. In this study we focused on investigating factors that may contribute causally to infant mortality in the United States. We used the U.S. Linked Birth/Infant Death dataset for 1991 with more than four million records and about 200 variables for each record. Our sample consisted of 41,155 records randomly selected from the whole dataset. Each record had maternal, paternal and child factors and the outcome at the end of the first year--whether the infant survived or not. Using the infant birth and death dataset as input, BLCD output six purported causal relationships. Three out of the six relationships seem plausible. Even though we have not yet discovered a clinically novel causal link, we plan to look for novel causal pathways using the full sample.
|
86
|
Chapman WW, Cooper GF, Hanbury P, Chapman BE, Harrison LH, Wagner MM. Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders. J Am Med Inform Assoc 2003; 10:494-503. [PMID: 12807805 PMCID: PMC212787 DOI: 10.1197/jamia.m1330] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2003] [Accepted: 05/13/2003] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE The aim of this study was to create a classifier for automatic detection of chest radiograph reports consistent with the mediastinal findings of inhalational anthrax. DESIGN The authors used the Identify Patient Sets (IPS) system to create a key word classifier for detecting reports describing mediastinal findings consistent with anthrax and compared their performances on a test set of 79,032 chest radiograph reports. MEASUREMENTS Area under the ROC curve was the main outcome measure of the IPS classifier. Sensitivity and specificity of an initial IPS model were calculated based on an existing key word search and were compared against a Boolean version of the IPS classifier. RESULTS The IPS classifier received an area under the ROC curve of 0.677 (90% CI = 0.628 to 0.772) with a specificity of 0.99 and maximum sensitivity of 0.35. The initial IPS model attained a specificity of 1.0 and a sensitivity of 0.04. CONCLUSION The IPS system is a useful tool for helping domain experts create a statistical key word classifier for textual reports that is a potentially useful component in surveillance of radiographic findings suspicious for anthrax.
|
87
|
Yoo C, Cooper GF. A computer-based microarray experiment design-system for gene-regulation pathway discovery. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2003; 2003:733-7. [PMID: 14728270 PMCID: PMC1480329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023]
Abstract
This paper reports the methods and evaluation of a computer-based system that recommends microarray experimental design for biologists - causal discovery in Gene Expression data using Expected Value of Experimentation (GEEVE). The GEEVE system uses causal Bayesian networks and generates a decision tree for recommendations. To evaluate the GEEVE system, we first built an expression simulation model based on a gene regulation model assessed by an expert biologist. Using the simulation model, we conducted a controlled study that involved 10 biologists, some of whom used GEEVE and some of whom did not. The results show that biologists who used GEEVE reached correct causal assessments about gene regulation more often than did those biologists who did not use GEEVE.
|
88
|
Visweswaran S, Hanbury P, Saul M, Cooper GF. Detecting adverse drug events in discharge summaries using variations on the simple Bayes model. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2003; 2003:689-93. [PMID: 14728261 PMCID: PMC1479984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023]
Abstract
Detection and prevention of adverse events and, in particular, adverse drug events (ADEs), is an important problem in health care today. We describe the implementation and evaluation of four variations on the simple Bayes model for identifying ADE-related discharge summaries. Our results show that these probabilistic techniques achieve an ROC curve area of up to 0.77 in correctly determining which patient cases should be assigned an ADE-related ICD-9-CM code. These results suggest a potential for these techniques to contribute to the development of an automated system that helps identify ADEs, as a step toward further understanding and preventing them.
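For readers unfamiliar with the simple (naive) Bayes model mentioned above, a minimal sketch of the basic form applied to tokenized text follows. It is a generic multinomial version with add-one smoothing, not any of the four variations evaluated in the paper; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def train_simple_bayes(docs, labels):
    """Fit word counts per class. docs: list of token lists; labels: 0 or 1 per document."""
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    return vocab, class_counts, word_counts


def posterior_positive(model, doc):
    """Return P(label = 1 | doc), assuming words are independent given the class."""
    vocab, class_counts, word_counts = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in doc:
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    norm = sum(math.exp(score - top) for score in log_scores.values())
    return math.exp(log_scores[1] - top) / norm


# Toy, made-up training data: 1 = ADE-related summary, 0 = not ADE-related.
docs = [
    ["rash", "after", "penicillin"],
    ["bleeding", "on", "warfarin"],
    ["routine", "follow", "up"],
    ["stable", "routine", "discharge"],
]
labels = [1, 1, 0, 0]
model = train_simple_bayes(docs, labels)
print(round(posterior_positive(model, ["rash", "penicillin"]), 2))
```

Ranking summaries by this posterior and sweeping a threshold is what produces an ROC curve of the kind the abstract reports.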
|
89
|
Yoo C, Thorsson V, Cooper GF. Discovery of causal relationships in a gene-regulation pathway from a mixture of experimental and observational DNA microarray data. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2002:498-509. [PMID: 11928502 DOI: 10.1142/9789812799623_0046] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 04/18/2023]
Abstract
This paper reports the methods and results of a computer-based search for causal relationships in the gene-regulation pathway of galactose metabolism in the yeast Saccharomyces cerevisiae. The search uses recently published data from cDNA microarray experiments. A Bayesian method was applied to learn causal networks from a mixture of observational and experimental gene-expression data. The observational data were gene-expression levels obtained from unmanipulated "wild-type" cells. The experimental data were produced by deleting ("knocking out") genes and observing the expression levels of other genes. Causal relations predicted from the analysis on 36 galactose gene pairs are reported and compared with the known galactose pathway. Additional exploratory analyses are also reported.
|
90
|
Yoo C, Cooper GF. Discovery of gene-regulation pathways using local causal search. Proc AMIA Symp 2002:914-8. [PMID: 12463958 PMCID: PMC2244381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/27/2023] Open
Abstract
This paper reports the methods and results of a computer-based algorithm that takes as input the expression levels of a set of genes as given by DNA microarray data, and then searches for causal pathways that represent how the genes regulate each other. The algorithm uses local heuristic search and a Bayesian scoring metric. We applied the algorithm to induce causal networks from a mixture of observational and experimental gene-expression data on genes involved in galactose metabolism in the yeast Saccharomyces cerevisiae. The observational data consisted of gene-expression levels obtained from unmanipulated "wild-type" cells. The experimental data were produced by deleting ("knocking out") genes and measuring the expression levels of other genes. We used this data to evaluate several variations of the local search method. In each evaluation, causal relationships were predicted for all 36 pairwise combinations of nine key galactose-related genes. These predictions were then compared to the known causal relationships among these genes.
|
91
|
Cooper GF. Knowledge Processing and Decision Support Systems. Yearb Med Inform 2002:477-479. [PMID: 27706352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023] Open
|
92
|
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 2001; 34:301-10. [PMID: 12123149 DOI: 10.1006/jbin.2001.1029] [Citation(s) in RCA: 450] [Impact Index Per Article: 19.6] [Indexed: 11/22/2022]
Abstract
Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline), a positive predictive value of 84.5% (versus 68.4% for the baseline) while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
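The three ingredients named in the abstract (negation phrases, a filter for phrases that only look like negations, and a limited scope) can be sketched with a single regular expression. This is a toy illustration of the idea and not the published NegEx implementation: the phrase lists, the five-word scope window, and the function name are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny phrase lists; the real algorithm uses many more.
NEGATION_PHRASES = ["no", "denies", "without", "no evidence of"]
PSEUDO_NEGATION_PHRASES = ["no increase", "not only"]
SCOPE_WINDOW = 5  # assumed: at most this many words between the phrase and the finding


def is_negated(sentence, finding):
    """Return True if `finding` falls within the scope of a negation phrase in `sentence`."""
    text = sentence.lower()

    # Filter out sentences containing phrases that falsely appear to be negations.
    for pseudo in PSEUDO_NEGATION_PHRASES:
        if re.search(r"\b" + re.escape(pseudo) + r"\b", text):
            return False

    # Try longer phrases first so "no evidence of" is preferred over "no".
    phrases = "|".join(
        re.escape(phrase) for phrase in sorted(NEGATION_PHRASES, key=len, reverse=True)
    )
    pattern = (
        r"\b(?:" + phrases + r")\b"                    # a negation phrase
        + r"(?:\W+\w+){0," + str(SCOPE_WINDOW) + r"}?"  # limited scope, in words
        + r"\W+" + re.escape(finding.lower()) + r"\b"   # the finding itself
    )
    return re.search(pattern, text) is not None


print(is_negated("The patient denies chest pain.", "chest pain"))       # True
print(is_negated("Chest pain is present on exertion.", "chest pain"))   # False
print(is_negated("No increase in chest pain.", "chest pain"))           # False
```

Limiting the scope is what keeps a negation early in a long sentence from wrongly marking a finding mentioned much later as absent.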
|
93
|
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proc AMIA Symp 2001:105-9. [PMID: 11825163 PMCID: PMC2243578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2023] Open
Abstract
OBJECTIVE Automatically identifying findings or diseases described in clinical textual reports requires determining whether clinical observations are present or absent. We evaluate the use of negation phrases and the frequency of negation in free-text clinical reports. METHODS A simple negation algorithm was applied to ten types of clinical reports (n=42,160) dictated during July 2000. We counted how often each of 66 negation phrases was used to mark a clinical observation as absent. Physicians read a random sample of 400 sentences, and precision was calculated for the negation phrases. We measured what proportion of clinical observations were marked as absent. RESULTS The negation algorithm was triggered by sixty negation phrases with just seven of the phrases accounting for 90% of the negations. The negation phrases received an overall precision of 97%, with "not" earning the lowest precision of 63%. Between 39% and 83% of all clinical observations were identified as absent by the negation algorithm, depending on the type of report analyzed. The most frequently used clinical observations were negated the majority of the time. CONCLUSION Because clinical observations in textual patient records are frequently negated, identifying accurate negation phrases is important to any system processing these reports.
|
94
|
Mani S, Cooper GF. Causal discovery from medical textual data. Proc AMIA Symp 2000:542-6. [PMID: 11079942 PMCID: PMC2243738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023] Open
Abstract
Medical records usually incorporate investigative reports, historical notes, patient encounters or discharge summaries as textual data. This study focused on learning causal relationships from intensive care unit (ICU) discharge summaries of 1611 patients. Identification of the causal factors of clinical conditions and outcomes can help us formulate better management, prevention and control strategies for the improvement of health care. For causal discovery we applied the Local Causal Discovery (LCD) algorithm, which uses the framework of causal Bayesian Networks to represent causal relationships among model variables. LCD takes as input a dataset and outputs causes of the form variable Y causally influences variable Z. Using the words that occur in the discharge summaries as attributes for input, LCD output 8 purported causal relationships. The relationships ranked as most probable subjectively appear to be most causally plausible.
|
95
|
Kayaalp M, Cooper GF, Clermont G. Predicting ICU mortality: a comparison of stationary and nonstationary temporal models. Proc AMIA Symp 2000:418-22. [PMID: 11079917 PMCID: PMC2243937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023] Open
Abstract
OBJECTIVE This study evaluates the effectiveness of the stationarity assumption in predicting the mortality of intensive care unit (ICU) patients at the ICU discharge. DESIGN This is a comparative study. A stationary temporal Bayesian network learned from data was compared to a set of (33) nonstationary temporal Bayesian networks learned from data. A process observed as a sequence of events is stationary if its stochastic properties stay the same when the sequence is shifted in a positive or negative direction by a constant time parameter. The temporal Bayesian networks forecast mortalities of patients, where each patient has one record per day. The predictive performance of the stationary model is compared with nonstationary models using the area under the receiver operating characteristics (ROC) curves. RESULTS The stationary model usually performed best. However, one nonstationary model using large data sets performed significantly better than the stationary model. CONCLUSION Results suggest that using a combination of stationary and nonstationary models may predict better than using either alone.
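The verbal definition of stationarity in this abstract corresponds to the standard notion of strict stationarity. In symbols (the notation here is ours, not the paper's), for a process observed as a sequence X_t:

```latex
% Strict stationarity: every finite-dimensional joint distribution
% is unchanged when all time indices are shifted by the same constant \tau.
P\left(X_{t_1}, X_{t_2}, \ldots, X_{t_k}\right)
  = P\left(X_{t_1+\tau}, X_{t_2+\tau}, \ldots, X_{t_k+\tau}\right)
\quad \text{for all } k \ge 1,\ \text{all } t_1, \ldots, t_k,\ \text{and all shifts } \tau .
```

Under this assumption one set of transition parameters can be shared across all days of an ICU stay, whereas a nonstationary model allows those parameters to differ by day.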
|
96
|
Mani S, Cooper GF. A study in causal discovery from population-based infant birth and death records. Proc AMIA Symp 1999:315-9. [PMID: 10566372 PMCID: PMC2232606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/14/2023] Open
Abstract
In the domain of medicine, identification of the causal factors of diseases and outcomes helps us formulate better management, prevention and control strategies for the improvement of health care. With the goal of exploring, evaluating and refining techniques to learn causal relationships from observational data, such as data routinely collected in healthcare settings, we focused on investigating factors that may contribute causally to infant mortality in the United States. We used the U.S. Linked Birth/Infant Death dataset for 1991 with more than four million records and about 200 variables for each record. Our sample consisted of 41,155 records randomly selected from the whole dataset. Each record had maternal, paternal and child factors and the outcome at the end of the first year--whether the infant survived or not. For causal discovery we used a modified Local Causal Discovery (LCD2) algorithm, which uses the framework of causal Bayesian Networks to represent causal relationships among model variables. LCD2 takes as input a dataset and outputs causes of the form variable X causes variable Y. Using the infant birth and death dataset as input, LCD2 output nine purported causal relationships. Eight out of the nine relationships seem plausible. Even though we have not yet discovered a clinically novel causal link, we plan to look for novel causal pathways using the full sample after refining the algorithm and developing a more efficient implementation.
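To make the "variable X causes variable Y" output concrete, here is a minimal sketch of the kind of test a local causal discovery rule performs. It is not the authors' LCD2 implementation. It assumes the basic LCD rule as we understand it: given a variable W that is known to have no measured causes, conclude that X causes Y when W and X are dependent, X and Y are dependent, and W is independent of Y given X. The function names, the chi-square thresholds, and the toy counts are all hypothetical.

```python
from collections import Counter

CHI2_CRIT_DF1 = 3.841  # 0.05 critical value, 1 degree of freedom
CHI2_CRIT_DF2 = 5.991  # 0.05 critical value, 2 degrees of freedom


def chi2_2x2(pairs):
    """Pearson chi-square statistic for a list of binary (a, b) pairs."""
    n = Counter(pairs)
    a, b, c, d = n[(0, 0)], n[(0, 1)], n[(1, 0)], n[(1, 1)]
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: no evidence of dependence
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def dependent(records, u, v):
    """True if u and v are judged dependent by the chi-square test."""
    return chi2_2x2([(r[u], r[v]) for r in records]) > CHI2_CRIT_DF1


def independent_given(records, u, v, z):
    """True if u and v are not judged dependent within the strata of binary z."""
    stat = sum(
        chi2_2x2([(r[u], r[v]) for r in records if r[z] == k]) for k in (0, 1)
    )
    return stat <= CHI2_CRIT_DF2


def lcd_triple(records, w, x, y):
    """Apply the assumed LCD rule; w must have no measured causes."""
    return (
        dependent(records, w, x)
        and dependent(records, x, y)
        and independent_given(records, w, y, x)
    )


# Toy data built from exact counts for a chain W -> X -> Y (hypothetical).
counts = {
    (0, 0, 0): 60, (0, 0, 1): 20, (0, 1, 0): 5, (0, 1, 1): 15,
    (1, 0, 0): 15, (1, 0, 1): 5, (1, 1, 0): 20, (1, 1, 1): 60,
}
records = [
    {"W": w, "X": x, "Y": y}
    for (w, x, y), n in counts.items()
    for _ in range(n)
]

x_causes_y = lcd_triple(records, "W", "X", "Y")
y_causes_x = lcd_triple(records, "W", "Y", "X")
```

On this toy data the rule accepts "X causes Y" and rejects "Y causes X". Failing to reject independence is treated here as independence, which is a simplification; a real implementation would need a more careful test and handling of non-binary variables.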
|
97
|
Aronis JM, Cooper GF, Kayaalp M, Buchanan BG. Identifying patient subgroups with simple Bayes'. Proc AMIA Symp 1999:658-62. [PMID: 10566441 PMCID: PMC2232601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
Medical records can form the basis of retrospective studies, be used to evaluate hospital practices and guidelines, and provide examples for teaching medicine. Each of these tasks presumes the ability to accurately identify patient subgroups. We describe a method for selecting patient subgroups based on the text of their medical records and demonstrate its effectiveness. We also describe a modification of the basic system that does not assume the existence of a preclassified training set, and illustrate its effectiveness in one retrieval task.
|
98
|
Aliferis CF, Cooper GF. Temporal representation design principles: an assessment in the domain of liver transplantation. Proc AMIA Symp 1998:170-4. [PMID: 9929204 PMCID: PMC2232207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Time modeling is an important aspect of medical decision-support systems engineering. At the core of effective time modeling lies the challenge of proper knowledge representation design. In this paper, we focus on two important principles for effective time-modeling languages: (a) hybrid temporal representation, and (b) dynamic temporal abstraction. To explore the significance of these design principles, we extend a previously-defined formalism (single-granularity modifiable temporal belief networks--MTBN-SGs) to accommodate multiple temporal granularities and dynamic query and domain-specific model creation. We call the new formalism multiple-granularity MTBNs (MTBN-MGs). We develop a prototype system for modeling aspects of liver transplantation and analyze the resulting model with respect to its representation power, representational tractability, and inferential tractability. Our experiment demonstrates that the design of formalisms is crucial for effective time modeling. In particular: (i) Hybrid temporal representation is a desirable property of time-modeling languages because it makes knowledge acquisition easier, and increases representational tractability. (ii) Dynamic temporal abstraction improves inferential and representational tractability significantly. We discuss a high-level procedure for extending existing languages to incorporate hybrid temporal representation and dynamic temporal abstraction.
|
99
|
Monti S, Cooper GF. The impact of modeling the dependencies among patient findings on classification accuracy and calibration. Proc AMIA Symp 1998:592-6. [PMID: 9929288 PMCID: PMC2232324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
We present a new Bayesian classifier for computer-aided diagnosis. The new classifier builds upon the naive-Bayes classifier, and models the dependencies among patient findings in an attempt to improve its performance, both in terms of classification accuracy and in terms of calibration of the estimated probabilities. This work finds motivation in the argument that highly calibrated probabilities are necessary for the clinician to be able to rely on the model's recommendations. Experimental results are presented, supporting the conclusion that modeling the dependencies among findings improves calibration.
|
100
|
Cooper GF, Miller RA. An experiment comparing lexical and statistical methods for extracting MeSH terms from clinical free text. J Am Med Inform Assoc 1998; 5:62-75. [PMID: 9452986 PMCID: PMC61276 DOI: 10.1136/jamia.1998.0050062] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.2] [Received: 08/06/1997] [Accepted: 09/17/1997] [Indexed: 02/06/2023]
Abstract
OBJECTIVE A primary goal of the University of Pittsburgh's 1990-94 UMLS-sponsored effort was to develop and evaluate PostDoc (a lexical indexing system) and Pindex (a statistical indexing system) comparatively, and then in combination as a hybrid system. Each system takes as input a portion of the free text from a narrative part of a patient's electronic medical record and returns a list of suggested MeSH terms to use in formulating a Medline search that includes concepts in the text. This paper describes the systems and reports an evaluation. The intent is for this evaluation to serve as a step toward the eventual realization of systems that assist healthcare personnel in using the electronic medical record to construct patient-specific searches of Medline. DESIGN The authors tested the performances of PostDoc, Pindex, and a hybrid system, using text taken from randomly selected clinical records, which were stratified to include six radiology reports, six pathology reports, and six discharge summaries. They identified concepts in the clinical records that might conceivably be used in performing a patient-specific Medline search. Each system was given the free text of each record as an input. The extent to which a system-derived list of MeSH terms captured the relevant concepts in these documents was determined based on blinded assessments by the authors. RESULTS PostDoc output a mean of approximately 19 MeSH terms per report, which included about 40% of the relevant report concepts. Pindex output a mean of approximately 57 terms per report and captured about 45% of the relevant report concepts. A hybrid system captured approximately 66% of the relevant concepts and output about 71 terms per report. CONCLUSION The outputs of PostDoc and Pindex are complementary in capturing MeSH terms from clinical free text. The results suggest possible approaches to reduce the number of terms output while maintaining the percentage of terms captured, including the use of UMLS semantic types to constrain the output list to contain only clinically relevant MeSH terms.
|