1
|
Aronis JM, Ye Y, Espino J, Hochheiser H, Michaels MG, Cooper GF. A Bayesian System to Detect and Track Outbreaks of Influenza-Like Illnesses Including Novel Diseases. JMIR Public Health Surveill 2024. [PMID: 38805611 DOI: 10.2196/57349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
BACKGROUND The early identification of outbreaks of both known and novel influenza-like illnesses is an important public health problem. OBJECTIVE To design and test a tool that detects and tracks outbreaks of both known and novel influenza-like illnesses, such as the COVID-19 worldwide pandemic, accurately and early. METHODS This paper describes the ILI Tracker algorithm that first models the daily occurrence of a set of known influenza-like illnesses in hospital emergency departments in a monitored region using findings extracted from patient care reports with natural language processing. We then show how the algorithm can be extended to detect and track the presence of an unmodeled disease which may represent a novel disease outbreak. RESULTS We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County, Pennsylvania, from June 1, 2014 through May 31, 2015. We also include the results of detecting the outbreak of an unmodeled disease, which in retrospect was very likely an outbreak of the enterovirus EV-D68. CONCLUSIONS The results reported in this paper provide support that ILI Tracker was able to track well the incidence of four modeled influenza-like diseases over a one-year period, relative to laboratory confirmed cases, and it was computationally efficient in doing so. The system was also able to detect a likely novel outbreak of the enterovirus D68 early in an outbreak that occurred in Allegheny County in 2014, as well as clinically characterize that outbreak disease accurately.
|
2
|
Rahman MA, Cai C, Bo N, McNamara DM, Ding Y, Cooper GF, Lu X, Liu J. An individualized Bayesian method for estimating genomic variants of hypertension. BMC Genomics 2023; 23:863. [PMID: 37936055 PMCID: PMC10631115 DOI: 10.1186/s12864-023-09757-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 10/19/2023] [Indexed: 11/09/2023]
Abstract
BACKGROUND Genomic variants of the disease are often discovered nowadays through population-based genome-wide association studies (GWAS). Identifying genomic variations potentially underlying a phenotype, such as hypertension, in an individual is important for designing personalized treatment; however, population-level models, such as GWAS, may not capture all the important, individualized factors well. In addition, GWAS typically requires a large sample size to detect the association of low-frequency genomic variants with sufficient power. Here, we report an individualized Bayesian inference (IBI) algorithm for estimating the genomic variants that influence complex traits, such as hypertension, at the level of an individual (e.g., a patient). By modeling at the level of the individual, IBI seeks to find genomic variants observed in the individual's genome that provide a strong explanation of the phenotype observed in this individual. RESULTS We applied the IBI algorithm to the data from the Framingham Heart Study to explore the genomic influences of hypertension. Among the top-ranking variants identified by IBI and GWAS, there is a significant number of shared variants (intersection); the unique variants identified only by IBI tend to have relatively lower minor allele frequency than those identified by GWAS. In addition, IBI discovered more individualized and diverse variants that explain hypertension patients better than GWAS. Furthermore, IBI found several well-known low-frequency variants as well as genes related to blood pressure that GWAS missed in the same cohort. Finally, IBI identified top-ranked variants that predicted hypertension better than GWAS, according to the area under the ROC curve. 
CONCLUSIONS The results support IBI as a promising approach for complementing GWAS, especially in detecting low-frequency genomic variants as well as learning personalized genomic variants of clinical traits and disease, such as the complex trait of hypertension, to help advance precision medicine.
|
3
|
King AJ, Angus DC, Cooper GF, Mowery DL, Seaman JB, Potter KM, Bukowski LA, Al-Khafaji A, Gunn SR, Kahn JM. A voice-based digital assistant for intelligent prompting of evidence-based practices during ICU rounds. J Biomed Inform 2023; 146:104483. [PMID: 37657712 PMCID: PMC10591951 DOI: 10.1016/j.jbi.2023.104483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/21/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
OBJECTIVE To evaluate the technical feasibility and potential value of a digital assistant that prompts intensive care unit (ICU) rounding teams to use evidence-based practices based on analysis of their real-time discussions. METHODS We evaluated a novel voice-based digital assistant which audio records and processes the ICU care team's rounding discussions to determine which evidence-based practices are applicable to the patient but have yet to be addressed by the team. The system would then prompt the team to consider indicated but not yet delivered practices, thereby reducing cognitive burden compared to traditional rigid rounding checklists. In a retrospective analysis, we applied automatic transcription, natural language processing, and a rule-based expert system to generate personalized prompts for each patient in 106 audio-recorded ICU rounding discussions. To assess technical feasibility, we compared the system's prompts to those created by experienced critical care nurses who directly observed rounds. To assess potential value, we also compared the system's prompts to a hypothetical paper checklist containing all evidence-based practices. RESULTS The positive predictive value, negative predictive value, true positive rate, and true negative rate of the system's prompts were 0.45 ± 0.06, 0.83 ± 0.04, 0.68 ± 0.07, and 0.66 ± 0.04, respectively. If implemented in lieu of a paper checklist, the system would generate 56% fewer prompts per patient, with 50%±17% greater precision. CONCLUSION A voice-based digital assistant can reduce prompts per patient compared to traditional approaches for improving evidence uptake on ICU rounds. Additional work is needed to evaluate field performance and team acceptance.
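The four figures reported in RESULTS (positive predictive value, negative predictive value, true positive rate, true negative rate) all derive from one confusion matrix that compares the system's prompts against the nurses' reference prompts. The sketch below shows that arithmetic only. The class name, function name, and counts are hypothetical illustrations and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of system prompts versus reference (nurse) prompts."""
    tp: int  # system prompted, reference prompted
    fp: int  # system prompted, reference did not
    tn: int  # neither prompted
    fn: int  # system silent, reference prompted


def prompt_metrics(c: Confusion) -> dict:
    """Return PPV, NPV, TPR, and TNR for one confusion matrix."""
    return {
        "ppv": c.tp / (c.tp + c.fp),  # precision of the prompts
        "npv": c.tn / (c.tn + c.fn),
        "tpr": c.tp / (c.tp + c.fn),  # sensitivity / recall
        "tnr": c.tn / (c.tn + c.fp),  # specificity
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = Confusion(tp=30, fp=20, tn=40, fn=10)
metrics = prompt_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A checklist that always prompts for every practice has no true or false negatives, which is why precision is the natural axis for comparing it with a selective prompting system.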
|
4
|
Ren S, Cooper GF, Chen L, Lu X. An interpretable deep learning framework for genome-informed precision oncology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.11.548534. [PMID: 37503199 PMCID: PMC10369905 DOI: 10.1101/2023.07.11.548534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Cancers result from aberrations in cellular signaling systems, typically resulting from driver somatic genome alterations (SGAs) in individual tumors. Precision oncology requires understanding the cellular state and selecting medications that induce vulnerability in cancer cells under such conditions. To this end, we developed a computational framework consisting of two components: 1) A representation-learning component, which learns a representation of the cellular signaling systems when perturbed by SGAs, using a biologically-motivated and interpretable deep learning model. 2) A drug-response-prediction component, which predicts the response to drugs by leveraging the information of the cellular state of the cancer cells derived by the first component. Our cell-state-oriented framework significantly enhances the accuracy of genome-informed prediction of drug responses in comparison to models that directly use SGAs as inputs. Importantly, our framework enables the prediction of response to chemotherapy agents based on SGAs, thus expanding genome-informed precision oncology beyond molecularly targeted drugs.
|
5
|
Aronis JM, Ye Y, Espino J, Hochheiser H, Michaels MG, Cooper GF. A Bayesian System to Track Outbreaks of Influenza-Like Illnesses Including Novel Diseases. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.05.10.23289799. [PMID: 37293033 PMCID: PMC10246032 DOI: 10.1101/2023.05.10.23289799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
It would be highly desirable to have a tool that detects the outbreak of a new influenza-like illness, such as COVID-19, accurately and early. This paper describes the ILI Tracker algorithm that first models the daily occurrence of a set of known influenza-like illnesses in a hospital emergency department using findings extracted from patient-care reports using natural language processing. We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County Pennsylvania from June 1, 2010 through May 31, 2015. We then show how the algorithm can be extended to detect the presence of an unmodeled disease which may represent a novel disease outbreak. We also include results for detecting an outbreak of an unmodeled disease during the mentioned time period, which in retrospect was very likely an outbreak of Enterovirus D68.
|
6
|
Andrews B, Wongchokprasitti C, Visweswaran S, Lakhani CM, Patel CJ, Cooper GF. A new method for estimating the probability of causal relationships from observational data: Application to the study of the short-term effects of air pollution on cardiovascular and respiratory disease. Artif Intell Med 2023; 139:102546. [PMID: 37100513 PMCID: PMC10171833 DOI: 10.1016/j.artmed.2023.102546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 04/04/2023] [Accepted: 04/04/2023] [Indexed: 04/28/2023]
Abstract
In this paper we investigate which airborne pollutants have a short-term causal effect on cardiovascular and respiratory disease using the Ancestral Probabilities (AP) procedure, a novel Bayesian approach for deriving the probabilities of causal relationships from observational data. The results are largely consistent with EPA assessments of causality; however, in a few cases AP suggests that some pollutants thought to cause cardiovascular or respiratory disease are associated with it purely because of confounding. The AP procedure utilizes maximal ancestral graph (MAG) models to represent and assign probabilities to causal relationships while accounting for latent confounding. The algorithm does so locally by marginalizing over models with and without causal features of interest. Before applying AP to real data, we evaluate it in a simulation study and investigate the benefits of providing background knowledge. Overall, the results suggest that AP is an effective tool for causal discovery.
|
7
|
King AJ, Potter KM, Seaman JB, Chiyka EA, Hileman BA, Cooper GF, Mowery DL, Angus DC, Kahn JM. Measuring Performance on the ABCDEF Bundle During Interprofessional Rounds via a Nurse-Based Assessment Tool. Am J Crit Care 2023; 32:92-99. [PMID: 36854912 DOI: 10.4037/ajcc2023755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
BACKGROUND Nurse-led rounding checklists are a common strategy for facilitating evidence-based practice in the intensive care unit (ICU). To streamline checklist workflow, some ICUs have the nurse or another individual listen to the conversation and customize the checklist for each patient. Such customizations assume that individuals can reliably assess whether checklist items have been addressed. OBJECTIVE To evaluate whether 1 critical care nurse can reliably assess checklist items on rounds. METHODS Two nurses performed in-person observation of multidisciplinary ICU rounds. Using a standardized paper-based assessment tool, each nurse indicated whether 17 items related to the ABCDEF bundle were discussed during rounds. For each item, generalizability coefficients were used as a measure of reliability, with a single-rater value of 0.70 or greater considered sufficient to support its assessment by 1 nurse. RESULTS The nurse observers assessed 118 patient discussions across 15 observation days. For 11 of 17 items (65%), the generalizability coefficient for a single rater met or exceeded the 0.70 threshold. The generalizability coefficients (95% CIs) of a single rater for key items were as follows: pain, 0.86 (0.74-0.97); delirium score, 0.74 (0.64-0.83); agitation score, 0.72 (0.33-1.00); spontaneous awakening trial, 0.67 (0.49-0.83); spontaneous breathing trial, 0.80 (0.70-0.89); mobility, 0.79 (0.69-0.87); and family (future/past) engagement, 0.82 (0.73-0.90). CONCLUSION Using a paper-based assessment tool, a single trained critical care nurse can reliably assess the discussion of elements of the ABCDEF bundle during multidisciplinary rounds.
|
8
|
Liu Z, Cai C, Ma X, Liu J, Chen L, Lui VWY, Cooper GF, Lu X. A Novel Bayesian Framework Infers Driver Activation States and Reveals Pathway-Oriented Molecular Subtypes in Head and Neck Cancer. Cancers (Basel) 2022; 14:cancers14194825. [PMID: 36230748 PMCID: PMC9563147 DOI: 10.3390/cancers14194825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 09/28/2022] [Accepted: 09/30/2022] [Indexed: 02/08/2023]
Abstract
Head and neck squamous cell cancer (HNSCC) is an aggressive cancer resulting from heterogeneous causes. To reveal the underlying drivers and signaling mechanisms of different HNSCC tumors, we developed a novel Bayesian framework to identify drivers of individual tumors and infer the states of driver proteins in the cellular signaling system in HNSCC tumors. First, we systematically identify causal relationships between somatic genome alterations (SGAs) and differentially expressed genes (DEGs) for each TCGA HNSCC tumor using the tumor-specific causal inference (TCI) model. Then, we generalize the most statistically significant driver SGAs and their regulated DEGs in the TCGA HNSCC cohort. Finally, we develop machine learning models that combine genomic and transcriptomic data to infer the protein functional activation states of driver SGAs in tumors, which enable us to represent a tumor in the space of cellular signaling systems. We discovered four mechanism-oriented subtypes of HNSCC, which show distinct patterns of activation state of HNSCC driver proteins, and importantly, this subtyping is orthogonal to previously reported transcriptomic-based molecular subtyping of HNSCC. Further, our analysis revealed driver proteins that are likely involved in oncogenic processes induced by HPV infection, even though they are not perturbed by genomic alterations in HPV+ tumors.
|
9
|
Johnson A, Cooper GF, Visweswaran S. A Novel Personalized Random Forest Algorithm for Clinical Outcome Prediction. Stud Health Technol Inform 2022; 290:248-252. [PMID: 35673011 DOI: 10.3233/shti220072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Machine learning algorithms that derive predictive models are useful in predicting patient outcomes under uncertainty. These are often "population" algorithms which optimize a static model to predict well on average for individuals in the population; however, population models may predict poorly for individuals that differ from the average. Personalized machine learning algorithms seek to optimize predictive performance for every patient by tailoring a patient-specific model to each individual. Ensembles of decision trees often outperform single decision tree models, but ensembles of personalized models like decision paths have received little investigation. We present a novel personalized ensemble, called Lazy Random Forest (LazyRF), which consists of bagged randomized decision paths optimized for the individual for whom a prediction will be made. LazyRF outperformed single and bagged decision paths and demonstrated comparable predictive performance to a population random forest method in terms of discrimination on clinical and genomic data while also producing simpler models than the population random forest.
|
10
|
Visweswaran S, King AJ, Tajgardoon M, Calzoni L, Clermont G, Hochheiser H, Cooper GF. Evaluation of eye tracking for a decision support application. JAMIA Open 2021; 4:ooab059. [PMID: 34350394 PMCID: PMC8327376 DOI: 10.1093/jamiaopen/ooab059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2020] [Revised: 05/08/2021] [Accepted: 07/01/2021] [Indexed: 11/12/2022]
Abstract
Eye tracking is used widely to investigate attention and cognitive processes while performing tasks in electronic medical record (EMR) systems. We explored a novel application of eye tracking to collect training data for a machine learning-based clinical decision support tool that predicts which patient data are likely to be relevant for a clinical task. Specifically, we investigated in a laboratory setting the accuracy of eye tracking compared to manual annotation for inferring which patient data in the EMR are judged to be relevant by physicians. We evaluated several methods for processing gaze points that were recorded using a low-cost eye-tracking device. Our results show that eye tracking achieves accuracy and precision of 69% and 53%, respectively, compared to manual annotation, and that it is promising for machine learning. The methods for processing gaze points and scripts that we developed offer a first step in developing novel uses for eye tracking for clinical decision support.
|
11
|
King AJ, Calzoni L, Tajgardoon M, Cooper GF, Clermont G, Hochheiser H, Visweswaran S. A simple electronic medical record system designed for research. JAMIA Open 2021; 4:ooab040. [PMID: 34345801 PMCID: PMC8325484 DOI: 10.1093/jamiaopen/ooab040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2020] [Revised: 03/23/2021] [Accepted: 05/05/2021] [Indexed: 11/14/2022]
Abstract
With the extensive deployment of electronic medical record (EMR) systems, EMR usability remains a significant source of frustration to clinicians. There is a significant research need for software that emulates EMR systems and enables investigators to conduct laboratory-based human–computer interaction studies. We developed an open-source software package that implements the display functions of an EMR system. The user interface emphasizes the temporal display of vital signs, medication administrations, and laboratory test results. It is well suited to support research about clinician information-seeking behaviors and adaptive user interfaces in terms of measures that include task accuracy, time to completion, and cognitive load. The Simple EMR System is freely available to the research community and is on GitHub.
|
12
|
Taneja SB, Douglas GP, Cooper GF, Michaels MG, Druzdzel MJ, Visweswaran S. Bayesian network models with decision tree analysis for management of childhood malaria in Malawi. BMC Med Inform Decis Mak 2021; 21:158. [PMID: 34001100 PMCID: PMC8130361 DOI: 10.1186/s12911-021-01514-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 05/04/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Malaria is a major cause of death in children under five years old in low- and middle-income countries such as Malawi. Accurate diagnosis and management of malaria can help reduce the global burden of childhood morbidity and mortality. Trained healthcare workers in rural health centers manage malaria with limited supplies of malarial diagnostic tests and drugs for treatment. A clinical decision support system that integrates predictive models to provide an accurate prediction of malaria based on clinical features could aid healthcare workers in the judicious use of testing and treatment. We developed Bayesian network (BN) models to predict the probability of malaria from clinical features and an illustrative decision tree to model the decision to use or not use a malaria rapid diagnostic test (mRDT). METHODS We developed two BN models to predict malaria from a dataset of outpatient encounters of children in Malawi. The first BN model was created manually with expert knowledge, and the second model was derived using an automated method. The performance of the BN models was compared to other statistical models on a range of performance metrics at multiple thresholds. We developed a decision tree that integrates predictions with the costs of mRDT and a course of recommended treatment. RESULTS The manually created BN model achieved an area under the ROC curve (AUC) equal to 0.60 which was statistically significantly higher than the other models. At the optimal threshold for classification, the manual BN model had sensitivity and specificity of 0.74 and 0.42 respectively, and the automated BN model had sensitivity and specificity of 0.45 and 0.68 respectively. The balanced accuracy values were similar across all the models. Sensitivity analysis of the decision tree showed that for values of probability of malaria below 0.04 and above 0.40, the preferred decision that minimizes expected costs is not to perform mRDT. 
CONCLUSION In resource-constrained settings, judicious use of mRDT is important. Predictive models in combination with decision analysis can provide personalized guidance on when to use mRDT in the management of childhood malaria. BN models can be efficiently derived from data to support clinical decision making.
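The sensitivity-analysis result above (testing is preferred only in a middle band of malaria probability) follows from a standard expected-cost comparison. The sketch below illustrates the idea with three strategies. Every number in it (costs, test sensitivity and specificity) and every name is a hypothetical assumption for illustration, not a value from the paper, so the thresholds it produces will differ from the reported 0.04 and 0.40.

```python
# Hypothetical inputs in arbitrary cost units; not values from the study.
COST_TEST = 1.0        # one mRDT
COST_TREAT = 5.0       # one course of treatment
COST_UNTREATED = 50.0  # consequence of leaving malaria untreated
SENSITIVITY = 0.95     # assumed mRDT sensitivity
SPECIFICITY = 0.95     # assumed mRDT specificity


def expected_costs(p_malaria: float) -> dict:
    """Expected cost of each strategy at a given probability of malaria."""
    p_positive = (p_malaria * SENSITIVITY
                  + (1 - p_malaria) * (1 - SPECIFICITY))
    p_missed = p_malaria * (1 - SENSITIVITY)  # false negatives go untreated
    return {
        "no_test_no_treat": p_malaria * COST_UNTREATED,
        "treat_without_test": COST_TREAT,
        "test_then_treat_if_positive": (COST_TEST
                                        + p_positive * COST_TREAT
                                        + p_missed * COST_UNTREATED),
    }


def preferred_strategy(p_malaria: float) -> str:
    """Strategy that minimizes expected cost."""
    costs = expected_costs(p_malaria)
    return min(costs, key=costs.get)


for p in (0.01, 0.20, 0.90):
    print(p, preferred_strategy(p))
```

Under these assumed numbers, skipping the test is cheapest at both extremes (withhold treatment when malaria is very unlikely, treat presumptively when it is very likely), and testing wins only in between, which is the qualitative pattern the abstract reports.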
|
13
|
Johnson A, Cooper GF, Visweswaran S. Patient-Specific Modeling with Personalized Decision Paths. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:602-611. [PMID: 33936434 PMCID: PMC8075540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Predictive models can be useful in predicting patient outcomes under uncertainty. Many algorithms employ "population" methods, which optimize a single model to perform well on average over an entire population, but the model may perform poorly on some patients. Personalized methods optimize predictive performance for each patient by tailoring the model to the individual. We present a new personalized method based on decision trees: the Personalized Decision Path using a Bayesian score (PDP-Bay). Performance on eight synthetic, genomic, and clinical datasets was compared to that of decision trees and a previously described personalized decision path method in terms of area under the ROC curve (AUC) and expected calibration error (ECE). Model complexity was measured by average path length. The PDP-Bay model outperformed the decision tree in terms of both AUC and ECE. The results support the conclusion that personalization may achieve better predictive performance and produce simpler models than population approaches.
|
14
|
Tajgardoon M, Cooper GF, King AJ, Clermont G, Hochheiser H, Hauskrecht M, Sittig DF, Visweswaran S. Modeling physician variability to prioritize relevant medical record information. JAMIA Open 2020; 3:602-610. [PMID: 33623894 PMCID: PMC7886572 DOI: 10.1093/jamiaopen/ooaa058] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/14/2020] [Revised: 10/05/2020] [Accepted: 11/02/2020] [Indexed: 02/05/2023]
Abstract
Objective Patient information can be retrieved more efficiently in electronic medical record (EMR) systems by using machine learning models that predict which information a physician will seek in a clinical context. However, information-seeking behavior varies across EMR users. To explicitly account for this variability, we derived hierarchical models and compared their performance to nonhierarchical models in identifying relevant patient information in intensive care unit (ICU) cases. Materials and methods Critical care physicians reviewed ICU patient cases and selected data items relevant for presenting at morning rounds. Using patient EMR data as predictors, we derived hierarchical logistic regression (HLR) and standard logistic regression (LR) models to predict their relevance. Results In 73 pairs of HLR and LR models, the HLR models achieved an area under the receiver operating characteristic curve of 0.81, 95% confidence interval (CI) [0.80-0.82], which was statistically significantly higher than that of LR models (0.75, 95% CI [0.74-0.76]). Further, the HLR models achieved statistically significantly lower expected calibration error (0.07, 95% CI [0.06-0.08]) than LR models (0.16, 95% CI [0.14-0.17]). Discussion The physician reviewers demonstrated variability in selecting relevant data. Our results show that HLR models perform significantly better than LR models with respect to both discrimination and calibration. This is likely due to explicitly modeling physician-related variability. Conclusion Hierarchical models can yield better performance when there is physician-related variability as in the case of identifying relevant information in the EMR.
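Expected calibration error (ECE), used above to compare the HLR and LR models, measures how far predicted probabilities sit from observed outcome frequencies. A common formulation bins predictions by probability and takes the weighted average gap per bin. The sketch below assumes equal-width bins and ten bins by default; the study's exact binning scheme is not stated in the abstract, and the function name and toy data are hypothetical.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean |mean predicted probability - observed rate| per bin.

    probs  : predicted probabilities in [0, 1]
    labels : matching binary outcomes (0 or 1)
    Equal-width binning is an assumption of this sketch.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - observed_rate)
    return ece


# Toy data: confident and correct, but each prediction is off by 0.1.
toy_probs = [0.1, 0.1, 0.9, 0.9]
toy_labels = [0, 0, 1, 1]
toy_ece = expected_calibration_error(toy_probs, toy_labels)
print(round(toy_ece, 3))
```

Lower is better: a model can discriminate well (high AUC) and still have a large ECE if its probabilities are systematically too high or too low, which is why the abstract reports both.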
|
15
|
Liu J, Ma X, Cooper GF, Lu X. Explicit representation of protein activity states significantly improves causal discovery of protein phosphorylation networks. BMC Bioinformatics 2020; 21:379. [PMID: 32938361 DOI: 10.1186/s12859-020-03676-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein phosphorylation networks play an important role in cell signaling. In these networks, phosphorylation of a protein kinase usually leads to its activation, which in turn will phosphorylate its downstream target proteins. A phosphorylation network is essentially a causal network, which can be learned by causal inference algorithms. Prior efforts have applied such algorithms to data measuring protein phosphorylation levels, assuming that the phosphorylation levels represent protein activity states. However, the phosphorylation status of a kinase does not always reflect its activity state, because interventions such as inhibitors or mutations can directly affect its activity state without changing its phosphorylation status. Thus, when cellular systems are subjected to extensive perturbations, the statistical relationships between phosphorylation states of proteins may be disrupted, making it difficult to reconstruct the true protein phosphorylation network. Here, we describe a novel framework to address this challenge. RESULTS We have developed a causal discovery framework that explicitly represents the activity state of each protein kinase as an unmeasured variable and developed a novel algorithm called "InferA" to infer the protein activity states, which allows us to incorporate the protein phosphorylation level, pharmacological interventions and prior knowledge. We applied our framework to simulated datasets and to a real-world dataset. The simulation experiments demonstrated that explicit representation of activity states of protein kinases allows one to effectively represent the impact of interventions and thus enabled our framework to accurately recover the ground-truth causal network. 
Results from the real-world dataset showed that the explicit representation of protein activity states allowed an effective and data-driven integration of the prior knowledge by InferA, which further leads to the recovery of a phosphorylation network that is more consistent with experiment results. CONCLUSIONS Explicit representation of the protein activity states by our novel framework significantly enhances causal discovery of protein phosphorylation networks.
|
16
|
Calzoni L, Clermont G, Cooper GF, Visweswaran S, Hochheiser H. Graphical Presentations of Clinical Data in a Learning Electronic Medical Record. Appl Clin Inform 2020; 11:680-691. [PMID: 33058103 PMCID: PMC7560537 DOI: 10.1055/s-0040-1709707] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]
Abstract
BACKGROUND Complex electronic medical records (EMRs) presenting large amounts of data create risks of cognitive overload. We are designing a Learning EMR (LEMR) system that utilizes models of intensive care unit (ICU) physicians' data access patterns to identify and then highlight the most relevant data for each patient. OBJECTIVES We used insights from literature and feedback from potential users to inform the design of an EMR display capable of highlighting relevant information. METHODS We used a review of relevant literature to guide the design of preliminary paper prototypes of the LEMR user interface. We observed five ICU physicians using their current EMR systems in preparation for morning rounds. Participants were interviewed and asked to explain their interactions and challenges with the EMR systems. Findings informed the revision of our prototypes. Finally, we conducted a focus group with five ICU physicians to elicit feedback on our designs and to generate ideas for our final prototypes using participatory design methods. RESULTS Participating physicians expressed support for the LEMR system. Identified design requirements included the display of data essential for every patient together with diagnosis-specific data and new or significantly changed information. Respondents expressed preferences for fishbones to organize labs, mouseovers to access additional details, and unobtrusive alerts minimizing color-coding. To address the concern about possible physician overreliance on highlighting, participants suggested that non-highlighted data should remain accessible. Study findings led to revised prototypes, which will inform the development of a functional user interface. CONCLUSION In the feedback we received, physicians supported pursuing the concept of a LEMR system. By introducing novel ways to support physicians' cognitive abilities, such a system has the potential to enhance physician EMR use and lead to better patient outcomes. Future plans include laboratory studies of both the utility of the proposed designs on decision-making, and the possible impact of any automation bias.
|
17
|
King AJ, Cooper GF, Clermont G, Hochheiser H, Hauskrecht M, Sittig DF, Visweswaran S. Leveraging Eye Tracking to Prioritize Relevant Medical Record Data: Comparative Machine Learning Study. J Med Internet Res 2020; 22:e15876. [PMID: 32238342 PMCID: PMC7163414 DOI: 10.2196/15876] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/14/2019] [Revised: 12/04/2019] [Accepted: 01/23/2020] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Electronic medical record (EMR) systems capture large amounts of data per patient and present that data to physicians with little prioritization. Without prioritization, physicians must mentally identify and collate relevant data, an activity that can lead to cognitive overload. To mitigate cognitive overload, a Learning EMR (LEMR) system prioritizes the display of relevant medical record data. Relevant data are those that are pertinent to a context, defined as the combination of the user, clinical task, and patient case. To determine which data are relevant in a specific context, a LEMR system uses supervised machine learning models of physician information-seeking behavior. Since obtaining information-seeking behavior data via manual annotation is slow and expensive, automatic methods for capturing such data are needed. OBJECTIVE The goal of the research was to propose and evaluate eye tracking as a high-throughput method to automatically acquire physician information-seeking behavior useful for training models for a LEMR system. METHODS Critical care medicine physicians reviewed intensive care unit patient cases in an EMR interface developed for the study. Participants manually identified patient data that were relevant in the context of a clinical task: preparing a patient summary to present at morning rounds. We used eye tracking to capture each physician's gaze dwell time on each data item (eg, blood glucose measurements). Manual annotations and gaze dwell times were used to define target variables for developing supervised machine learning models of physician information-seeking behavior. We compared the performance of manual selection and gaze-derived models on an independent set of patient cases. RESULTS A total of 68 pairs of manual selection and gaze-derived machine learning models were developed from training data and evaluated on an independent evaluation data set. A paired Wilcoxon signed-rank test showed similar performance of manual selection and gaze-derived models on area under the receiver operating characteristic curve (P=.40). CONCLUSIONS We used eye tracking to automatically capture physician information-seeking behavior and used it to train models for a LEMR system. The models that were trained using eye tracking performed like models that were trained using manual annotations. These results support further development of eye tracking as a high-throughput method for training clinical decision support systems that prioritize the display of relevant medical record data.
|
18
|
Aronis JM, Ferraro JP, Gesteland PH, Tsui F, Ye Y, Wagner MM, Cooper GF. A Bayesian approach for detecting a disease that is not being modeled. PLoS One 2020; 15:e0229658. [PMID: 32109254 PMCID: PMC7048291 DOI: 10.1371/journal.pone.0229658] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/04/2019] [Accepted: 02/12/2020] [Indexed: 11/19/2022] Open
Abstract
Over the past decade, outbreaks of new or reemergent viruses such as severe acute respiratory syndrome (SARS) virus, Middle East respiratory syndrome (MERS) virus, and Zika have claimed thousands of lives and cost governments and healthcare systems billions of dollars. Because the appearance of new or transformed diseases is likely to continue, the detection and characterization of emergent diseases is an important problem. We describe a Bayesian statistical model that can detect and characterize previously unknown and unmodeled diseases from patient-care reports and evaluate its performance on historical data.
|
19
|
Jabbari F, Villaruz LC, Davis M, Cooper GF. Lung Cancer Survival Prediction Using Instance-Specific Bayesian Networks. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
20
|
Tsui F, Ye Y, Ruiz V, Cooper GF, Wagner MM. Automated influenza case detection for public health surveillance and clinical diagnosis using dynamic influenza prevalence method. J Public Health (Oxf) 2019; 40:878-885. [PMID: 29059331 DOI: 10.1093/pubmed/fdx141] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2017] [Indexed: 11/13/2022] Open
Abstract
Objectives To assess the performance of a Bayesian case detector (BCD) for influenza surveillance and clinical diagnosis. Methods BCD uses a Bayesian network classifier to compute the posterior probability of a patient having influenza based on 31 findings from narrative clinical notes. To assess the potential for disease surveillance, we calculated area under the receiver operating characteristic curve (AUC) to indicate BCD's ability to differentiate between influenza and non-influenza encounters in emergency department settings. To assess the potential for clinical diagnosis, we measured AUC for diagnosing influenza cases among encounters having influenza-like illnesses. We also evaluated the performance of BCD using dynamically estimated influenza prevalence, and measured sensitivity, specificity and positive predictive value. Results For influenza surveillance, BCD differentiated between influenza and non-influenza encounters well, with an AUC of 0.90, and 0.97 with dynamic influenza prevalence (P < 0.0001). For clinical diagnosis, the addition of dynamic influenza prevalence to BCD significantly improved AUC from 0.63 to 0.85 to distinguish influenza from other causes of influenza-like illness. Conclusions and policy implications BCD can serve as an influenza surveillance and a differential diagnosis tool via our dynamic prevalence approach. It enhances the communication between public health and clinical practice.
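Editorial illustration (not from the paper): the role a prevalence estimate plays in a posterior probability can be sketched with Bayes' rule in odds form. The function name, the likelihood ratio of 8, and the two prevalence values below are hypothetical and chosen only for illustration; BCD itself uses a Bayesian network over 31 findings, which this sketch does not reproduce.

```python
def posterior_from_prevalence(likelihood_ratio: float, prevalence: float) -> float:
    """Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds.

    likelihood_ratio: P(findings | influenza) / P(findings | no influenza).
    prevalence: prior probability of influenza, strictly between 0 and 1.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: the same clinical evidence under two assumed prevalences.
evidence_lr = 8.0
off_season = posterior_from_prevalence(evidence_lr, 0.01)   # about 0.075
peak_season = posterior_from_prevalence(evidence_lr, 0.20)  # about 0.667

print(f"off-season posterior:  {off_season:.3f}")
print(f"peak-season posterior: {peak_season:.3f}")
```

With identical evidence, the posterior moves substantially as the assumed prevalence changes, which is the general reason a dynamically updated prior can matter for case detection.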
|
21
|
King AJ, Cooper GF, Clermont G, Hochheiser H, Hauskrecht M, Sittig DF, Visweswaran S. Using machine learning to selectively highlight patient information. J Biomed Inform 2019; 100:103327. [PMID: 31676461 DOI: 10.1016/j.jbi.2019.103327] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/23/2019] [Revised: 08/20/2019] [Accepted: 10/28/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND Electronic medical record (EMR) systems need functionality that decreases cognitive overload by drawing the clinician's attention to the right data, at the right time. We developed a Learning EMR (LEMR) system that learns statistical models of clinician information-seeking behavior and applies those models to direct the display of data in future patients. We evaluated the performance of the system in identifying relevant patient data in intensive care unit (ICU) patient cases. METHODS To capture information-seeking behavior, we enlisted critical care medicine physicians who reviewed a set of patient cases and selected data items relevant to the task of presenting at morning rounds. Using patient EMR data as predictors, we built machine learning models to predict their relevancy. We prospectively evaluated the predictions of a set of high performing models. RESULTS On an independent evaluation data set, 25 models achieved precision of 0.52, 95% CI [0.49, 0.54] and recall of 0.77, 95% CI [0.75, 0.80] in identifying relevant patient data items. For data items missed by the system, the reviewers rated the effect of not seeing those data from no impact to minor impact on patient care in about 82% of the cases. CONCLUSION Data-driven approaches for adaptively displaying data in EMR systems, like the LEMR system, show promise in using information-seeking behavior of clinicians to identify and highlight relevant patient data.
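Editorial illustration (not from the paper): precision and recall for a set of highlighted data items can be computed as below. The function name and the item names are hypothetical; the paper's reported values (0.52 and 0.77) come from its own evaluation data and are not reproduced here.

```python
from typing import Set, Tuple


def precision_recall(highlighted: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one patient case.

    highlighted: data items the model chose to highlight.
    relevant: data items a reviewer marked as relevant.
    An empty denominator yields 0.0 rather than raising.
    """
    true_positives = len(highlighted & relevant)
    precision = true_positives / len(highlighted) if highlighted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: four items highlighted, three marked relevant, two in common.
highlighted_items = {"glucose", "heparin", "sodium", "lactate"}
relevant_items = {"glucose", "lactate", "creatinine"}

precision, recall = precision_recall(highlighted_items, relevant_items)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is 2/4 and recall is 2/3: a system tuned to favor recall over precision highlights some irrelevant items in exchange for missing fewer relevant ones.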
|
22
|
Andrews B, Ramsey J, Cooper GF. Learning High-dimensional Directed Acyclic Graphs with Mixed Data-types. Proceedings of Machine Learning Research 2019; 104:4-21. [PMID: 31453569 PMCID: PMC6709674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
In recent years, great strides have been made for causal structure learning in the high-dimensional setting and in the mixed data-type setting when there are both discrete and continuous variables. However, due to the complications involved with modeling continuous-discrete variable interactions, the intersection of these two settings has been relatively understudied. The current paper explores the problem of efficiently extending causal structure learning algorithms to high-dimensional data with mixed data-types. First, we characterize a model over continuous and discrete variables. Second, we derive a degenerate Gaussian (DG) score for mixed data-types and discuss its asymptotic properties. Lastly, we demonstrate the practicality of the DG score on learning causal structures from simulated data sets.
|
23
|
Cai C, Cooper GF, Lu KN, Ma X, Xu S, Zhao Z, Chen X, Xue Y, Lee AV, Clark N, Chen V, Lu S, Chen L, Yu L, Hochheiser HS, Jiang X, Wang QJ, Lu X. Systematic discovery of the functional impact of somatic genome alterations in individual tumors through tumor-specific causal inference. PLoS Comput Biol 2019; 15:e1007088. [PMID: 31276486 PMCID: PMC6650088 DOI: 10.1371/journal.pcbi.1007088] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/25/2019] [Revised: 07/23/2019] [Accepted: 05/09/2019] [Indexed: 02/07/2023] Open
Abstract
Cancer is mainly caused by somatic genome alterations (SGAs). Precision oncology involves identifying and targeting tumor-specific aberrations resulting from causative SGAs. We developed a novel tumor-specific computational framework that finds the likely causative SGAs in an individual tumor and estimates their impact on oncogenic processes, which suggests the disease mechanisms that are acting in that tumor. This information can be used to guide precision oncology. We report a tumor-specific causal inference (TCI) framework, which estimates causative SGAs by modeling causal relationships between SGAs and molecular phenotypes (e.g., transcriptomic, proteomic, or metabolomic changes) within an individual tumor. We applied the TCI algorithm to tumors from The Cancer Genome Atlas (TCGA) and estimated for each tumor the SGAs that causally regulate the differentially expressed genes (DEGs) in that tumor. Overall, TCI identified 634 SGAs that are predicted to cause cancer-related DEGs in a significant number of tumors, including most of the previously known drivers and many novel candidate cancer drivers. The inferred causal relationships are statistically robust and biologically sensible, and multiple lines of experimental evidence support the predicted functional impact of both the well-known and the novel candidate drivers that are predicted by TCI. TCI provides a unified framework that integrates multiple types of SGAs and molecular phenotypes to estimate which genome perturbations are causally influencing one or more molecular/cellular phenotypes in an individual tumor. By identifying major candidate drivers and revealing their functional impact in an individual tumor, TCI sheds light on the disease mechanisms of that tumor, which can serve to advance our basic knowledge of cancer biology and to support precision oncology that provides tailored treatment of individual tumors.
|
24
|
King AJ, Cooper GF, Hochheiser H, Clermont G, Hauskrecht M, Visweswaran S. Using Machine Learning to Predict the Information Seeking Behavior of Clinicians Using an Electronic Medical Record System. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018; 2018:673-682. [PMID: 30815109 PMCID: PMC6371238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Poor electronic medical record (EMR) usability is detrimental to both clinicians and patients. A better EMR would provide concise, context sensitive patient data, but doing so entails the difficult task of knowing which data are relevant. To determine the relevance of patient data in different contexts, we collect and model the information seeking behavior of clinicians using a learning EMR (LEMR) system. Sufficient data were collected to train predictive models for 80 different targets (e.g., glucose level, heparin administration) and 27 of them had AUROC values of greater than 0.7. These results are encouraging considering the high variation in information seeking behavior (intraclass correlation 0.40). We plan to apply these models to a new set of patient cases and adapt the LEMR interface to highlight relevant patient data, and thus provide concise, context sensitive data.
|
25
|
Jabbari F, Visweswaran S, Cooper GF. Instance-Specific Bayesian Network Structure Learning. Proceedings of Machine Learning Research 2018; 72:169-180. [PMID: 30775723 PMCID: PMC6376975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Bayesian network (BN) structure learning algorithms are almost always designed to recover the structure that models the relationships that are shared by the instances in a population. While accurately learning such population-wide Bayesian networks is useful, learning Bayesian networks that are specific to each instance is often important as well. For example, to understand and treat a patient (instance), it is critical to understand the specific causal mechanisms that are operating in that particular patient. We introduce an instance-specific BN structure learning method that searches the space of Bayesian networks to build a model that is specific to an instance by guiding the search based on attributes of the given instance (e.g., patient symptoms, signs, lab results, and genotype). The structure discovery performance of the proposed method is compared to an existing state-of-the-art BN structure learning method, namely an implementation of the Greedy Equivalence Search algorithm called FGES, using both simulated and real data. The results show that the proposed method improves the precision of the model structure that is output, when compared to GES, especially for those variables that exhibit context-specific independence.
|