551
Abstract
The development of high-throughput, data-intensive biomedical research assays and technologies has created a need for researchers to develop strategies for analyzing, integrating, and interpreting the massive amounts of data they generate. Although a wide variety of statistical methods have been designed to accommodate 'big data,' experiences with the use of artificial intelligence (AI) techniques suggest that they might be particularly appropriate. In addition, the results of the application of these assays reveal a great heterogeneity in the pathophysiologic factors and processes that contribute to disease, suggesting that there is a need to tailor, or 'personalize,' medicines to the nuanced and often unique features possessed by individual patients. Given how important data-intensive assays are to revealing appropriate intervention targets and strategies for treating an individual with a disease, AI can play an important role in the development of personalized medicines. We describe many areas where AI can play such a role and argue that AI's ability to advance personalized medicine will depend critically not only on the refinement of relevant assays, but also on ways of storing, aggregating, accessing, and ultimately integrating the data they produce. We also point out the limitations of many AI techniques in developing personalized medicines and consider areas for further research.
Affiliation(s)
- Nicholas J Schork
- Department of Quantitative Medicine, The Translational Genomics Research Institute (TGen), Phoenix, AZ, USA.
- The City of Hope/TGen IMPACT Center, Duarte, CA, USA.
- The University of California San Diego, La Jolla, CA, USA.
552
Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:139-153. [PMID: 29994486 PMCID: PMC6388621 DOI: 10.1109/tcbb.2018.2849968] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Indexed: 06/08/2023]
Abstract
This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI), and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives.
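As a rough illustration of the keyword and rule-based phenotyping summarized above, the Python sketch below flags patients whose notes mention a phenotype keyword without a nearby negation cue. The keyword list, negation cues, window size and the function name `has_phenotype` are all hypothetical; real systems rely on curated vocabularies and far richer rules, which is exactly the manual effort the review describes as hard to scale.

```python
import re

# Hypothetical keyword list for a single phenotype (type 2 diabetes).
PHENOTYPE_KEYWORDS = {"type 2 diabetes", "t2dm", "diabetes mellitus type 2"}
# Hypothetical negation cues, checked in a short window before each keyword.
NEGATION_CUES = ("no ", "denies ", "negative for ", "without ")

def has_phenotype(note: str, window: int = 30) -> bool:
    """Return True if any keyword occurs in the note without a preceding negation cue."""
    text = note.lower()
    for keyword in PHENOTYPE_KEYWORDS:
        for match in re.finditer(re.escape(keyword), text):
            preceding = text[max(0, match.start() - window):match.start()]
            if not any(cue in preceding for cue in NEGATION_CUES):
                return True
    return False

# Toy clinical notes keyed by hypothetical patient IDs.
notes = {
    "p1": "History of type 2 diabetes, on metformin.",
    "p2": "Patient denies type 2 diabetes or hypertension.",
    "p3": "Chest pain, no other complaints.",
}
cohort = [pid for pid, note in notes.items() if has_phenotype(note)]
print(cohort)  # ['p1']
```

Each added rule or keyword improves coverage only for the cases its author anticipated, which is why such lists grow expensive to maintain across sites and phenotypes.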
Affiliation(s)
- Zexian Zeng
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
- Yu Deng
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
- Xiaoyu Li
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
- Tristan Naumann
- Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139.
- Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
553
Rafiq M, Keel G, Mazzocato P, Spaak J, Savage C, Guttmann C. Deep Learning Architectures for Vector Representations of Patients and Exploring Predictors of 30-Day Hospital Readmissions in Patients with Multiple Chronic Conditions. LECTURE NOTES IN COMPUTER SCIENCE 2019. [DOI: 10.1007/978-3-030-12738-1_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/31/2023]
554
Prosperi M, Min JS, Bian J, Modave F. Big data hurdles in precision medicine and precision public health. BMC Med Inform Decis Mak 2018; 18:139. [PMID: 30594159 PMCID: PMC6311005 DOI: 10.1186/s12911-018-0719-2] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 05/15/2018] [Accepted: 12/04/2018] [Indexed: 12/18/2022]
Abstract
BACKGROUND Nowadays, trendy research in the biomedical sciences pairs the term 'precision' with medicine and public health, alongside companion words like big data, data science, and deep learning. Technological advancements permit the collection and merging of large heterogeneous datasets from different sources, from genome sequences to social media posts or from electronic health records to wearables. Additionally, complex algorithms supported by high-performance computing allow one to transform these large datasets into knowledge. Despite such progress, many barriers still stand in the way of achieving precision medicine and precision public health interventions for the benefit of the individual and the population. MAIN BODY The present work focuses on analyzing both the technical and societal hurdles related to the development of prediction models of health risks, diagnoses and outcomes from integrated biomedical databases. Methodological challenges that need to be addressed include improving the semantics of study designs: medical record data are inherently biased, and even the most advanced deep learning methods, such as denoising autoencoders, cannot overcome that bias if it is not handled a priori by design. Societal challenges include the evaluation of ethically actionable risk factors at the individual and population levels; for instance, gender, race, or ethnicity, when used as risk modifiers rather than as biological variables, could be replaced by modifiable environmental proxies such as lifestyle and dietary habits, household income, or access to educational resources. CONCLUSIONS Data science for precision medicine and public health warrants an informatics-oriented formalization of the study design and interoperability throughout all levels of the knowledge inference process, from the research semantics, to model development, and ultimately to implementation.
Affiliation(s)
- Mattia Prosperi
- Department of Epidemiology, College of Medicine & College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA.
- Jae S Min
- Department of Epidemiology, College of Medicine & College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, 32610, USA
- François Modave
- Center for Health Outcomes and Informatics Research, Loyola University Chicago, Maywood, IL, 60153, USA
555
Abstract
Asthma is among the most common chronic diseases worldwide and is a significant contributor to the global health burden, highlighting the urgent need for primary prevention. This article outlines several practical and conceptual challenges that accompany primary prevention efforts. It advocates for improved predictive modeling to identify those at high risk of developing asthma using automated algorithms within electronic medical records systems, and for explanatory modeling to refine understanding of causal pathways. Understanding the many issues that are likely to affect the success of primary prevention efforts helps the community of individuals invested in asthma prevention organize efforts and maximize their impact.
556
Kennedy G, Gallego B. Clinical prediction rules: A systematic review of healthcare provider opinions and preferences. Int J Med Inform 2018; 123:1-10. [PMID: 30654898 DOI: 10.1016/j.ijmedinf.2018.12.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/12/2017] [Revised: 10/29/2018] [Accepted: 12/11/2018] [Indexed: 12/23/2022]
Abstract
OBJECTIVE The act of predicting clinical endpoints and patient trajectories based on past and current states is on the precipice of a technological revolution. This systematic review summarises the available evidence describing healthcare provider opinions and preferences with respect to the use of clinical prediction rules. The primary goal of this work is to inform the design and implementation of future systems, and secondarily to identify gaps for the development of clinician education programs. METHODS Five databases were systematically searched in May 2016 for studies collecting empirical opinions of healthcare providers regarding clinical prediction rule usage. Reference lists were scanned for additional eligible materials and an update search was made in August 2017. Data were extracted on high-level study features before an in-depth thematic analysis was performed. RESULTS Forty-five articles were identified from 9 countries. Most studies utilised surveys (28) or interviews (14). Fewer employed focus groups (9) or formal usability testing (4). Three high-level themes were identified, which form the basis of healthcare provider opinions of clinical prediction rules and their implementation: utility, credibility and usability. CONCLUSIONS Some of the objections and preferences stated by healthcare providers are inherent to the nature of the clinical problem addressed, which may or may not be within the designer's capacity to change; however, others (in particular, actionability, validation, integration and provision of high quality education materials) should be considered by prediction rule designers and implementation teams, in order to increase user acceptance and improve uptake of these tools. We summarise these findings across the clinical prediction rule lifecycle and pose questions for the rule developers, in order to produce tools that are more likely to successfully translate into clinical practice.
Affiliation(s)
- Georgina Kennedy
- Australian Institute of Health Innovation, Macquarie University, 75 Talavera Road, Sydney 2113, Australia.
- Blanca Gallego
- Australian Institute of Health Innovation, Macquarie University, 75 Talavera Road, Sydney 2113, Australia
557
Roca J, Tenyi A, Cano I. Paradigm changes for diagnosis: using big data for prediction. ACTA ACUST UNITED AC 2018; 57:317-327. [DOI: 10.1515/cclm-2018-0971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/03/2018] [Accepted: 11/21/2018] [Indexed: 11/15/2022]
Abstract
Due to profound changes occurring in biomedical knowledge and in health systems worldwide, an entirely new health and social care scenario is emerging. Moreover, the enormous technological potential developed in recent years is increasingly influencing the life sciences and driving changes toward personalized medicine and value-based healthcare. However, the current slow pace of adoption, which limits the healthcare efficiencies that technological innovation could generate, can realistically be overcome by fostering convergence between a systems medicine approach and the principles governing Integrated Care. Implicit in this strategy is the active multidisciplinary collaboration of all stakeholders involved in the change, namely: citizens, professionals with different profiles, academia, policy makers, industry and payers. The article describes the key building blocks of an open and collaborative hub currently being developed in Catalonia (Spain) aimed at the generation, deployment and evaluation of a personalized medicine program addressing highly prevalent chronic conditions that often co-occur, namely: cardiovascular disorders, chronic obstructive pulmonary disease, type 2 diabetes mellitus, metabolic syndrome and associated mental disturbances (anxiety-depression and altered behavioral patterns leading to unhealthy lifestyles).
Affiliation(s)
- Josep Roca
- Hospital Clínic, IDIBAPS, Facultat de Medicina, Universitat de Barcelona, Barcelona, Catalunya, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Av. Monforte de Lemos, 3-5. Pabellón 11. Planta 0, 28029, Madrid, Spain, Phone: +34-932275747, Fax: +34-932275455
- Akos Tenyi
- Hospital Clínic, IDIBAPS, Facultat de Medicina, Universitat de Barcelona, Barcelona, Catalunya, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Isaac Cano
- Hospital Clínic, IDIBAPS, Facultat de Medicina, Universitat de Barcelona, Barcelona, Catalunya, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
558
LLTO: Towards efficient lesion localization based on template occlusion strategy in intelligent diagnosis. Pattern Recognit Lett 2018. [DOI: 10.1016/j.patrec.2018.10.029] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
559
Mao C, Pan Y, Zeng Z, Yao L, Luo Y. Deep Generative Classifiers for Thoracic Disease Diagnosis with Chest X-ray Images. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2018; 2018:1209-1214. [PMID: 31341701 PMCID: PMC6651749 DOI: 10.1109/bibm.2018.8621107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Thoracic diseases are very serious health problems that plague a large number of people. Chest X-ray is currently one of the most popular methods to diagnose thoracic diseases, playing an important role in the healthcare workflow. However, reading chest X-ray images and giving an accurate diagnosis remain challenging tasks even for expert radiologists. With the success of deep learning in computer vision, a growing number of deep neural network architectures have been applied to chest X-ray image classification. However, most previous deep neural network classifiers were based on deterministic architectures, which are usually very sensitive to noise and likely to aggravate overfitting. In this paper, to make a deep architecture more robust to noise and to reduce overfitting, we propose using deep generative classifiers to automatically diagnose thoracic diseases from chest X-ray images. Unlike a traditional deterministic classifier, a deep generative classifier has a distribution layer in the middle of the deep neural network. A sampling layer then draws a random sample from the distribution layer and inputs it to the following layer for classification. The classifier is generative because the class label is generated from samples of a related distribution. By training the model with a certain amount of randomness, the deep generative classifiers are expected to be robust to noise, to reduce overfitting, and thereby to achieve good performance. We implemented our deep generative classifiers based on a number of well-known deterministic neural network architectures and tested our models on the ChestX-ray14 dataset. The results demonstrated the superiority of deep generative classifiers compared with the corresponding deep deterministic classifiers.
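The distribution layer followed by a sampling layer described above can be sketched in NumPy as follows. This is an assumption-laden toy, not the authors' implementation: the Gaussian distribution layer, the layer sizes, the random weights and the helper `generative_head` are hypothetical, and a real model would place such a head on top of a trained convolutional backbone. Because a ChestX-ray14 image can carry several disease labels at once, the sketch uses an independent sigmoid per label.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def generative_head(features, w_mu, w_logvar, w_cls, rng):
    """Distribution layer -> sampling layer -> per-label classifier (hypothetical head)."""
    mu = features @ w_mu                 # mean of the Gaussian distribution layer
    logvar = features @ w_logvar         # log-variance of the distribution layer
    eps = rng.standard_normal(mu.shape)  # fresh noise on every forward pass
    z = mu + np.exp(0.5 * logvar) * eps  # random sample passed to the next layer
    return sigmoid(z @ w_cls)            # one probability per disease label

# Hypothetical sizes: 4 images, 16-d backbone features, 8-d distribution layer, 14 labels.
init = np.random.default_rng(0)
features = init.standard_normal((4, 16))
w_mu = init.standard_normal((16, 8)) * 0.1
w_logvar = init.standard_normal((16, 8)) * 0.1
w_cls = init.standard_normal((8, 14)) * 0.1

probs_a = generative_head(features, w_mu, w_logvar, w_cls, np.random.default_rng(1))
probs_b = generative_head(features, w_mu, w_logvar, w_cls, np.random.default_rng(2))
print(probs_a.shape)  # (4, 14)
```

Because each forward pass draws new noise, two calls on the same features give different outputs. At inference time one option (not necessarily the one used in the paper) is to average the predictions over several samples.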
Affiliation(s)
- Chengsheng Mao
- Dept. of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Yiheng Pan
- Dept. of EECS, Northwestern University, Chicago, IL, USA
- Zexian Zeng
- Dept. of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Liang Yao
- Dept. of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Yuan Luo
- Dept. of Preventive Medicine, Northwestern University, Chicago, IL, USA
560
Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, Osborn D, Hayes J, Stewart R, Downs J, Chapman W, Dutta R. Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. J Biomed Inform 2018; 88:11-19. [PMID: 30368002 PMCID: PMC6986921 DOI: 10.1016/j.jbi.2018.10.005] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Received: 05/25/2018] [Revised: 10/14/2018] [Accepted: 10/15/2018] [Indexed: 12/27/2022]
Abstract
The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized over the past years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient- or population-level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose more emphasis needs to be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.
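To make the mismatch between NLP and clinical evaluation levels concrete, the sketch below scores the same hypothetical predictions at the document level and at the patient level. The toy annotations, the helper `f1`, and the rule that a patient counts as positive when any of their notes is positive are assumptions for illustration only, not the minimal protocol proposed in the paper.

```python
def f1(gold: set, pred: set) -> float:
    """F1 score between a gold-standard set and a predicted set of positives."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: (patient_id, document_id) pairs judged positive.
gold_docs = {("p1", "d1"), ("p1", "d2"), ("p2", "d3")}
pred_docs = {("p1", "d1"), ("p2", "d4"), ("p3", "d5")}

# Document-level (intrinsic) evaluation: each note is scored separately.
doc_f1 = f1(gold_docs, pred_docs)

# Patient-level evaluation: a patient is positive if any of their notes is positive.
gold_patients = {pid for pid, _ in gold_docs}
pred_patients = {pid for pid, _ in pred_docs}
patient_f1 = f1(gold_patients, pred_patients)

print(round(doc_f1, 2), round(patient_f1, 2))  # 0.33 0.8
```

The same predictions score about 0.33 at the document level but 0.8 at the patient level, one simple way in which intrinsic and extrinsic evaluations can disagree.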
Affiliation(s)
- Sumithra Velupillai
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden.
- Hanna Suominen
- College of Engineering and Computer Science, The Australian National University, Data61/CSIRO, University of Canberra, Australia; University of Turku, Finland.
- Maria Liakata
- Department of Computer Science, University of Warwick/Alan Turing Institute, UK.
- Angus Roberts
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK.
- Anoop D Shah
- Institute of Health Informatics, University College London, UK; University College London NHS Foundation Trust, London, UK.
- Katherine Morley
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; Melbourne School of Population and Global Health, The University of Melbourne, Australia.
- David Osborn
- Division of Psychiatry, University College London, UK; Camden and Islington NHS Foundation Trust, London, UK.
- Joseph Hayes
- Division of Psychiatry, University College London, UK; Camden and Islington NHS Foundation Trust, London, UK.
- Robert Stewart
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
- Johnny Downs
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
- Wendy Chapman
- Department of Biomedical Informatics, University of Utah, United States.
- Rina Dutta
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
561
Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 2018; 19:1236-1246. [PMID: 28481991 PMCID: PMC6455466 DOI: 10.1093/bib/bbx044] [Citation(s) in RCA: 825] [Impact Index Per Article: 117.9] [Received: 12/05/2016] [Revised: 02/19/2017] [Indexed: 02/07/2023]
Abstract
Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerging in modern biomedical research, including electronic health records, imaging, -omics, sensor data and text, which are complex, heterogeneous, poorly annotated and generally unstructured. Traditional data mining and statistical learning approaches typically need to first perform feature engineering to obtain effective and more robust features from those data, and then build prediction or clustering models on top of them. Both steps face many challenges when the data are complex and sufficient domain knowledge is lacking. The latest advances in deep learning technologies provide new effective paradigms to obtain end-to-end learning models from complex data. In this article, we review the recent literature on applying deep learning technologies to advance the health care domain. Based on the analyzed work, we suggest that deep learning approaches could be the vehicle for translating big biomedical data into improved human health. However, we also note limitations and needs for improved methods development and applications, especially in terms of ease of understanding for domain experts and citizen scientists. We discuss such challenges and suggest developing holistic and meaningful interpretable architectures to bridge deep learning models and human interpretability.
Affiliation(s)
- Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY
- Fei Wang
- Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY
- Shuang Wang
- Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA
- Xiaoqian Jiang
- Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA
- Joel T Dudley
- Institute for Next Generation Healthcare; Associate Professor, Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY
562
Chin CY, Hsieh SY, Tseng VS. eDRAM: Effective early disease risk assessment with matrix factorization on a large-scale medical database: A case study on rheumatoid arthritis. PLoS One 2018; 13:e0207579. [PMID: 30475847 PMCID: PMC6261027 DOI: 10.1371/journal.pone.0207579] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 01/08/2018] [Accepted: 11/02/2018] [Indexed: 11/18/2022]
Abstract
Recently, a number of analytical approaches for probing medical databases have been developed to assist in disease risk assessment and to determine the association of a clinical condition with others, so that better, more intelligent healthcare can be provided. The early assessment of disease risk is an emerging topic in medical informatics. If diseases are detected at an early stage, prognosis can be improved and medical resources can be used more efficiently. For example, if rheumatoid arthritis (RA) is detected at an early stage, appropriate medications can be used to prevent bone deterioration. In early disease risk assessment, finding important risk factors in large-scale medical databases and performing individual disease risk assessment have been challenging tasks. A number of recent studies have considered risk factor analysis approaches, such as association rule mining, sequential rule mining, regression, and expert advice. In this study, to improve disease risk assessment, machine learning and matrix factorization techniques were integrated to discover important and implicit risk factors. A novel framework is proposed that can effectively assess early disease risks, and RA is used as a case study. This framework comprises three main stages: data preprocessing, risk factor optimization, and early disease risk assessment. This is the first study integrating matrix factorization and machine learning for disease risk assessment that is applied to a nationwide, longitudinal medical diagnostic database. In the experimental evaluations, a cohort established from a large-scale medical database was used that included 1007 RA-diagnosed patients and 921,192 control patients examined over a nine-year follow-up period (2000-2008). The evaluation results demonstrate that the proposed approach is more efficient and stable for disease risk assessment than state-of-the-art methods.
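The matrix factorization idea can be illustrated on a small patient-by-diagnosis-code matrix that is factored into latent patient and code factors by plain gradient descent. This is a generic sketch, not the eDRAM framework: the toy matrix, rank, learning rate, regularisation strength and the `factorize` helper are hypothetical, zeros are treated as observed absences, and the paper's preprocessing, risk-factor optimization and machine learning stages are not shown.

```python
import numpy as np

def factorize(R, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Approximate R (patients x codes) as P @ Q.T by gradient descent on
    0.5 * ||R - P Q^T||^2 + 0.5 * reg * (||P||^2 + ||Q||^2)."""
    rng = np.random.default_rng(seed)
    n_patients, n_codes = R.shape
    P = rng.normal(scale=0.1, size=(n_patients, rank))  # latent patient factors
    Q = rng.normal(scale=0.1, size=(n_codes, rank))     # latent code factors
    for _ in range(epochs):
        E = R - P @ Q.T               # residual on every entry (zeros count as observed)
        grad_P = -E @ Q + reg * P     # gradient of the loss with respect to P
        grad_Q = -E.T @ P + reg * Q   # gradient of the loss with respect to Q
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q

# Hypothetical binary matrix: rows are patients, columns are diagnosis codes.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],   # resembles patients 0-1 but lacks code 4
], dtype=float)

P, Q = factorize(R)
R_hat = P @ Q.T
print(np.round(R_hat, 2))
```

In the reconstruction, patient 4 receives a clearly positive score for code 4 even though that entry is zero in `R`, because the low-rank factors tie the patient to similar patients; scores of this kind are one way latent, implicit risk signals can surface.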
Affiliation(s)
- Chu-Yu Chin
- Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Sun-Yuan Hsieh
- Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Vincent S. Tseng
- Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan
563
Feature Ranking in Predictive Models for Hospital-Acquired Acute Kidney Injury. Sci Rep 2018; 8:17298. [PMID: 30470779 PMCID: PMC6251919 DOI: 10.1038/s41598-018-35487-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/05/2018] [Accepted: 11/02/2018] [Indexed: 12/22/2022]
Abstract
Acute Kidney Injury (AKI) is a common complication encountered among hospitalized patients, imposing significantly increased cost, morbidity, and mortality. Early prediction of AKI has profound clinical implications because currently no treatment exists for AKI once it develops. Feature selection (FS) is an essential process for building accurate and interpretable prediction models, but to the best of our knowledge no study has investigated the robustness and applicability of such a selection process for AKI. In this study, we compared eight widely applied FS methods for AKI prediction using nine years of electronic medical records (EMR) and examined heterogeneity in the feature rankings produced by the methods. FS methods were compared in terms of stability with respect to data sampling variation, similarity between selection results, and AKI prediction performance. High prediction accuracy did not in itself guarantee stable feature rankings. Across different FS methods, the prediction performance did not change significantly, while the importance rankings of features were quite different. A positive correlation was observed between the complexity of the suitable FS method and sample size. This study provides several practical implications, including recognizing the importance of feature stability as it is desirable for model reproducibility, identifying important AKI risk factors for further investigation, and facilitating early prediction of AKI.
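Stability with respect to data sampling variation can be estimated by re-running a feature selection method on bootstrap resamples and measuring how much the top-ranked features overlap. The sketch below uses absolute Pearson correlation as a stand-in filter method and the mean pairwise Jaccard similarity of the top-k sets as the stability score; both choices, the synthetic data and the helper names `rank_features` and `topk_stability` are assumptions and need not match the eight FS methods or the stability measure used in the paper.

```python
from itertools import combinations

import numpy as np

def rank_features(X, y):
    """Return feature indices sorted by |Pearson correlation| with y, highest first."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(corr))

def topk_stability(rankings, k):
    """Mean pairwise Jaccard index of the top-k sets (1.0 means identical every time)."""
    tops = [set(r[:k].tolist()) for r in rankings]
    scores = [len(a & b) / len(a | b) for a, b in combinations(tops, 2)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.standard_normal((n, p))
# Hypothetical binary outcome driven by features 0, 1 and 2; the other 17 are noise.
latent = X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2] + rng.standard_normal(n)
y = (latent > 0).astype(float)

rankings = []
for _ in range(20):
    idx = rng.integers(0, n, size=n)   # bootstrap resample of patients
    rankings.append(rank_features(X[idx], y[idx]))

stability = topk_stability(rankings, k=3)
print(f"top-3 stability: {stability:.2f}")
```

Two FS methods can reach similar predictive accuracy while differing sharply on a score like this, which is the gap between accuracy and stability that the study highlights.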
564
Parbhoo S, Gottesman O, Ross AS, Komorowski M, Faisal A, Bon I, Roth V, Doshi-Velez F. Improving counterfactual reasoning with kernelised dynamic mixing models. PLoS One 2018; 13:e0205839. [PMID: 30419029 PMCID: PMC6231902 DOI: 10.1371/journal.pone.0205839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/25/2018] [Accepted: 09/10/2018] [Indexed: 11/18/2022]
Abstract
Simulation-based approaches to disease progression allow us to make counterfactual predictions about the effects of an untried series of treatment choices. However, building accurate simulators of disease progression is challenging, limiting the utility of these approaches for real-world treatment planning. In this work, we present a novel simulation-based reinforcement learning approach that mixes between models and kernel-based approaches to make its forward predictions. On two real-world tasks, managing sepsis and treating HIV, we demonstrate that our approach both learns state-of-the-art treatment policies and can make accurate forward predictions about the effects of treatments on unseen patients.
Affiliation(s)
- Sonali Parbhoo
- Department of Mathematics and Informatics, University of Basel, Basel, Switzerland
- Omer Gottesman
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Andrew Slavin Ross
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Aldo Faisal
- Department of Bioengineering, Imperial College, London, United Kingdom
- Isabella Bon
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy
- Volker Roth
- Department of Mathematics and Informatics, University of Basel, Basel, Switzerland
- Finale Doshi-Velez
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
565
Abstract
Inexpensive embedded computing and the related Internet of Things technologies have enabled the recent development of smart products that can respond to human needs and improve everyday tasks in an attempt to make traditional environments more “intelligent”. Several projects have augmented mirrors for a range of smarter applications in automobiles and homes. The opportunity to apply smart mirror technology to healthcare to predict and to monitor aspects of health and disease is a natural but mostly underdeveloped idea. We envision that smart mirrors comprising a combination of intelligent hardware and software could identify subtle, yet clinically relevant changes in physique and appearance. Similarly, a smart mirror could record and evaluate body position and motion to identify posture and movement issues, as well as offer feedback for corrective actions. Successful development and implementation of smart mirrors for healthcare applications will require overcoming new challenges in engineering, machine learning, computer vision, and biomedical research. This paper examines the potential uses of smart mirrors in healthcare and explores how this technology might benefit users in various medical environments. We also provide a brief description of the state of the art, including a functional prototype concept developed by our group, and highlight directions for making this device more mainstream in health-related applications.

566
Deep diagnostics and prognostics: An integrated hierarchical learning framework in PHM applications. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.01.036] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 11/23/2022]

567
Acharya UR, Raghavendra U, Koh JEW, Meiburger KM, Ciaccio EJ, Hagiwara Y, Molinari F, Leong WL, Vijayananthan A, Yaakup NA, Fabell MKBM, Yeong CH. Automated detection and classification of liver fibrosis stages using contourlet transform and nonlinear features. Comput Methods Programs Biomed 2018; 166:91-98. [PMID: 30415722 DOI: 10.1016/j.cmpb.2018.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/17/2018] [Revised: 08/24/2018] [Accepted: 10/01/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVE Liver fibrosis is a type of chronic liver injury characterized by excessive deposition of extracellular matrix proteins. Early detection of liver fibrosis may prevent progression to liver cirrhosis and hepatocellular carcinoma. In the past, the only method to assess liver fibrosis was biopsy, but this examination is invasive, expensive, prone to sampling errors, and may cause complications such as bleeding. Ultrasound-based elastography is a promising tool for measuring tissue elasticity in real time; however, this technology requires an upgrade of the ultrasound system and software. In this study, a novel computer-aided diagnosis tool is proposed to automatically detect and classify the various stages of liver fibrosis from conventional B-mode ultrasound images. METHODS The proposed method uses a 2D contourlet transform and a set of texture features that are efficiently extracted from the transformed image. A kernel discriminant analysis (KDA)-based feature reduction technique was then combined with an analysis of variance (ANOVA)-based feature ranking technique, and the images were classified into the various stages of liver fibrosis. RESULTS Our 2D contourlet transform and texture feature analysis approach achieved 91.46% accuracy in classifying the five stages of liver fibrosis, using only four features as input to a probabilistic neural network classifier. The same model achieved 92.16% sensitivity and 88.92% specificity. The evaluation was performed on a database of 762 ultrasound images belonging to five different stages of liver fibrosis. CONCLUSIONS The findings suggest that the proposed method can be useful for automatically detecting and classifying liver fibrosis, which would greatly assist clinicians in making an accurate diagnosis.
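The abstract names a probabilistic neural network (PNN) as the final classifier but gives no implementation details. As a minimal sketch, assuming the four selected features have already been extracted and scaled to comparable ranges, a PNN can be written as a Parzen-window classifier: each class score is the average Gaussian kernel between the query vector and that class's training vectors, and the highest-scoring class wins. The function name `pnn_predict`, the smoothing parameter `sigma`, and the toy feature vectors and stage labels are hypothetical; the contourlet transform, texture features, and KDA/ANOVA steps are not reproduced.

```python
import math
from collections import defaultdict


def pnn_predict(train_x, train_y, x, sigma=1.0):
    """Classify feature vector x with a probabilistic neural network.

    Pattern layer: one Gaussian kernel per training vector.
    Summation layer: average kernel value per class.
    Output layer: the class with the largest average.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        totals[yi] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[yi] += 1
    scores = {label: totals[label] / counts[label] for label in totals}
    return max(scores, key=scores.get)


# Hypothetical four-feature vectors (already reduced and ranked) with stage labels.
train_x = [(0.0, 0.1, 0.0, 0.2), (0.3, 0.0, 0.1, 0.0),
           (5.0, 5.2, 4.9, 5.1), (5.3, 4.8, 5.0, 5.0),
           (10.0, 9.8, 10.1, 10.0)]
train_y = ["F0", "F0", "F2", "F2", "F4"]

print(pnn_predict(train_x, train_y, (0.2, 0.0, 0.1, 0.1)))  # F0
```

A smaller `sigma` makes the decision boundary follow individual training images more closely, so it would normally be tuned by cross-validation. If a query lies far from every training vector, all kernel values can underflow to zero, which is one reason to scale the features first.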
Affiliation(s)
- U Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Clementi 599489, Singapore; Department of Biomedical Engineering, School of Science and Technology, Singapore University of Social Sciences, Clementi 599491, Singapore; School of Medicine, Faculty of Health and Medical Sciences, Taylor's University, 47500 Subang Jaya, Malaysia
- U Raghavendra
- Department of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Joel E W Koh
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Clementi 599489, Singapore
- Kristen M Meiburger
- Department of Electronics and Telecommunications, Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
- Edward J Ciaccio
- Department of Medicine, Columbia University, New York, NY, 10032, USA
- Yuki Hagiwara
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Clementi 599489, Singapore
- Filippo Molinari
- Department of Electronics and Telecommunications, Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
- Wai Ling Leong
- Department of Biomedical Imaging, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Anushya Vijayananthan
- Department of Biomedical Imaging, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Nur Adura Yaakup
- Department of Biomedical Imaging, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Mohd Kamil Bin Mohd Fabell
- Department of Biomedical Imaging, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Chai Hong Yeong
- Department of Biomedical Imaging, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia; School of Medicine, Faculty of Health and Medical Sciences, Taylor's University, 47500 Subang Jaya, Malaysia

568
Coulet A, Shah NH, Wack M, Chawki MB, Jay N, Dumontier M. Predicting the need for a reduced drug dose, at first prescription. Sci Rep 2018; 8:15558. [PMID: 30349060 PMCID: PMC6197198 DOI: 10.1038/s41598-018-33980-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/04/2017] [Accepted: 10/06/2018] [Indexed: 01/21/2023]
Abstract
Prescribing the right drug at the right dose is a central tenet of precision medicine. We examined the use of patients’ prior electronic health records to predict a reduction in drug dosage. We focused on drugs that interact with the P450 enzyme family, because their dosage is known to be sensitive and variable. We extracted diagnostic codes, conditions reported in clinical notes, and laboratory orders from Stanford’s clinical data warehouse to construct cohorts of patients who either did or did not need a dose change. After feature selection, we trained models to predict which patients will (or will not) require a dose change after being prescribed one of 34 drugs across 23 drug classes. Overall, we could predict a dose reduction for 23 drugs and 22 drug classes, with AUCs ranging from 0.70 to 0.95. Several of these drugs are associated with clinical guidelines that recommend dose reduction exclusively in the case of an adverse reaction. For these cases, a reduction in dosage may be considered a surrogate for an adverse reaction, which our system could indirectly help predict and prevent. Our study illustrates the role machine learning may play in guiding the choice of starting dose for drugs associated with response variability.
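The abstract reports performance as the area under the ROC curve (AUC). To show what that number measures, here is a minimal sketch computing AUC from its rank (Mann-Whitney) interpretation: the probability that a randomly chosen patient who needed a dose reduction receives a higher predicted score than one who did not, with ties counted as half. The function name `roc_auc`, the scores, and the labels are hypothetical and do not come from the study; the feature extraction and models are not shown.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of needing a dose reduction.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]  # 1 = dose was later reduced
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic in cohort size and is meant only for illustration; library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.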
Affiliation(s)
- Adrien Coulet
- Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Maxime Wack
- Service d'Evaluation et d'Information Médicales, University Hospital of Nancy (CHRU), Nancy, France
- Mohammad B Chawki
- Service d'Evaluation et d'Information Médicales, University Hospital of Nancy (CHRU), Nancy, France
- Nicolas Jay
- Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France; Service d'Evaluation et d'Information Médicales, University Hospital of Nancy (CHRU), Nancy, France
- Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA; Institute of Data Science, Maastricht University, Maastricht, Netherlands

569
De Silva D, Ranasinghe W, Bandaragoda T, Adikari A, Mills N, Iddamalgoda L, Alahakoon D, Lawrentschuk N, Persad R, Osipov E, Gray R, Bolton D. Machine learning to support social media empowered patients in cancer care and cancer treatment decisions. PLoS One 2018; 13:e0205855. [PMID: 30335805 PMCID: PMC6193663 DOI: 10.1371/journal.pone.0205855] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 06/15/2018] [Accepted: 09/25/2018] [Indexed: 12/15/2022]
Abstract
Background Online support groups (OSG), a primary variant of social media, extend beyond the standard definition to incorporate a dimension of advice, support and guidance for patients. OSG are a complementary, yet significant, adjunct to patient journeys. Machine learning and natural language processing techniques can be applied to the large volumes of unstructured text discussion accumulated in OSG for intelligent extraction of patient-reported demographics, behaviours, decisions, treatments, side effects and expressions of emotion. New insights from the fusion and synthesis of such diverse patient-reported information, as expressed throughout the patient journey from diagnosis to treatment and recovery, can contribute to informed decision-making on personalized healthcare delivery and to the development of healthcare policy guidelines. Methods and findings We designed and developed an artificial intelligence based analytics framework using machine learning and natural language processing techniques for intelligent analysis and automated aggregation of patient information and interaction trajectories in online support groups. Alongside the social interactions, patient behaviours, decisions, demographics, clinical factors and emotions, as expressed over time, are extracted and analysed. More specifically, we used this platform to investigate the impact of online social influences on the intimate decision of selecting a treatment type, as well as on recovery after treatment, side effects and emotions expressed over time, using prostate cancer as a model. The results reveal three major decision-making behaviours among patients: a Paternalistic group, an Autonomous group and a Shared group. Furthermore, each group demonstrated diverse behaviours in post-decision discussions on clinical outcomes, advice and expressions of emotion during the twelve months following treatment.
Over time, patients were also observed to transition from seeking information and emotional support to providing information and emotional support to other patients. Conclusions Findings from this study provide a rigorous indication of the expectations of social media empowered patients, their potential for individualised decision-making, and their clinical and emotional needs. The increasing popularity of OSG further confirms that it is timely for clinicians to consider patient voices as expressed in OSG. We have successfully demonstrated that the proposed platform can be used to investigate, analyse and derive actionable insights from patient-reported information on prostate cancer, in support of patient-focused healthcare delivery. The platform can be extended and applied just as effectively to any other medical condition.
Affiliation(s)
- Daswin De Silva
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Weranja Ranasinghe
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Austin Hospital, Heidelberg, Victoria, Australia
- Tharindu Bandaragoda
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Achini Adikari
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Nishan Mills
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Lahiru Iddamalgoda
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Damminda Alahakoon
- Research Centre for Data Analytics and Cognition, La Trobe University, Victoria, Australia
- Raj Persad
- North Bristol NHS Trust, Bristol, United Kingdom
- Evgeny Osipov
- Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå, Sweden
- Richard Gray
- School of Nursing and Midwifery, La Trobe University, Victoria, Australia

570
Leslie HH, Zhou X, Spiegelman D, Kruk ME. Health system measurement: Harnessing machine learning to advance global health. PLoS One 2018; 13:e0204958. [PMID: 30289935 PMCID: PMC6173424 DOI: 10.1371/journal.pone.0204958] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/23/2018] [Accepted: 09/15/2018] [Indexed: 11/21/2022]
Abstract
Background Further improvements in population health in low- and middle-income countries demand high-quality care to address an increasingly complex burden of disease. Health facility surveys provide an important but costly source of information on readiness to provide care. To improve the efficiency of health system measurement, we applied unsupervised machine learning methods to assess the performance of the service readiness index (SRI) defined by the World Health Organization and compared it with empirically derived indices. Methods We drew data from nationally representative Service Provision Assessment surveys conducted in 10 countries between 2007 and 2015. We extracted 649 items in domains such as infrastructure, medication, and management, calculated an index using all available information, and classified facilities into quintiles. We compared three approaches against the full item set: the SRI, a new index based on sequential backward selection, and an enriched SRI that added empirically selected items to the SRI. We evaluated index performance with a cross-validated kappa statistic comparing classification using the candidate index against the 649-item index. Results A total of 9238 facilities were assessed. The 49-item SRI performed poorly against the index using all 649 items, with a kappa value of 0.35. New empirically derived indices with 50 and 100 items captured much more information, with cross-validated kappa statistics of 0.71 and 0.80, respectively. The selected items varied across the indices and in sensitivity analyses. A 100-item enriched SRI reliably captured the information from the full index: 83% of the facilities were classified into the correct quintile of service readiness based on the full index. Conclusion A facility readiness measure developed by global health experts performed poorly in capturing the totality of readiness information collected during facility surveys.
Using a machine learning approach with sequential selection and cross-validation to identify the most informative items dramatically improved performance. Such approaches can make assessment of health facility readiness more efficient. Further improvements in measurement will require the identification of external criteria, such as patient outcomes, to guide and validate measure development.
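The study scores each candidate index by how well its quintile classification of facilities agrees with the quintiles from the full 649-item index, using a kappa statistic. A minimal sketch of that comparison for a single split, assuming unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used) and rank-based quintile assignment. The helper names `quintiles` and `cohens_kappa` and the facility scores are hypothetical; sequential backward selection and cross-validation are not shown.

```python
from collections import Counter


def quintiles(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    q = [0] * n
    for rank, i in enumerate(order):
        q[i] = rank * 5 // n + 1
    return q


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa between two categorical labelings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of marginal frequencies, summed over categories.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical readiness scores for ten facilities.
full_index = [0.91, 0.35, 0.62, 0.48, 0.77, 0.20, 0.55, 0.84, 0.40, 0.69]
candidate = [0.88, 0.30, 0.50, 0.52, 0.80, 0.25, 0.60, 0.70, 0.42, 0.72]

print(round(cohens_kappa(quintiles(full_index), quintiles(candidate)), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which with five equal-sized quintiles is 0.2; that is why a kappa of 0.35 can coexist with a fairly high share of facilities landing in the same quintile.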
Affiliation(s)
- Hannah H. Leslie
- Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Xin Zhou
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Donna Spiegelman
- Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Center on Methods for Implementation and Prevention Science, Yale School of Public Health, New Haven, Connecticut, United States of America
- Margaret E. Kruk
- Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America

571
Kosti I, Sirota M. Electronic Medical Records Enable Precision Medicine Approaches for Celiac Disease. J Pediatr Gastroenterol Nutr 2018; 67:434-435. [PMID: 29746345 PMCID: PMC6150815 DOI: 10.1097/mpg.0000000000002021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Affiliation(s)
- Idit Kosti
- Institute for Computational Health Sciences
- Department of Pediatrics, University of California, San Francisco, CA
- Marina Sirota
- Institute for Computational Health Sciences
- Department of Pediatrics, University of California, San Francisco, CA

572
Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc 2018; 25:1419-1428. [PMID: 29893864 PMCID: PMC6188527 DOI: 10.1093/jamia/ocy068] [Citation(s) in RCA: 295] [Impact Index Per Article: 42.1] [Received: 03/15/2018] [Revised: 05/01/2018] [Accepted: 05/08/2018] [Indexed: 12/14/2022]
Abstract
Objective To conduct a systematic review of deep learning models for electronic health record (EHR) data, and to illustrate various deep learning architectures for analyzing different data sources and their target applications. We also highlight ongoing research and identify open challenges in building deep learning models of EHRs. Design/method We searched PubMed and Google Scholar for papers on deep learning studies using EHR data published between January 1, 2010, and January 31, 2018. We summarize them along the following axes: types of analytics tasks, types of deep learning model architectures, special challenges arising from health data and tasks and their potential solutions, and evaluation strategies. Results We surveyed and analyzed multiple aspects of the 98 articles we found and identified the following analytics tasks: disease detection/classification, sequential prediction of clinical events, concept embedding, data augmentation, and EHR data privacy. We then studied how deep architectures were applied to these tasks. We also discussed some special challenges arising from modeling EHR data and reviewed a few popular approaches. Finally, we summarized how performance evaluations were conducted for each task. Discussion Despite the early success in using deep learning for health analytics applications, a number of issues remain to be addressed. We discuss them in detail, including data and label availability, the interpretability and transparency of the model, and ease of deployment.
Affiliation(s)
- Cao Xiao
- AI for Healthcare, IBM Research, Cambridge, Massachusetts, USA
- Edward Choi
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Jimeng Sun
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA

573
Smoller JW. The use of electronic health records for psychiatric phenotyping and genomics. Am J Med Genet B Neuropsychiatr Genet 2018; 177:601-612. [PMID: 28557243 PMCID: PMC6440216 DOI: 10.1002/ajmg.b.32548] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 03/19/2017] [Accepted: 04/20/2017] [Indexed: 12/22/2022]
Abstract
The widespread adoption of electronic health records (EHRs) in healthcare systems has created a vast and continuously growing resource of clinical data and provides new opportunities for population-based research. In particular, linking EHRs to biospecimens and genomic data in biobanks may help address what has become a rate-limiting step for genetic research: the need for large sample sizes. The principal roadblock to capitalizing on these resources is the need to establish the validity of phenotypes extracted from the EHR. For psychiatric genetic research, this represents a particular challenge, given that diagnosis is based on patient reports and clinician observations that may not be well captured in billing codes or narrative records. This review addresses the opportunities and pitfalls of EHR-based phenotyping, with a focus on its application to psychiatric genetic research. A growing number of studies have demonstrated that diagnostic algorithms with high positive predictive value can be derived from EHRs, especially when structured data are supplemented by text mining approaches. Such algorithms enable semi-automated phenotyping for large-scale case-control studies. In addition, the scale and scope of EHR databases have been used successfully to identify phenotypic subgroups and derive algorithms for longitudinal risk prediction. EHR-based genomic studies are particularly well suited to rapid look-up replication of putative risk genes, studies of pleiotropy (phenome-wide association studies, or PheWAS), investigations of genetic networks and overlap across the phenome, and pharmacogenomic research. EHR phenotyping has been relatively underutilized in psychiatric genomic research but may become a key component of efforts to advance precision psychiatry.
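The review describes phenotyping algorithms judged by positive predictive value (PPV) against chart review, working best when structured codes are supplemented by text mining. A minimal, hypothetical sketch of both ideas: a rule that flags a patient as a case only with at least two diagnosis codes matching given prefixes and at least one note mentioning a keyword, followed by PPV = TP / (TP + FP) against manually reviewed labels. The function names, the ICD-10 prefixes F32/F33 (depressive disorders), the keyword, the two-code threshold, and the records are illustrative assumptions, not an algorithm from the review; real text mining would also need negation handling (for example, "no evidence of depression").

```python
def flag_case(record, code_prefixes=("F32", "F33"), min_codes=2,
              keywords=("depression",)):
    """Hypothetical rule: enough matching diagnosis codes AND a note mention."""
    n_codes = sum(1 for code in record["codes"] if code.startswith(code_prefixes))
    mentioned = any(k in note.lower() for note in record["notes"] for k in keywords)
    return n_codes >= min_codes and mentioned


def positive_predictive_value(flags, chart_review):
    """PPV = true positives / all cases flagged by the algorithm."""
    tp = sum(1 for f, t in zip(flags, chart_review) if f and t)
    fp = sum(1 for f, t in zip(flags, chart_review) if f and not t)
    if tp + fp == 0:
        raise ValueError("the algorithm flagged no cases")
    return tp / (tp + fp)


records = [
    {"codes": ["F32.1", "F32.1", "I10"], "notes": ["Major depression, stable on SSRI."]},
    {"codes": ["F33.0", "F32.9"], "notes": ["Screened; history of depression noted."]},
    {"codes": ["F32.9"], "notes": ["Depression suspected."]},
    {"codes": ["E11.9"], "notes": ["Type 2 diabetes follow-up."]},
]
chart_review = [True, False, True, False]  # hypothetical gold-standard labels

flags = [flag_case(r) for r in records]
print(flags, positive_predictive_value(flags, chart_review))  # [True, True, False, False] 0.5
```

PPV ignores missed cases such as the third record, which has only one matching code; that is why validation studies usually report sensitivity alongside it.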
Affiliation(s)
- Jordan W. Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA

574
Digital diabetes: Perspectives for diabetes prevention, management and research. Diabetes Metab 2018; 45:322-329. [PMID: 30243616 DOI: 10.1016/j.diabet.2018.08.012] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 07/06/2018] [Revised: 08/22/2018] [Accepted: 08/27/2018] [Indexed: 12/20/2022]
Abstract
Digital medicine, digital research and artificial intelligence (AI) have the power to transform the field of diabetes through continuous, burden-free remote monitoring of patients' symptoms, physiological data, behaviours, and social and environmental contexts using wearables, sensors and smartphone technologies. Moreover, data generated online and by digital technologies, which the authors suggest be grouped under the term 'digitosome', constitute, through the quantity and variety of information they represent, a powerful potential for identifying new digital markers and patterns of risk that, when combined with clinical data, can ultimately improve diabetes management and quality of life, and also prevent diabetes-related complications. Moving from a world in which patients are characterized by only a few recent measurements of fasting glucose and glycated haemoglobin to one where patients, healthcare professionals and research scientists can consider various key parameters at thousands of time points simultaneously will profoundly change the way diabetes is prevented, managed and characterized in patients living with diabetes, as well as how it is scientifically researched. Indeed, the present review looks at how the digitization of diabetes can impact all fields of diabetes (its prevention, management, technology and research) and how it can complement, but not replace, what is usually done in traditional clinical settings. Such a profound shift is a genuine game changer that should be embraced by all, as it can provide solid research results transferable to patients, improve general health literacy, and provide tools to facilitate everyday decision-making by both healthcare professionals and patients living with diabetes.

575
Saeed A, Ozcelebi T, Lukkien J. Synthesizing and Reconstructing Missing Sensory Modalities in Behavioral Context Recognition. Sensors (Basel) 2018; 18:E2967. [PMID: 30200575 PMCID: PMC6165109 DOI: 10.3390/s18092967] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/12/2018] [Revised: 08/16/2018] [Accepted: 09/03/2018] [Indexed: 11/17/2022]
Abstract
Detection of human activities along with the associated context is of key importance for various application areas, including assisted living and well-being. To predict a user's context in daily-life situations, a system needs to learn from multimodal data that are often imbalanced and noisy, with missing values. In real-life conditions, the model is also likely to encounter missing sensors (such as a user not wearing a smartwatch), and it fails to infer the context if any of the modalities used for training are missing. In this paper, we propose a method based on an adversarial autoencoder for handling missing sensory features and synthesizing realistic samples. We empirically demonstrate the capability of our method in comparison with classical approaches for filling in missing values on a large-scale activity recognition dataset collected in the wild. We develop a fully connected classification network by extending an encoder and systematically evaluate its multi-label classification performance when several modalities are missing. Furthermore, we show class-conditional artificial data generation, with visual and quantitative analysis on the context classification task, demonstrating the strong generative power of adversarial autoencoders.
Affiliation(s)
- Aaqib Saeed
- Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
- Tanir Ozcelebi
- Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
- Johan Lukkien
- Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands

576
Swift B, Jain L, White C, Chandrasekaran V, Bhandari A, Hughes DA, Jadhav PR. Innovation at the Intersection of Clinical Trials and Real-World Data Science to Advance Patient Care. Clin Transl Sci 2018; 11:450-460. [PMID: 29768712 PMCID: PMC6132367 DOI: 10.1111/cts.12559] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 12/13/2017] [Accepted: 03/29/2018] [Indexed: 02/01/2023]
Abstract
While efficacy and safety data collected from randomized clinical trials are the evidentiary standard for determining market authorization, these data alone may no longer be sufficient to address the needs of key stakeholders (regulators, providers, and payers) and guarantee the long-term success of pharmaceutical products. Stakeholders have a heightened interest in understanding how real-world evidence (RWE) can substantiate benefit-risk assessment and support the value of a new drug. This review provides an overview of real-world data (RWD) and related advances in the regulatory framework, and discusses their impact on clinical research and development. A framework is introduced for linking drug development decisions with the value proposition of the drug, using pharmacokinetic-pharmacodynamic-pharmacoeconomic models. The summary presented here is based on the presentations and discussion at the symposium entitled Innovation at the Intersection of Clinical Trials and Real-World Data to Advance Patient Care at the American Society for Clinical Pharmacology and Therapeutics (ASCPT) 2017 Annual Meeting.
Affiliation(s)
- Lokesh Jain
- Quantitative Pharmacology and Pharmacometrics, Merck & Co., Inc., Rahway, New Jersey, USA
- Craig White
- Harvard PhD program in Health Policy, Cambridge, Massachusetts, USA
- Vasu Chandrasekaran
- Center for Observational and Real World Evidence, Merck & Co., Inc., Boston, Massachusetts, USA
- Aman Bhandari
- Center for Observational and Real World Evidence, Merck & Co., Inc., Boston, Massachusetts, USA
- Dyfrig A. Hughes
- Centre for Health Economics and Medicines Evaluation, Bangor University, Bangor, Gwynedd, UK
- Pravin R. Jadhav
- Corporate Projects, Research & Development (R&D) Innovation, Otsuka Pharmaceutical Development and Commercialization (OPDC), Princeton, New Jersey, USA

577
Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J Biomed Health Inform 2018; 22:1589-1604. [PMID: 29989977 PMCID: PMC6043423 DOI: 10.1109/jbhi.2017.2767063] [Citation(s) in RCA: 463] [Impact Index Per Article: 66.1] [Indexed: 01/04/2023]
Abstract
The past decade has seen an explosion in the amount of digital information stored in electronic health records (EHRs). Although EHRs were primarily designed for archiving patient information and performing administrative healthcare tasks such as billing, many researchers have found secondary uses for these records in various clinical informatics applications. Over the same period, the machine learning community has seen widespread advances in the field of deep learning. In this review, we survey current research on applying deep learning to clinical tasks based on EHR data, finding a variety of deep learning techniques and frameworks applied to several types of clinical applications, including information extraction, representation learning, outcome prediction, phenotyping, and deidentification. We identify several limitations of current research, involving topics such as model interpretability, data heterogeneity, and the lack of universal benchmarks. We conclude by summarizing the state of the field and identifying avenues for future deep EHR research.

578
Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS One 2018; 13:e0202344. [PMID: 30169498 PMCID: PMC6118376 DOI: 10.1371/journal.pone.0202344] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 02/05/2018] [Accepted: 07/30/2018] [Indexed: 02/07/2023]
Abstract
Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputing missing values are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is both high, and may carry meaning. Using a cohort of 80,000 patients from the CALIBER programme, we compared traditional modelling and machine-learning approaches in EHR. First, we used Cox models and random survival forests with and without imputation on 27 expert-selected, preprocessed variables to predict all-cause mortality. We then used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models used on an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputing missing values. An elastic net Cox regression based with 586 unimputed variables with continuous values discretised achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared to 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning, and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate. 
This demonstrates that machine-learning approaches applied to raw EHR data can be used to build models for use in research and clinical practice, and identify novel predictive variables and their effects to inform future research.
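The C-index quoted in this abstract measures how often a model orders pairs of patients correctly by risk. A minimal sketch of Harrell's concordance index for right-censored data follows; it assumes a higher score means earlier expected death, skips pairs that cannot be compared because of censoring, and uses a hypothetical function name and toy data rather than anything from the CALIBER study.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means death is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        # Only a patient whose death was observed can anchor a comparable pair.
        if not events[i]:
            continue
        for j in range(len(times)):
            # Pair (i, j) is comparable if j was still under follow-up when i died.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    return concordant / comparable if comparable else float("nan")


# Toy data (illustrative only): four patients, the third censored.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.4, 0.2]))  # prints 0.8
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, which is the scale on which 0.801 and 0.793 should be read.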
579
Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, Maathuis MH, Moreau Y, Murphy SA, Przytycka TM, Rebhan M, Röst H, Schuppert A, Schwab M, Spang R, Stekhoven D, Sun J, Weber A, Ziemek D, Zupan B. From hype to reality: data science enabling personalized medicine. BMC Med 2018; 16:150. [PMID: 30145981 PMCID: PMC6109989 DOI: 10.1186/s12916-018-1122-7] [Citation(s) in RCA: 205] [Impact Index Per Article: 29.3] [Received: 03/28/2018] [Accepted: 07/09/2018] [Indexed: 02/08/2023]
Abstract
BACKGROUND Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often called Artificial Intelligence in the mainstream media). While in recent years there has been much enthusiasm about the potential of 'big data' and machine learning-based solutions, only a few examples impact current clinical practice. The lack of impact on clinical practice can largely be attributed to insufficient performance of predictive models, difficulties in interpreting complex model predictions, and a lack of validation via prospective clinical trials that demonstrate a clear benefit compared to the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future. CONCLUSIONS There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance further to provide direct benefit to clinical practice.
Affiliation(s)
- Holger Fröhlich
- UCB Biosciences GmbH, Alfred-Nobel-Str. 10, 40789 Monheim, Germany
- University of Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19c, 53115 Bonn, Germany
- Rudi Balling
- University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Oliver Kohlbacher
- University of Tübingen, WSI/ZBIT, Sand 14, 72076 Tübingen, Germany
- Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Institute for Translational Bioinformatics, University Medical Center Tübingen, Sand 14, 72076 Tübingen, Germany
- Santosh Kumar
- Department of Computer Science, University of Memphis, 2222 Dunn Hall, Memphis, TN 38152 USA
- Thomas Lengauer
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Marloes H. Maathuis
- ETH Zurich, Seminar für Statistik, Rämistrasse 101, 8092 Zurich, Switzerland
- Yves Moreau
- University of Leuven, ESAT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Susan A. Murphy
- Harvard University, Science Center 400 Suite, Oxford Street, Cambridge, MA 02138-2901 USA
- Teresa M. Przytycka
- National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894-6075 USA
- Michael Rebhan
- Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
- Hannes Röst
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON M5S 3E1 Canada
- Andreas Schuppert
- RWTH Aachen, Joint Research Center for Computational Biomedicine, Pauwelsstrasse 19, 52074 Aachen, Germany
- Matthias Schwab
- Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Auerbachstrasse 112, 70376 Stuttgart, Germany
- University of Tübingen, Departments of Clinical Pharmacology and of Pharmacy and Biochemistry, Tübingen, Germany
- Rainer Spang
- University of Regensburg, Institute of Functional Genomics, Am BioPark 9, 93053 Regensburg, Germany
- Daniel Stekhoven
- ETH Zurich, NEXUS Personalized Health Technologies, Otto-Stern-Weg 7, 8093 Zurich, Switzerland
- Jimeng Sun
- Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332-0280 USA
- Andreas Weber
- Institute for Computer Science, University of Bonn, Endenicher Allee 19a, 53115 Bonn, Germany
- Daniel Ziemek
- Pfizer, Worldwide Research and Development, Linkstraße 10, 10785 Berlin, Germany
- Blaz Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
580
Deep Learning and Radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep 2018; 8:12611. [PMID: 30135549 PMCID: PMC6105676 DOI: 10.1038/s41598-018-30657-6] [Citation(s) in RCA: 122] [Impact Index Per Article: 17.4] [Received: 03/01/2018] [Accepted: 08/03/2018] [Indexed: 02/07/2023]
Abstract
Treatment of locally advanced rectal cancer involves chemoradiation, followed by total mesorectal excision. Complete response after chemoradiation is an accurate surrogate for long-term local control. Predicting complete response from pre-treatment features could represent a major step towards conservative treatment. Patients with a T2-4 N0-1 rectal adenocarcinoma treated with neo-adjuvant chemoradiation between June 2010 and October 2016 at three academic institutions were included. All clinical and treatment data were integrated into our clinical data warehouse, from which we extracted the features. Radiomics features were extracted from the tumor volume on the treatment-planning CT scan. A Deep Neural Network (DNN) was created to predict complete response, as a methodological proof-of-principle. The results were compared to a baseline Linear Regression model using only the TNM stage as a predictor and to a second model created with a Support Vector Machine (SVM) on the same features used in the DNN. Ninety-five patients were included in the final analysis. There were 49 males (52%) and 46 females (48%). Median tumor size was 48 mm (range 15-130). Twenty-two patients (23%) had a pathologic complete response after chemoradiation. A total of 1,683 radiomics features were extracted. The DNN predicted complete response with 80% accuracy, which was better than the Linear Regression model (69.5%) and the SVM model (71.58%). Our model correctly classified complete-response status after neo-adjuvant rectal chemoradiotherapy in 80% of the patients in this multicenter cohort. Our results may help to identify patients who would benefit from conservative treatment rather than radical resection.
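The reported percentages can be related back to patient counts. The sketch below is only arithmetic on figures given in the abstract (95 patients, 22 complete responders); it assumes the accuracies refer to all 95 patients, which the abstract does not state (they could, for example, be cross-validated estimates), and the helper names are hypothetical. It also computes the accuracy a trivial classifier would reach by always predicting no complete response.

```python
N_PATIENTS = 95           # patients in the final analysis
N_COMPLETE_RESPONSE = 22  # patients with pathologic complete response


def implied_correct(accuracy, n=N_PATIENTS):
    """Number of correctly classified patients implied by a reported accuracy."""
    return round(accuracy * n)


def majority_class_accuracy(n=N_PATIENTS, n_positive=N_COMPLETE_RESPONSE):
    """Accuracy of always predicting the more frequent class (no complete response)."""
    return max(n_positive, n - n_positive) / n


for model, acc in {"DNN": 0.80, "Linear Regression": 0.695, "SVM": 0.7158}.items():
    print(f"{model}: about {implied_correct(acc)}/{N_PATIENTS} correct")
print(f"Always 'no complete response': {majority_class_accuracy():.1%}")
```

Because 73 of the 95 patients (76.8%) did not achieve a complete response, accuracy figures for this cohort are best read against that baseline and alongside class-specific measures such as sensitivity and specificity.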
581
Abstract
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas of roughly increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Affiliation(s)
- Pierre Baldi
- Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA
582
Approaches to Medical Decision-Making Based on Big Clinical Data. J Healthc Eng 2018; 2018:3917659. [PMID: 29973977 PMCID: PMC6008823 DOI: 10.1155/2018/3917659] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/21/2017] [Revised: 02/14/2018] [Accepted: 04/30/2018] [Indexed: 12/02/2022]
Abstract
The paper discusses different approaches to building a medical decision support system based on big data. The authors sought to avoid any data reduction and to apply universal learning and big data processing methods that are independent of disease classification standards. The paper assesses and compares the accuracy of recommendations among three options: case-based reasoning, a simple single-layer neural network, and a probabilistic neural network. Further, the paper substantiates the assumption regarding the most efficient approach to solving the specified problem.
583
Patient representation learning and interpretable evaluation using clinical notes. J Biomed Inform 2018; 84:103-113. [PMID: 29966746 DOI: 10.1016/j.jbi.2018.06.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/16/2018] [Revised: 06/07/2018] [Accepted: 06/28/2018] [Indexed: 11/22/2022]
Abstract
We make three contributions in this work: 1. We explore the utility of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. To analyze whether these representations are transferable across tasks, we evaluate them in multiple supervised setups to predict patient mortality, primary diagnostic and procedural category, and gender. We compare their performance with sparse representations obtained from a bag-of-words model. We observe that the learned generalized representations significantly outperform the sparse representations when there are few positive instances to learn from and strong lexical features are absent. 2. We compare the performance of a feature set constructed from a bag of words with that of a feature set obtained from medical concepts. In the latter case, concepts represent problems, treatments, and tests. We find that concept identification does not improve classification performance. 3. We propose novel techniques to facilitate model interpretability. To understand and interpret the representations, we explore the best-encoded features within the patient representations obtained from the autoencoder model. Further, we calculate feature sensitivity across two networks to identify the most significant input features for different classification tasks when these pretrained representations are used as the supervised input. We successfully extract the most influential features for the pipeline using this technique.
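The sparse baseline in the first contribution is a bag-of-words model. A minimal sketch follows, assuming simple lowercase whitespace tokenization (the study's own preprocessing is not described in the abstract) and hypothetical helper names; each note becomes a mapping from vocabulary index to count, which is why such vectors are sparse compared with the dense autoencoder or paragraph-vector representations, which are not shown here.

```python
from collections import Counter


def build_vocabulary(notes):
    """Assign each distinct lowercase token an integer index, in order of first appearance."""
    vocab = {}
    for note in notes:
        for token in note.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(note, vocab):
    """Sparse vector as {vocabulary index: count}; tokens outside the vocabulary are ignored."""
    counts = Counter(token for token in note.lower().split() if token in vocab)
    return {vocab[token]: n for token, n in counts.items()}


vocab = build_vocabulary(["Chest pain chest", "No pain"])
print(vocab)                                  # {'chest': 0, 'pain': 1, 'no': 2}
print(bag_of_words("chest pain chest", vocab))  # {0: 2, 1: 1}
```

Only non-zero counts are stored, so a note touching a handful of terms stays small even when the vocabulary spans tens of thousands of words.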
584
Banerjee I, Gensheimer MF, Wood DJ, Henry S, Aggarwal S, Chang DT, Rubin DL. Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) Utilizing Free-Text Clinical Narratives. Sci Rep 2018; 8:10037. [PMID: 29968730 PMCID: PMC6030075 DOI: 10.1038/s41598-018-27946-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/01/2017] [Accepted: 06/12/2018] [Indexed: 02/07/2023]
Abstract
We propose a deep learning model, Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met), for estimating the short-term life expectancy (>3 months) of patients by analyzing free-text clinical notes in the electronic medical record while maintaining the temporal visit sequence. In a single framework, we integrated semantic data mapping and a neural embedding technique to produce a text processing method that extracts relevant information from heterogeneous types of clinical notes in an unsupervised manner, and we designed a recurrent neural network to model the temporal dependency of patient visits. The model was trained on a large dataset (10,293 patients) and validated on a separate dataset (1818 patients). Our method achieved an area under the ROC curve (AUC) of 0.89. To provide explainability, we developed an interactive graphical tool that may improve physician understanding of the basis for the model's predictions. The high accuracy and explainability of the PPES-Met model may enable it to be used as a decision support tool to personalize metastatic cancer treatment and provide valuable assistance to physicians.
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Douglas J Wood
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Solomon Henry
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Sonya Aggarwal
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Daniel T Chang
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Biomedical Data Science, Radiology, and Medicine (BMIR) Stanford University, Stanford, CA, USA
585
Paige E, Barrett J, Stevens D, Keogh RH, Sweeting MJ, Nazareth I, Petersen I, Wood AM. Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk. Am J Epidemiol 2018; 187:1530-1538. [PMID: 29584812 PMCID: PMC6030927 DOI: 10.1093/aje/kwy018] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 07/27/2017] [Revised: 01/24/2018] [Accepted: 01/25/2018] [Indexed: 11/13/2022]
Abstract
The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age-specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997-2016). Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).
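The Brier score reported above is the mean squared difference between predicted probabilities and observed outcomes, so lower is better. A minimal sketch for a binary outcome at a fixed horizon follows; it assumes every person's 10-year status is known, whereas the study's cross-validated estimate would also have to handle censored follow-up, which this sketch deliberately ignores. The function name and example values are hypothetical.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted event probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)


# Illustrative 10-year predictions for three people; only the third had an event.
print(brier_score([0.1, 0.2, 0.8], [0, 0, 1]))
```

Because a constant prediction equal to the event rate p already scores p(1 - p), Brier scores are easiest to interpret next to that baseline when events are uncommon.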
Affiliation(s)
- Ellie Paige
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- National Centre for Epidemiology and Population Health, Research School of Population, The Australian National University, Canberra, Australia
- Jessica Barrett
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- David Stevens
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Michael J Sweeting
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Irwin Nazareth
- Research Department of Primary Care and Population Health, Institute of Epidemiology and Health Care, University College London, London, United Kingdom
- Irene Petersen
- Research Department of Primary Care and Population Health, Institute of Epidemiology and Health Care, University College London, London, United Kingdom
- Angela M Wood
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
586
Banda JM, Seneviratne M, Hernandez-Boussard T, Shah NH. Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models. Annu Rev Biomed Data Sci 2018; 1:53-68. [PMID: 31218278 PMCID: PMC6583807 DOI: 10.1146/annurev-biodatasci-080917-013315] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Indexed: 12/18/2022]
Abstract
With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.
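To make the rule-based end of that spectrum concrete, here is a minimal sketch of a hypothetical phenotype definition: a patient counts as a case with at least two qualifying diagnosis codes recorded on different dates, or one such code plus a qualifying prescription. The record layout, code list, drug list, and thresholds are illustrative assumptions, not a definition taken from the review.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientRecord:
    patient_id: str
    diagnoses: list[tuple[str, date]] = field(default_factory=list)  # (code, date recorded)
    medications: set[str] = field(default_factory=set)


# Hypothetical rule for type 2 diabetes; the code and drug lists are illustrative only.
T2D_CODES = {"E11.9", "E11.65"}
T2D_DRUGS = {"metformin", "glipizide"}


def is_case(record: PatientRecord) -> bool:
    """Case if >= 2 qualifying diagnoses on distinct dates, or >= 1 plus a qualifying drug."""
    dx_dates = {d for code, d in record.diagnoses if code in T2D_CODES}
    on_drug = bool(record.medications & T2D_DRUGS)
    return len(dx_dates) >= 2 or (len(dx_dates) >= 1 and on_drug)


patient = PatientRecord("p1", [("E11.9", date(2020, 1, 5))], {"metformin"})
print(is_case(patient))  # True: one diagnosis plus metformin
```

Hand-written rules like this are transparent but brittle, which is the motivation the review gives for moving toward supervised and unsupervised learned phenotypes.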
Affiliation(s)
- Juan M Banda
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
- Martin Seneviratne
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
587
Choy G, Khalilzadeh O, Michalski M, Do S, Samir AE, Pianykh OS, Geis JR, Pandharipande PV, Brink JA, Dreyer KJ. Current Applications and Future Impact of Machine Learning in Radiology. Radiology 2018; 288:318-328. [PMID: 29944078 DOI: 10.1148/radiol.2018171820] [Citation(s) in RCA: 461] [Impact Index Per Article: 65.9] [Indexed: 12/14/2022]
Abstract
Recent advances and future perspectives of machine learning techniques offer promising applications in medical imaging. Machine learning has the potential to improve different steps of the radiology workflow including order scheduling and triage, clinical decision support systems, detection and interpretation of findings, postprocessing and dose estimation, examination quality control, and radiology reporting. In this article, the authors review examples of current applications of machine learning and artificial intelligence techniques in diagnostic radiology. In addition, the future impact and natural extension of these techniques in radiology practice are discussed.
Affiliation(s)
- Garry Choy, Omid Khalilzadeh, Mark Michalski, Synho Do, Anthony E Samir, Oleg S Pianykh, J Raymond Geis, Pari V Pandharipande, James A Brink, Keith J Dreyer
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
588
DelPozo-Banos M, John A, Petkov N, Berridge DM, Southern K, LLoyd K, Jones C, Spencer S, Travieso CM. Using Neural Networks with Routine Health Records to Identify Suicide Risk: Feasibility Study. JMIR Ment Health 2018; 5:e10144. [PMID: 29934287 PMCID: PMC6035342 DOI: 10.2196/10144] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 02/15/2018] [Revised: 04/10/2018] [Accepted: 04/29/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND Each year, approximately 800,000 people die by suicide worldwide, accounting for 1-2 in every 100 deaths. It is always a tragic event with a huge impact on family, friends, the community and health professionals. Unfortunately, suicide prevention and the development of risk assessment tools have been hindered by the complexity of the underlying mechanisms and the dynamic nature of a person's motivation and intent. Many of those who die by suicide had contact with health services in the preceding year but identifying those most at risk remains a challenge. OBJECTIVE To explore the feasibility of using artificial neural networks with routinely collected electronic health records to support the identification of those at high risk of suicide when in contact with health services. METHODS Using the Secure Anonymised Information Linkage Databank UK, we extracted the data of those who died by suicide between 2001 and 2015 and paired controls. Looking at primary (general practice) and secondary (hospital admissions) electronic health records, we built a binary feature vector coding the presence of risk factors at different times prior to death. Risk factors included: general practice contact and hospital admission; diagnosis of mental health issues; injury and poisoning; substance misuse; maltreatment; sleep disorders; and the prescription of opiates and psychotropics. Basic artificial neural networks were trained to differentiate between the suicide cases and paired controls. We interpreted the output score as the estimated suicide risk. System performance was assessed with 10x10-fold repeated cross-validation, and its behavior was studied by representing the distribution of estimated risk across the cases and controls, and the distribution of factors across estimated risks. RESULTS We extracted a total of 2604 suicide cases and 20 paired controls per case. 
Our best system attained a mean error rate of 26.78% (SD 1.46; 64.57% sensitivity and 81.86% specificity). While the distribution of controls was concentrated around estimated risks < 0.5, cases were almost uniformly distributed between 0 and 1. Prescription of psychotropics, depression and anxiety, and self-harm increased the estimated risk by ~0.4. At least 95% of those presenting these factors were identified as suicide cases. CONCLUSIONS Despite the simplicity of the implemented system, the proposed methodology obtained an accuracy comparable to other published methods based on specialized questionnaire-generated data. Most of the errors came from the heterogeneity of patterns shown by suicide cases, some of which were identical to those of the paired controls. Prescription of psychotropics, depression and anxiety, and self-harm were strongly linked with higher estimated risk scores, followed by hospital admission and long-term drug and alcohol misuse. Other risk factors, such as sleep disorders and maltreatment, had more complex effects.
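The three reported figures fit together if the "mean error rate" is a balanced error rate, the average of 1 - sensitivity and 1 - specificity: (0.3543 + 0.1814) / 2 ≈ 0.268. That reading is an inference from the numbers, not something the abstract states. By contrast, a plain misclassification rate over a 1:20 case-control sample would be dominated by the controls and come out near 19.0%. The helper names below are hypothetical.

```python
def balanced_error_rate(sensitivity, specificity):
    """Mean of the false-negative rate (1 - sensitivity) and false-positive rate (1 - specificity)."""
    return ((1 - sensitivity) + (1 - specificity)) / 2


def weighted_error_rate(sensitivity, specificity, n_cases, n_controls):
    """Plain misclassification rate when cases and controls appear in the given numbers."""
    errors = n_cases * (1 - sensitivity) + n_controls * (1 - specificity)
    return errors / (n_cases + n_controls)


sens, spec = 0.6457, 0.8186  # figures reported in the abstract
print(f"balanced error rate:           {balanced_error_rate(sens, spec):.4f}")
print(f"error rate with 1:20 matching: {weighted_error_rate(sens, spec, 1, 20):.4f}")
```

With heavily imbalanced case-control designs, a balanced measure keeps the rarer suicide cases from being swamped by the controls when summarizing performance.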
Affiliation(s)
- Ann John
- Swansea University, Swansea University Medical School, Swansea, United Kingdom
- Nicolai Petkov
- Division of Intelligent Systems, Department of Computer Science, Bernoulli Institute of Mathematics, Computer Science and Artificial Intelligence, Faculty of Science and Engineering, University of Groningen, Groningen, Netherlands
- Damon Mark Berridge
- Swansea University, Swansea University Medical School, Swansea, United Kingdom
- Kate Southern
- Cardiff Adult Self Injury Project, Cardiff, United Kingdom
- Keith LLoyd
- Swansea University, Swansea University Medical School, Swansea, United Kingdom
- Caroline Jones
- Hillary Rodham Clinton School of Law, Swansea University, Swansea, United Kingdom
- Sarah Spencer
- Princess of Wales Hospital, Bridgend, ABMU Health Board, Swansea, United Kingdom
- Carlos Manuel Travieso
- Signals and Communications Department, IDeTIC, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
589
Wang T, Qiu RG, Yu M. Predictive Modeling of the Progression of Alzheimer's Disease with Recurrent Neural Networks. Sci Rep 2018; 8:9161. [PMID: 29907747 PMCID: PMC6003986 DOI: 10.1038/s41598-018-27337-w] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 03/19/2018] [Accepted: 05/21/2018] [Indexed: 12/27/2022]
Abstract
Alzheimer's disease (AD) patients differ in their number of service visits, and the intervals between their visits are non-uniform. Although the literature describes many approaches to disease progression modeling, they fail to leverage these time-related parts of patients' medical records in predicting a disease's future status. This paper investigates how to predict AD progression at a patient's next medical visit by leveraging heterogeneous medical data. Data provided by the National Alzheimer's Coordinating Center include 5432 patients with probable AD from August 31, 2005 to May 25, 2017. Long short-term memory (LSTM) recurrent neural networks (RNNs) are adopted. The approach relies on an enhanced "many-to-one" RNN architecture to support the shift of time steps, so it can handle patients' varying numbers of visits and uneven time intervals. The results show that the proposed approach can predict patients' AD progression at their next visits with over 99% accuracy, significantly outperforming classic baseline methods. This study confirms that RNNs can effectively solve the AD progression prediction problem by fully leveraging the inherent temporal and medical patterns derived from patients' historical visits. More promisingly, the approach can be adapted to other chronic disease progression problems.
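The core difficulty named in this abstract is that patients have different numbers of visits at uneven intervals. One common way to feed such histories to a many-to-one recurrent network, sketched here as an assumption rather than the authors' exact preprocessing, is to add the time since the previous visit as a feature, pad every sequence to the same length, and record each patient's last real step so the model reads its single output there. Visit times are assumed to be days since a baseline; the function name is hypothetical.

```python
def prepare_sequences(patients, pad_value=0.0):
    """Pad variable-length visit histories for a many-to-one RNN.

    patients -- list of visit lists; each visit is (days_since_baseline, [features...])
    Returns (padded, mask, last_index), where padded[i][t] = [days_since_prev_visit, *features].
    """
    max_len = max(len(visits) for visits in patients)
    n_feat = len(patients[0][0][1]) + 1  # +1 for the time-interval feature
    padded, mask, last_index = [], [], []
    for visits in patients:
        seq, prev_day = [], None
        for day, feats in visits:
            # Uneven spacing is made explicit as an input feature.
            delta = 0.0 if prev_day is None else float(day - prev_day)
            seq.append([delta, *feats])
            prev_day = day
        n_real = len(seq)
        seq += [[pad_value] * n_feat for _ in range(max_len - n_real)]
        padded.append(seq)
        mask.append([1] * n_real + [0] * (max_len - n_real))
        last_index.append(n_real - 1)  # step whose hidden state feeds the prediction
    return padded, mask, last_index


histories = [
    [(0, [1.0, 2.0]), (180, [1.5, 2.5]), (400, [2.0, 3.0])],  # three visits
    [(0, [0.5, 1.0])],                                         # a single visit
]
padded, mask, last_index = prepare_sequences(histories)
print(mask, last_index)  # [[1, 1, 1], [1, 0, 0]] [2, 0]
```

Reading the output at each patient's own last index, rather than at the final padded step, keeps padding from diluting the prediction for patients with short histories.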
Affiliation(s)
- Tingyan Wang
- Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Big Data Lab, Division of Engineering and Information Science, The Pennsylvania State University, Malvern, PA, 19355, USA
- Robin G Qiu
- Big Data Lab, Division of Engineering and Information Science, The Pennsylvania State University, Malvern, PA, 19355, USA
- Ming Yu
- Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
590
Comparing Deep Learning and Classical Machine Learning Approaches for Predicting Inpatient Violence Incidents from Clinical Text. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8060981] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 11/16/2022]
591
Parimbelli E, Marini S, Sacchi L, Bellazzi R. Patient similarity for precision medicine: A systematic review. J Biomed Inform 2018; 83:87-96. [PMID: 29864490 DOI: 10.1016/j.jbi.2018.06.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 01/31/2018] [Revised: 05/16/2018] [Accepted: 06/01/2018] [Indexed: 12/19/2022]
Abstract
Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to precisely identify patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step to allow stratification of patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. A total of 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases follow as the second most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.
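In its simplest form, a patient similarity measure can be sketched as cosine similarity between patient feature vectors, with the k most similar patients forming a personalized reference group. The feature encoding, the choice of cosine similarity, and the helper names are assumptions for illustration; the review surveys many alternative measures and data types.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index_patient, cohort, k=2):
    """Return the ids of the k cohort patients most similar to index_patient."""
    scored = [(pid, cosine_similarity(index_patient, vec)) for pid, vec in cohort.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in scored[:k]]


# Hypothetical cohort: each vector encodes three illustrative features.
cohort = {"p1": [1, 0, 1], "p2": [0, 1, 0], "p3": [1, 0, 0.9]}
print(most_similar([1, 0, 1], cohort, k=2))  # ['p1', 'p3']
```

In practice, features on very different scales (for example age alongside binary mutation flags) would need normalizing first, since cosine similarity is sensitive to magnitude differences across dimensions.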
Affiliation(s)
- E Parimbelli
- Telfer School of Management, University of Ottawa, Ottawa, Canada; Interdepartmental Centre for Health Technologies, University of Pavia, Italy.
- S Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA; Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- L Sacchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- R Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; Interdepartmental Centre for Health Technologies, University of Pavia, Italy; RCCS ICS Maugeri, Pavia, Italy
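The Parimbelli et al. abstract treats a patient similarity measure as the step that makes stratification possible. As a minimal illustration only, assuming each patient is summarized as a numeric feature vector (the patient IDs, features, and values below are hypothetical), one common choice is to z-score each feature and compare patients by cosine similarity, then rank to retrieve the nearest neighbors. The review surveys many other measures; this sketch is not the method of any particular study.

```python
import math
from typing import Dict, List, Tuple

def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column so features on different scales contribute comparably."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; a constant column falls back to 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(patients: Dict[str, List[float]], query_id: str, k: int = 2) -> List[Tuple[str, float]]:
    """Return the k patients most similar to query_id, best match first."""
    ids = list(patients)
    z = dict(zip(ids, standardize([patients[i] for i in ids])))
    scores = [(pid, cosine(z[query_id], z[pid])) for pid in ids if pid != query_id]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]

# Hypothetical features per patient: [age (years), systolic BP (mmHg), HbA1c (%)]
patients = {
    "P1": [54, 140, 7.9],
    "P2": [56, 145, 8.1],
    "P3": [30, 110, 5.2],
    "P4": [61, 150, 8.4],
}
print(most_similar(patients, "P1", k=2))
```

Standardizing first matters here: without it, systolic blood pressure (values in the hundreds) would dominate the comparison. Here P2 and P4 come out as P1's nearest neighbors, and P3, whose values are all below the cohort mean, is the least similar.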
592
Hu Y, Wen G, Ma J, Li D, Wang C, Li H, Huan E. Label-indicator morpheme growth on LSTM for Chinese healthcare question department classification. J Biomed Inform 2018; 82:154-168. [DOI: 10.1016/j.jbi.2018.04.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/31/2017] [Revised: 02/05/2018] [Accepted: 04/24/2018] [Indexed: 12/15/2022]
593
Meyer P, Noblet V, Mazzara C, Lallement A. Survey on deep learning for radiotherapy. Comput Biol Med 2018; 98:126-146. [PMID: 29787940 DOI: 10.1016/j.compbiomed.2018.05.018] [Citation(s) in RCA: 162] [Impact Index Per Article: 23.1] [Received: 03/03/2018] [Revised: 05/15/2018] [Accepted: 05/15/2018] [Indexed: 12/17/2022]
Abstract
More than 50% of cancer patients are treated with radiotherapy, either exclusively or in combination with other methods. The planning and delivery of radiotherapy treatment is a complex process that can now be greatly facilitated by artificial intelligence technology. Deep learning is the fastest-growing field in artificial intelligence and has been successfully used in recent years in many domains, including medicine. In this article, we first explain the concept of deep learning, addressing it in the broader context of machine learning. The most common network architectures are presented, with a more specific focus on convolutional neural networks. We then review the published works on deep learning methods that can be applied to radiotherapy, classify them into seven categories related to the patient workflow, and offer some insight into potential future applications. We have attempted to make this paper accessible to both the radiotherapy and deep learning communities, and hope that it will inspire new collaborations between these two communities to develop dedicated radiotherapy applications.
Affiliation(s)
- Philippe Meyer
- Department of Medical Physics, Paul Strauss Center, Strasbourg, France.
594
Aris-Brosou S, Kim J, Li L, Liu H. Predicting the Reasons of Customer Complaints: A First Step Toward Anticipating Quality Issues of In Vitro Diagnostics Assays with Machine Learning. JMIR Med Inform 2018; 6:e34. [PMID: 29764796 PMCID: PMC5974458 DOI: 10.2196/medinform.9960] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/27/2018] [Revised: 03/27/2018] [Accepted: 03/27/2018] [Indexed: 11/29/2022]
Abstract
Background: Vendors in the health care industry produce diagnostic systems that, through a secured connection, allow them to monitor performance almost in real time. However, analyzing and interpreting large volumes of noisy quality control (QC) data is challenging. As a result, some QC shifts may not be detected early enough by the vendor and instead lead a customer to complain. Objective: We hypothesized that a more proactive response could be designed by using the collected QC data more efficiently. Our aim was therefore to help prevent customer complaints by predicting them from the QC data collected by in vitro diagnostic systems. Methods: QC data from five selected in vitro diagnostic assays were combined with the corresponding database of customer complaints over a period of 90 days. A subset of these data covering the last 45 days was also analyzed to assess how the length of the training period affects predictions. We defined a set of features used to train two classifiers, one based on decision trees and the other based on adaptive boosting, and assessed model performance by cross-validation. Results: Cross-validation showed classification error rates close to zero for some assays with adaptive boosting when predicting the potential cause of customer complaints. Performance was improved by shortening the training period when the volume of complaints increased. Denoising filters that reduced the number of categories to predict further improved performance, as their application simplified the prediction problem. Conclusions: This novel approach to predicting customer complaints from QC data may allow the diagnostic industry, the expected end user of our approach, to proactively identify potential product quality issues and fix them before receiving customer complaints. This represents a new step toward using big data for product quality improvement.
Affiliation(s)
- James Kim
- Ortho Clinical Diagnostics, Raritan, NJ, United States
- Li Li
- Ortho Clinical Diagnostics, Raritan, NJ, United States
- Hui Liu
- Ortho Clinical Diagnostics, Raritan, NJ, United States
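The Aris-Brosou et al. abstract compares a decision-tree classifier with adaptive boosting and scores both by cross-validation. The sketch below illustrates those two ingredients under simplifying assumptions and is not the authors' pipeline: it implements discrete (binary, +1/-1) AdaBoost over one-feature threshold stumps, evaluated by k-fold cross-validation. The study predicted several complaint categories from QC-derived features; here the task is binary, the toy data are synthetic, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity in {+1, -1})

def stump_predict(stump: Stump, x: List[float]) -> int:
    """Predict `polarity` when x[j] <= threshold, otherwise the opposite label."""
    j, thr, pol = stump
    return pol if x[j] <= thr else -pol

def fit_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively search feature/threshold/polarity for the lowest weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        vals = sorted({x[j] for x in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        cands = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for thr in cands:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err

def adaboost_fit(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Re-weight: misclassified points gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

def cv_error(X: List[List[float]], y: List[int], k: int = 5, rounds: int = 10) -> float:
    """Misclassification rate pooled over k contiguous folds (shuffle real data first)."""
    n = len(X)
    wrong = 0
    for f in range(k):
        test = set(range(f * n // k, (f + 1) * n // k))
        train = [i for i in range(n) if i not in test]
        model = adaboost_fit([X[i] for i in train], [y[i] for i in train], rounds)
        wrong += sum(adaboost_predict(model, X[i]) != y[i] for i in test)
    return wrong / n

# Synthetic toy data (hypothetical): label +1 when the first feature exceeds 0.45.
X = [[i / 10, j / 10] for i in range(10) for j in range(10)]
y = [1 if x[0] > 0.45 else -1 for x in X]
print(cv_error(X, y, k=5, rounds=5))
```

The toy problem is separable by a single stump, so the cross-validated error is zero; on real QC features, boosting earns its keep by re-weighting the cases that earlier stumps misclassify.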
595
Fraser K, Bruckner DM, Dordick JS. Advancing Predictive Hepatotoxicity at the Intersection of Experimental, in Silico, and Artificial Intelligence Technologies. Chem Res Toxicol 2018; 31:412-430. [PMID: 29722533 DOI: 10.1021/acs.chemrestox.8b00054] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/30/2022]
Abstract
Adverse drug reactions, particularly those that result in drug-induced liver injury (DILI), are a major cause of drug failure in clinical trials and drug withdrawals. Hepatotoxicity-mediated drug attrition occurs despite substantial investments of time and money in developing cellular assays, animal models, and computational models to predict its occurrence in humans. Underperformance in predicting hepatotoxicity associated with drugs and drug candidates has been attributed to existing gaps in our understanding of the mechanisms involved in driving hepatic injury after these compounds perfuse and are metabolized by the liver. Herein we assess in vitro, in vivo (animal), and in silico strategies used to develop predictive DILI models. We address the effectiveness of several two- and three-dimensional in vitro cellular methods that are frequently employed in hepatotoxicity screens and how they can be used to predict DILI in humans. We also explore how humanized animal models can recapitulate human drug metabolic profiles and associated liver injury. Finally, we highlight the maturation of computational methods for predicting hepatotoxicity, the untapped potential of artificial intelligence for improving in silico DILI screens, and how knowledge acquired from these predictions can shape the refinement of experimental methods.
Affiliation(s)
- Keith Fraser
- Department of Chemical and Biological Engineering and Department of Biological Sciences, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Dylan M Bruckner
- Department of Chemical and Biological Engineering and Department of Biological Sciences, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Jonathan S Dordick
- Department of Chemical and Biological Engineering and Department of Biological Sciences, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
596
Big Data and Data Science in Critical Care. Chest 2018; 154:1239-1248. [PMID: 29752973 DOI: 10.1016/j.chest.2018.04.037] [Citation(s) in RCA: 163] [Impact Index Per Article: 23.3] [Received: 02/07/2018] [Revised: 04/06/2018] [Accepted: 04/27/2018] [Indexed: 12/22/2022]
Abstract
The digitalization of the health-care system has resulted in a deluge of clinical big data and has prompted the rapid growth of data science in medicine. Data science, which is the field of study dedicated to the principled extraction of knowledge from complex data, is particularly relevant in the critical care setting. The availability of large amounts of data in the ICU, the need for better evidence-based care, and the complexity of critical illness make the use of data science techniques and data-driven research particularly appealing to intensivists. Despite the increasing number of studies and publications in the field, thus far there have been few examples of data science projects that have resulted in successful implementations of data-driven systems in the ICU. However, given the expected growth in the field, intensivists should be familiar with the opportunities and challenges of big data and data science. The present article reviews the definitions, types of algorithms, applications, challenges, and future of big data and data science in critical care.
597
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018; 1:18. [PMID: 31304302 PMCID: PMC6550175 DOI: 10.1038/s41746-018-0029-1] [Citation(s) in RCA: 1011] [Impact Index Per Article: 144.4] [Received: 01/26/2018] [Revised: 03/14/2018] [Accepted: 03/26/2018] [Indexed: 12/17/2022]
Abstract
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operating characteristic curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
Affiliation(s)
- Alvin Rajkomar
- Google Inc, Mountain View, CA USA
- University of California, San Francisco, San Francisco, CA USA
- Kai Chen
- Google Inc, Mountain View, CA USA
- Mimi Sun
- Google Inc, Mountain View, CA USA
- Yi Zhang
- Google Inc, Mountain View, CA USA
- Quoc Le
- Google Inc, Mountain View, CA USA
- De Wang
- Google Inc, Mountain View, CA USA
- Dana Ludwig
- University of California, San Francisco, San Francisco, CA USA
- Atul J. Butte
- University of California, San Francisco, San Francisco, CA USA
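Each discrimination figure in the Rajkomar et al. abstract is an AUROC. A way to make the metric concrete: AUROC equals the probability that a randomly chosen positive case (for example, a patient who died in hospital) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The labels and scores are invented, and the quadratic pairwise loop is chosen for clarity rather than speed; rank-based formulas are preferable on large cohorts.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = event occurred) and model risk scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
print(auroc(labels, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the 0.93-0.94 mortality figure signals much stronger discrimination than the 0.75-0.76 readmission figure.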
598
Kalinin AA, Higgins GA, Reamaroon N, Soroushmehr S, Allyn-Feuer A, Dinov ID, Najarian K, Athey BD. Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics 2018; 19:629-650. [PMID: 29697304 PMCID: PMC6022084 DOI: 10.2217/pgs-2018-0008] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 01/24/2018] [Accepted: 03/09/2018] [Indexed: 01/02/2023]
Abstract
This Perspective provides examples of current and future applications of deep learning in pharmacogenomics, including: identification of novel regulatory variants located in noncoding domains of the genome and their function as applied to pharmacoepigenomics; patient stratification from medical records; and the mechanistic prediction of drug response, targets and their interactions. Deep learning encapsulates a family of machine learning algorithms that has transformed many important subfields of artificial intelligence over the last decade, and has demonstrated breakthrough performance improvements on a wide range of tasks in biomedicine. We anticipate that in the future, deep learning will be widely used to predict personalized drug response and optimize medication selection and dosing, using knowledge extracted from large and complex molecular, epidemiological, clinical and demographic datasets.
Affiliation(s)
- Alexandr A Kalinin
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Statistics Online Computational Resource (SOCR), University of Michigan School of Nursing, Ann Arbor, MI 48109, USA
- Gerald A Higgins
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Narathip Reamaroon
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Sayedmohammadreza Soroushmehr
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Ari Allyn-Feuer
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Ivo D Dinov
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Statistics Online Computational Resource (SOCR), University of Michigan School of Nursing, Ann Arbor, MI 48109, USA
- Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
- Kayvan Najarian
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Brian D Athey
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
- Department of Internal Medicine, University of Michigan Health System, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI 48109, USA
599
Ping P, Hermjakob H, Polson JS, Benos PV, Wang W. Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine. Circ Res 2018; 122:1290-1301. [PMID: 29700073 PMCID: PMC6192708 DOI: 10.1161/circresaha.117.310967] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/18/2022]
Abstract
In the digital age of cardiovascular medicine, the rate of biomedical discovery can be greatly accelerated by the guidance and resources required to unearth potential collections of knowledge. A unified computational platform leverages metadata to not only provide direction but also empower researchers to mine a wealth of biomedical information and forge novel mechanistic insights. This review takes the opportunity to present an overview of the cloud-based computational environment, including the functional roles of metadata, the architecture schema of indexing and search, and the practical scenarios of machine learning-supported molecular signature extraction. By introducing several established resources and state-of-the-art workflows, we share with our readers a broadly defined informatics framework to phenotype cardiovascular health and disease.
Affiliation(s)
- Peipei Ping
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- Department of Physiology (P.P., J.S.P.)
- Department of Medicine (P.P.)
- UCLA School of Medicine, Los Angeles, CA; Department of Computer Science, Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, CA (P.P., W.W.)
- Henning Hermjakob
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- Molecular Systems Cluster, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom (H.H.)
- Jennifer S Polson
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- Department of Physiology (P.P., J.S.P.)
- Panagiotis V Benos
- Departments of Computational & Systems Biology, School of Medicine, University of Pittsburgh, PA (P.V.B.)
- NIH BD2K Center of Excellence for Biomedical Computing at University of Pittsburgh (Center for Causal Discovery), PA (P.V.B.)
- Wei Wang
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- UCLA School of Medicine, Los Angeles, CA; Department of Computer Science, Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, CA (P.P., W.W.)
600
Zhao C, Jiang J, Guan Y, Guo X, He B. EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning. Artif Intell Med 2018; 87:49-59. [PMID: 29691122 DOI: 10.1016/j.artmed.2018.03.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/22/2017] [Revised: 02/28/2018] [Accepted: 03/29/2018] [Indexed: 01/09/2023]
Abstract
OBJECTIVE: Electronic medical records (EMRs) contain medical knowledge that can be used for clinical decision support (CDS). Our objective is to develop a general system that can extract and represent knowledge contained in EMRs to support three CDS tasks (test recommendation, initial diagnosis, and treatment plan recommendation) given the condition of a patient. METHODS: We extracted four kinds of medical entities from records and constructed an EMR-based medical knowledge network (EMKN), in which nodes are entities and edges reflect their co-occurrence in a record. Three bipartite subgraphs (bigraphs) were extracted from the EMKN, one to support each task. One part of each bigraph was the given condition (e.g., symptoms), and the other was the condition to be inferred (e.g., diseases). Each bigraph was regarded as a Markov random field (MRF) to support the inference. We proposed three graph-based energy functions and three likelihood-based energy functions. Two of these functions are based on knowledge representation learning and can provide distributed representations of medical entities. Two EMR datasets and three metrics were used to evaluate performance. RESULTS: Overall, the evaluation results indicate that the proposed system outperformed the baseline methods. The distributed representations of medical entities do reflect similarity relationships at the knowledge level. CONCLUSION: Combining EMKN and MRF is an effective approach for general medical knowledge representation and inference. Different tasks, however, require individually designed energy functions.
Affiliation(s)
- Chao Zhao
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
- Jingchi Jiang
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
- Yi Guan
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
- Xitong Guo
- School of Management, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
- Bin He
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
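Zhao et al. build a co-occurrence network of medical entities and treat each bipartite subgraph (given condition versus condition to infer) as a Markov random field scored by energy functions. The sketch below is a deliberately simplified stand-in, assuming only symptom and disease entities and hypothetical records. Edge weights are co-occurrence counts, and each candidate disease d is ranked by the energy E(d | S) = -log P(d) - sum over observed symptoms s of log P(s | d), with add-one smoothing (lower energy means more compatible). On this star-shaped MRF the energy reduces to a naive-Bayes score that ignores absent symptoms. It is not one of the paper's six energy functions.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

Record = Tuple[Set[str], Set[str]]  # (symptoms, diagnoses) extracted from one record

def build_network(records: List[Record]) -> Tuple[Counter, Counter]:
    """Edge weight = number of records in which a (symptom, disease) pair co-occurs."""
    edges: Counter = Counter()
    disease_count: Counter = Counter()
    for symptoms, diseases in records:
        disease_count.update(diseases)
        edges.update(product(symptoms, diseases))
    return edges, disease_count

def energy(disease: str, symptoms: Set[str], edges: Counter,
           disease_count: Counter, n_records: int) -> float:
    """Unary term -log P(d) plus pairwise terms -log P(s | d), add-one smoothed."""
    e = -math.log(disease_count[disease] / n_records)
    for s in symptoms:
        # P(s | d): fraction of records with d that also mention s (Bernoulli, add-one).
        p = (edges[(s, disease)] + 1) / (disease_count[disease] + 2)
        e -= math.log(p)
    return e

def rank_diseases(symptoms: Set[str], records: List[Record]) -> List[Tuple[str, float]]:
    """Candidate diseases sorted from lowest (most compatible) to highest energy."""
    edges, dc = build_network(records)
    scored = [(d, energy(d, symptoms, edges, dc, len(records))) for d in dc]
    return sorted(scored, key=lambda t: t[1])

# Hypothetical extracted records.
records: List[Record] = [
    ({"fever", "cough"}, {"pneumonia"}),
    ({"fever", "cough", "dyspnea"}, {"pneumonia"}),
    ({"chest pain", "dyspnea"}, {"angina"}),
    ({"chest pain"}, {"angina"}),
    ({"fever"}, {"influenza"}),
]
print(rank_diseases({"fever", "cough"}, records))
```

Given fever and cough, pneumonia has the lowest energy, followed by influenza and then angina. Swapping in a different energy function changes only `energy`, which matches the abstract's point that different tasks call for individually designed energies.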