551
Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, Maathuis MH, Moreau Y, Murphy SA, Przytycka TM, Rebhan M, Röst H, Schuppert A, Schwab M, Spang R, Stekhoven D, Sun J, Weber A, Ziemek D, Zupan B. From hype to reality: data science enabling personalized medicine. BMC Med 2018; 16:150. [PMID: 30145981 PMCID: PMC6109989 DOI: 10.1186/s12916-018-1122-7] [Citation(s) in RCA: 196] [Impact Index Per Article: 32.7] [Received: 03/28/2018] [Accepted: 07/09/2018] [Indexed: 02/08/2023]
Abstract
BACKGROUND Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often called artificial intelligence in the mainstream media). While there has been considerable enthusiasm in recent years about the potential of 'big data' and machine learning-based solutions, only a few examples exist that affect current clinical practice. This lack of clinical impact can largely be attributed to insufficient performance of predictive models, difficulties in interpreting complex model predictions, and a lack of validation in prospective clinical trials demonstrating a clear benefit over the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future. CONCLUSIONS There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance further to provide direct benefit to clinical practice.
Affiliation(s)
- Holger Fröhlich
- UCB Biosciences GmbH, Alfred-Nobel-Str. 10, 40789 Monheim, Germany
- University of Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19c, 53115 Bonn, Germany
- Rudi Balling
- University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Oliver Kohlbacher
- University of Tübingen, WSI/ZBIT, Sand 14, 72076 Tübingen, Germany
- Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Institute for Translational Bioinformatics, University Medical Center Tübingen, Sand 14, 72076 Tübingen, Germany
- Santosh Kumar
- Department of Computer Science, University of Memphis, 2222 Dunn Hall, Memphis, TN 38152 USA
- Thomas Lengauer
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Marloes H. Maathuis
- ETH Zurich, Seminar für Statistik, Rämistrasse 101, 8092 Zurich, Switzerland
- Yves Moreau
- University of Leuven, ESAT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Susan A. Murphy
- Harvard University, Science Center 400 Suite, Oxford Street, Cambridge, MA 02138-2901 USA
- Teresa M. Przytycka
- National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894-6075 USA
- Michael Rebhan
- Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
- Hannes Röst
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON M5S 3E1 Canada
- Andreas Schuppert
- RWTH Aachen, Joint Research Center for Computational Biomedicine, Pauwelsstrasse 19, 52074 Aachen, Germany
- Matthias Schwab
- Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Auerbachstrasse 112, 70376 Stuttgart, Germany
- University of Tübingen, Departments of Clinical Pharmacology and of Pharmacy and Biochemistry, Tübingen, Germany
- Rainer Spang
- University of Regensburg, Institute of Functional Genomics, Am BioPark 9, 93053 Regensburg, Germany
- Daniel Stekhoven
- ETH Zurich, NEXUS Personalized Health Technologies, Otto-Stern-Weg 7, 8093 Zurich, Switzerland
- Jimeng Sun
- Georgia Tech University, 801 Atlantic Drive, Atlanta, GA 30332-0280 USA
- Andreas Weber
- Institute for Computer Science, University of Bonn, Endenicher Allee 19a, 53115 Bonn, Germany
- Daniel Ziemek
- Pfizer, Worldwide Research and Development, Linkstraße 10, 10785 Berlin, Germany
- Blaz Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
552
Deep Learning and Radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep 2018; 8:12611. [PMID: 30135549 PMCID: PMC6105676 DOI: 10.1038/s41598-018-30657-6] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 03/01/2018] [Accepted: 08/03/2018] [Indexed: 02/07/2023]
Abstract
Treatment of locally advanced rectal cancer involves chemoradiation, followed by total mesorectal excision. Complete response after chemoradiation is an accurate surrogate for long-term local control. Predicting complete response from pre-treatment features could represent a major step towards conservative treatment. Patients with a T2-4 N0-1 rectal adenocarcinoma treated with neo-adjuvant chemoradiation between June 2010 and October 2016 at three academic institutions were included. All clinical and treatment data were integrated into our clinical data warehouse, from which we extracted the features. Radiomics features were extracted from the tumor volume on the treatment planning CT scan. A Deep Neural Network (DNN) was created to predict complete response, as a methodological proof of principle. The results were compared with a baseline Linear Regression model using only the TNM stage as a predictor and with a second model built with a Support Vector Machine (SVM) on the same features used in the DNN. Ninety-five patients were included in the final analysis. There were 49 males (52%) and 46 females (48%). Median tumour size was 48 mm (range 15-130 mm). Twenty-two patients (23%) had a pathologic complete response after chemoradiation. A total of 1,683 radiomics features were extracted. The DNN predicted complete response with 80% accuracy, better than the Linear Regression model (69.5%) and the SVM model (71.58%). Our model correctly predicted complete response after neo-adjuvant rectal chemoradiotherapy in 80% of the patients in this multicenter cohort. Our results may help to identify patients who would benefit from conservative treatment rather than radical resection.
553
Abstract
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas of roughly increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Affiliation(s)
- Pierre Baldi
- Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA
554
Approaches to Medical Decision-Making Based on Big Clinical Data. J Healthc Eng 2018; 2018:3917659. [PMID: 29973977 PMCID: PMC6008823 DOI: 10.1155/2018/3917659] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/21/2017] [Revised: 02/14/2018] [Accepted: 04/30/2018] [Indexed: 12/02/2022]
Abstract
The paper discusses different approaches to building a medical decision support system based on big data. The authors sought to avoid any data reduction and to apply universal training and big data processing methods that are independent of disease classification standards. The paper assesses and compares the accuracy of recommendations for three options: case-based reasoning, a simple single-layer neural network, and a probabilistic neural network. Finally, the paper substantiates the assumption regarding which approach is most efficient for solving the specified problem.
555
Patient representation learning and interpretable evaluation using clinical notes. J Biomed Inform 2018; 84:103-113. [PMID: 29966746 DOI: 10.1016/j.jbi.2018.06.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 02/16/2018] [Revised: 06/07/2018] [Accepted: 06/28/2018] [Indexed: 11/22/2022]
Abstract
This work makes three contributions. 1. We explore the utility of a stacked denoising autoencoder and a paragraph vector model for learning task-independent dense patient representations directly from clinical notes. To analyze whether these representations are transferable across tasks, we evaluate them in multiple supervised setups to predict patient mortality, primary diagnostic and procedural category, and gender. We compare their performance with sparse representations obtained from a bag-of-words model. We observe that the learned generalized representations significantly outperform the sparse representations when few positive instances are available for learning and strong lexical features are absent. 2. We compare the performance of models using a feature set constructed from a bag of words with that of models using medical concepts. In the latter case, concepts represent problems, treatments, and tests. We find that concept identification does not improve classification performance. 3. We propose novel techniques to facilitate model interpretability. To understand and interpret the representations, we explore the best-encoded features within the patient representations obtained from the autoencoder model. Further, we calculate feature sensitivity across two networks to identify the most significant input features for different classification tasks when these pretrained representations are used as the supervised input. Using this technique, we successfully extract the most influential features for the pipeline.
556
Banerjee I, Gensheimer MF, Wood DJ, Henry S, Aggarwal S, Chang DT, Rubin DL. Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) Utilizing Free-Text Clinical Narratives. Sci Rep 2018; 8:10037. [PMID: 29968730 PMCID: PMC6030075 DOI: 10.1038/s41598-018-27946-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/01/2017] [Accepted: 06/12/2018] [Indexed: 02/07/2023]
Abstract
We propose a deep learning model, Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met), for estimating short-term life expectancy (>3 months) by analyzing free-text clinical notes in the electronic medical record while preserving the temporal sequence of visits. In a single framework, we integrated semantic data mapping and neural embedding techniques to produce a text processing method that extracts relevant information from heterogeneous types of clinical notes in an unsupervised manner, and we designed a recurrent neural network to model the temporal dependency of the patient visits. The model was trained on a large dataset (10,293 patients) and validated on a separate dataset (1,818 patients). Our method achieved an area under the ROC curve (AUC) of 0.89. To provide explainability, we developed an interactive graphical tool that may improve physicians' understanding of the basis for the model's predictions. The high accuracy and explainability of PPES-Met may enable its use as a decision support tool to personalize metastatic cancer treatment and to provide valuable assistance to physicians.
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Douglas J Wood
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Solomon Henry
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Sonya Aggarwal
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Daniel T Chang
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Biomedical Data Science, Radiology, and Medicine (BMIR) Stanford University, Stanford, CA, USA
557
Paige E, Barrett J, Stevens D, Keogh RH, Sweeting MJ, Nazareth I, Petersen I, Wood AM. Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk. Am J Epidemiol 2018; 187:1530-1538. [PMID: 29584812 PMCID: PMC6030927 DOI: 10.1093/aje/kwy018] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 07/27/2017] [Revised: 01/24/2018] [Accepted: 01/25/2018] [Indexed: 11/13/2022]
Abstract
The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age-specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997-2016). Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).
Affiliation(s)
- Ellie Paige
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- National Centre for Epidemiology and Population Health, Research School of Population, The Australian National University, Canberra, Australia
- Jessica Barrett
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- David Stevens
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Michael J Sweeting
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Irwin Nazareth
- Research Department of Primary Care and Population Health, Institute of Epidemiology and Health Care, University College London, London, United Kingdom
- Irene Petersen
- Research Department of Primary Care and Population Health, Institute of Epidemiology and Health Care, University College London, London, United Kingdom
- Angela M Wood
- Department of Public Health and Primary Care, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
558
Banda JM, Seneviratne M, Hernandez-Boussard T, Shah NH. Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models. Annu Rev Biomed Data Sci 2018; 1:53-68. [PMID: 31218278 PMCID: PMC6583807 DOI: 10.1146/annurev-biodatasci-080917-013315] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Indexed: 12/18/2022]
Abstract
With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.
Affiliation(s)
- Juan M Banda
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
- Martin Seneviratne
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
559
Choy G, Khalilzadeh O, Michalski M, Do S, Samir AE, Pianykh OS, Geis JR, Pandharipande PV, Brink JA, Dreyer KJ. Current Applications and Future Impact of Machine Learning in Radiology. Radiology 2018; 288:318-328. [PMID: 29944078 DOI: 10.1148/radiol.2018171820] [Citation(s) in RCA: 446] [Impact Index Per Article: 74.3] [Indexed: 12/14/2022]
Abstract
Recent advances and future perspectives of machine learning techniques offer promising applications in medical imaging. Machine learning has the potential to improve different steps of the radiology workflow including order scheduling and triage, clinical decision support systems, detection and interpretation of findings, postprocessing and dose estimation, examination quality control, and radiology reporting. In this article, the authors review examples of current applications of machine learning and artificial intelligence techniques in diagnostic radiology. In addition, the future impact and natural extension of these techniques in radiology practice are discussed.
Affiliation(s)
- Garry Choy
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- Omid Khalilzadeh
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- Mark Michalski
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- Synho Do
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- Anthony E Samir
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- Oleg S Pianykh
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- J Raymond Geis
- Department of Radiology, University of Colorado School of Medicine, Aurora, Colo
- Pari V Pandharipande
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- James A Brink
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
- Keith J Dreyer
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114
560
DelPozo-Banos M, John A, Petkov N, Berridge DM, Southern K, Lloyd K, Jones C, Spencer S, Travieso CM. Using Neural Networks with Routine Health Records to Identify Suicide Risk: Feasibility Study. JMIR Ment Health 2018; 5:e10144. [PMID: 29934287 PMCID: PMC6035342 DOI: 10.2196/10144] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 02/15/2018] [Revised: 04/10/2018] [Accepted: 04/29/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND Each year, approximately 800,000 people die by suicide worldwide, accounting for 1-2 in every 100 deaths. Suicide is always a tragic event with a huge impact on family, friends, the community, and health professionals. Unfortunately, suicide prevention and the development of risk assessment tools have been hindered by the complexity of the underlying mechanisms and the dynamic nature of a person's motivation and intent. Many of those who die by suicide have had contact with health services in the preceding year, but identifying those most at risk remains a challenge. OBJECTIVE To explore the feasibility of using artificial neural networks with routinely collected electronic health records to support the identification of those at high risk of suicide when in contact with health services. METHODS Using the Secure Anonymised Information Linkage (SAIL) Databank UK, we extracted the data of those who died by suicide between 2001 and 2015, together with paired controls. From primary care (general practice) and secondary care (hospital admissions) electronic health records, we built a binary feature vector coding the presence of risk factors at different times prior to death. Risk factors included general practice contact and hospital admission; diagnosis of mental health issues; injury and poisoning; substance misuse; maltreatment; sleep disorders; and the prescription of opiates and psychotropics. Basic artificial neural networks were trained to differentiate between the suicide cases and paired controls. We interpreted the output score as the estimated suicide risk. System performance was assessed with 10x10-fold repeated cross-validation, and system behavior was studied by representing the distribution of estimated risk across cases and controls, and the distribution of factors across estimated risks. RESULTS We extracted a total of 2604 suicide cases and 20 paired controls per case. Our best system attained a mean error rate of 26.78% (SD 1.46; sensitivity 64.57%, specificity 81.86%). While the distribution of controls was concentrated around estimated risks < 0.5, cases were almost uniformly distributed between 0 and 1. Prescription of psychotropics, depression and anxiety, and self-harm increased the estimated risk by ~0.4. At least 95% of those presenting these factors were identified as suicide cases. CONCLUSIONS Despite the simplicity of the implemented system, the proposed methodology obtained an accuracy comparable to that of other published methods based on data from specialized questionnaires. Most of the errors came from the heterogeneity of patterns shown by suicide cases, some of which were identical to those of the paired controls. Prescription of psychotropics, depression and anxiety, and self-harm were strongly linked with higher estimated risk scores, followed by hospital admission and long-term drug and alcohol misuse. Other risk factors, such as sleep disorders and maltreatment, had more complex effects.
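The reported mean error rate of 26.78% does not fit a pooled misclassification rate over 2604 cases and 20 controls per case at the stated sensitivity and specificity, but it does appear to fit a balanced error rate, the mean of the false-negative and false-positive rates. The abstract does not define the metric, so this reading is an assumption, and the helper functions below are hypothetical names used only to check the arithmetic in a minimal Python sketch.

```python
# Sketch: compare two readings of the "mean error rate" reported for the suicide-risk networks.
# Assumption: the metric is a balanced error rate; the abstract itself does not say so.
# The helper names below are hypothetical, chosen for this illustration.


def balanced_error_rate(sensitivity: float, specificity: float) -> float:
    """Mean of the false-negative rate (1 - sensitivity) and false-positive rate (1 - specificity)."""
    return 1.0 - (sensitivity + specificity) / 2.0


def pooled_error_rate(sensitivity: float, specificity: float,
                      n_cases: int, n_controls: int) -> float:
    """Share of all subjects misclassified when each case and each control counts once."""
    missed_cases = n_cases * (1.0 - sensitivity)
    false_alarms = n_controls * (1.0 - specificity)
    return (missed_cases + false_alarms) / (n_cases + n_controls)


if __name__ == "__main__":
    # Figures taken from the abstract: 64.57% sensitivity, 81.86% specificity,
    # 2604 cases and 20 paired controls per case.
    sens, spec = 0.6457, 0.8186
    n_cases = 2604
    n_controls = 20 * n_cases

    print(f"balanced error rate: {balanced_error_rate(sens, spec):.1%}")
    print(f"pooled error rate:   {pooled_error_rate(sens, spec, n_cases, n_controls):.1%}")
```

Run as a script, it prints a balanced error rate of about 26.8%, in line with the reported figure, and a pooled rate of about 19%, which the twentyfold excess of controls pulls toward the false-positive rate. Because the published value is a mean over cross-validation folds, only approximate agreement should be expected.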
Affiliation(s)
- Ann John
- Swansea University, Swansea University Medical School, Swansea, United Kingdom
- Nicolai Petkov
- Division of Intelligent Systems, Department of Computer Science, Bernoulli Institute of Mathematics, Computer Science and Artificial Intelligence, Faculty of Science and Engineering, University of Groningen, Groningen, Netherlands
- Damon Mark Berridge
- Swansea University, Swansea University Medical School, Swansea, United Kingdom
- Kate Southern
- Cardiff Adult Self Injury Project, Cardiff, United Kingdom
- Keith Lloyd
- Swansea University, Swansea University Medical School, Swansea, United Kingdom
- Caroline Jones
- Hillary Rodham Clinton School of Law, Swansea University, Swansea, United Kingdom
- Sarah Spencer
- Princess of Wales Hospital, Bridgend, ABMU Health Board, Swansea, United Kingdom
- Carlos Manuel Travieso
- Signals and Communications Department, IDeTIC, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
561
Wang T, Qiu RG, Yu M. Predictive Modeling of the Progression of Alzheimer's Disease with Recurrent Neural Networks. Sci Rep 2018; 8:9161. [PMID: 29907747 PMCID: PMC6003986 DOI: 10.1038/s41598-018-27337-w] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 03/19/2018] [Accepted: 05/21/2018] [Indexed: 12/27/2022]
Abstract
Alzheimer's disease (AD) patients differ in their number of service visits, and the time intervals between visits are non-uniform. Although the literature describes many approaches to disease progression modeling, they fail to leverage these time-relevant parts of patients' medical records when predicting the future status of the disease. This paper investigates how to predict AD progression at a patient's next medical visit by leveraging heterogeneous medical data. Data provided by the National Alzheimer's Coordinating Center include 5432 patients with probable AD from August 31, 2005 to May 25, 2017. Long short-term memory (LSTM) recurrent neural networks (RNNs) are adopted. The approach relies on an enhanced "many-to-one" RNN architecture that supports the shift of time steps, so it can handle patients' varying numbers of visits and uneven time intervals. The results show that the proposed approach can predict patients' AD progression at their next visits with over 99% accuracy, significantly outperforming classic baseline methods. This study confirms that RNNs can effectively solve the AD progression prediction problem by fully leveraging the inherent temporal and medical patterns derived from patients' historical visits. More promisingly, the approach can be adapted to other chronic disease progression problems.
Affiliation(s)
- Tingyan Wang
- Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Big Data Lab, Division of Engineering and Information Science, The Pennsylvania State University, Malvern, PA, 19355, USA
- Robin G Qiu
- Big Data Lab, Division of Engineering and Information Science, The Pennsylvania State University, Malvern, PA, 19355, USA
- Ming Yu
- Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
562
Comparing Deep Learning and Classical Machine Learning Approaches for Predicting Inpatient Violence Incidents from Clinical Text. Appl Sci (Basel) 2018. [DOI: 10.3390/app8060981] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 11/16/2022]
563
Parimbelli E, Marini S, Sacchi L, Bellazzi R. Patient similarity for precision medicine: A systematic review. J Biomed Inform 2018; 83:87-96. [PMID: 29864490 DOI: 10.1016/j.jbi.2018.06.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 01/31/2018] [Revised: 05/16/2018] [Accepted: 06/01/2018] [Indexed: 12/19/2022]
Abstract
Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to provide precise identification of patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step to allow stratification of patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. A total of 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases follow as the second most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.
Affiliation(s)
- E Parimbelli
- Telfer School of Management, University of Ottawa, Ottawa, Canada; Interdepartmental Centre for Health Technologies, University of Pavia, Italy.
- S Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA; Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- L Sacchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- R Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy; Interdepartmental Centre for Health Technologies, University of Pavia, Italy; RCCS ICS Maugeri, Pavia, Italy
564
Hu Y, Wen G, Ma J, Li D, Wang C, Li H, Huan E. Label-indicator morpheme growth on LSTM for Chinese healthcare question department classification. J Biomed Inform 2018; 82:154-168. [DOI: 10.1016/j.jbi.2018.04.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/31/2017] [Revised: 02/05/2018] [Accepted: 04/24/2018] [Indexed: 12/15/2022]
565
Meyer P, Noblet V, Mazzara C, Lallement A. Survey on deep learning for radiotherapy. Comput Biol Med 2018; 98:126-146. [PMID: 29787940 DOI: 10.1016/j.compbiomed.2018.05.018] [Citation(s) in RCA: 162] [Impact Index Per Article: 27.0] [Received: 03/03/2018] [Revised: 05/15/2018] [Accepted: 05/15/2018] [Indexed: 12/17/2022]
Abstract
More than 50% of cancer patients are treated with radiotherapy, either exclusively or in combination with other methods. The planning and delivery of radiotherapy treatment is a complex process, but can now be greatly facilitated by artificial intelligence technology. Deep learning is the fastest-growing field in artificial intelligence and has been successfully used in recent years in many domains, including medicine. In this article, we first explain the concept of deep learning, addressing it in the broader context of machine learning. The most common network architectures are presented, with a more specific focus on convolutional neural networks. We then present a review of the published works on deep learning methods that can be applied to radiotherapy, which are classified into seven categories related to the patient workflow and can provide some insights into potential future applications. We have attempted to make this paper accessible to both the radiotherapy and deep learning communities, and hope that it will inspire new collaborations between these two communities to develop dedicated radiotherapy applications.
Affiliation(s)
- Philippe Meyer
- Department of Medical Physics, Paul Strauss Center, Strasbourg, France.
566
Aris-Brosou S, Kim J, Li L, Liu H. Predicting the Reasons of Customer Complaints: A First Step Toward Anticipating Quality Issues of In Vitro Diagnostics Assays with Machine Learning. JMIR Med Inform 2018; 6:e34. [PMID: 29764796 PMCID: PMC5974458 DOI: 10.2196/medinform.9960] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/27/2018] [Revised: 03/27/2018] [Accepted: 03/27/2018] [Indexed: 11/29/2022] Open
Abstract
Background Vendors in the health care industry produce diagnostic systems that, through a secured connection, allow them to monitor performance almost in real time. However, analyzing and interpreting large volumes of noisy quality control (QC) data is challenging. As a result, some QC shifts may not be detected early enough by the vendor and may instead lead a customer to complain. Objective This study hypothesized that a more proactive response could be designed by using the collected QC data more efficiently. Our aim was therefore to help prevent customer complaints by predicting them from the QC data collected by in vitro diagnostic systems. Methods QC data from five selected in vitro diagnostic assays were combined with the corresponding database of customer complaints over a period of 90 days. A subset of these data covering the last 45 days was also analyzed to assess how the length of the training period affects predictions. We defined a set of features used to train two classifiers, one based on decision trees and the other based on adaptive boosting, and assessed model performance by cross-validation. Results Cross-validation showed classification error rates close to zero for some assays with adaptive boosting when predicting the potential cause of customer complaints. Shortening the training period improved performance when the volume of complaints increased. Denoising filters that reduced the number of categories to predict further improved performance, as their application simplified the prediction problem. Conclusions This novel approach to predicting customer complaints from QC data may allow the diagnostic industry, the expected end user of our approach, to proactively identify potential product quality issues and fix them before receiving customer complaints. This represents a new step toward using big data for product quality improvement.
Affiliation(s)
- James Kim
- Ortho Clinical Diagnostics, Raritan, NJ, United States
- Li Li
- Ortho Clinical Diagnostics, Raritan, NJ, United States
- Hui Liu
- Ortho Clinical Diagnostics, Raritan, NJ, United States
567
Fraser K, Bruckner DM, Dordick JS. Advancing Predictive Hepatotoxicity at the Intersection of Experimental, in Silico, and Artificial Intelligence Technologies. Chem Res Toxicol 2018; 31:412-430. [PMID: 29722533 DOI: 10.1021/acs.chemrestox.8b00054] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/30/2022]
Abstract
Adverse drug reactions, particularly those that result in drug-induced liver injury (DILI), are a major cause of drug failure in clinical trials and drug withdrawals. Hepatotoxicity-mediated drug attrition occurs despite substantial investments of time and money in developing cellular assays, animal models, and computational models to predict its occurrence in humans. Underperformance in predicting hepatotoxicity associated with drugs and drug candidates has been attributed to existing gaps in our understanding of the mechanisms involved in driving hepatic injury after these compounds perfuse and are metabolized by the liver. Herein we assess in vitro, in vivo (animal), and in silico strategies used to develop predictive DILI models. We address the effectiveness of several two- and three-dimensional in vitro cellular methods that are frequently employed in hepatotoxicity screens and how they can be used to predict DILI in humans. We also explore how humanized animal models can recapitulate human drug metabolic profiles and associated liver injury. Finally, we highlight the maturation of computational methods for predicting hepatotoxicity, the untapped potential of artificial intelligence for improving in silico DILI screens, and how knowledge acquired from these predictions can shape the refinement of experimental methods.
Affiliation(s)
- Keith Fraser
- Department of Chemical and Biological Engineering, Department of Biological Sciences, and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Dylan M Bruckner
- Department of Chemical and Biological Engineering, Department of Biological Sciences, and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Jonathan S Dordick
- Department of Chemical and Biological Engineering, Department of Biological Sciences, and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
568
Big Data and Data Science in Critical Care. Chest 2018; 154:1239-1248. [PMID: 29752973 DOI: 10.1016/j.chest.2018.04.037] [Citation(s) in RCA: 152] [Impact Index Per Article: 25.3] [Received: 02/07/2018] [Revised: 04/06/2018] [Accepted: 04/27/2018] [Indexed: 12/22/2022] Open
Abstract
The digitalization of the health-care system has resulted in a deluge of clinical big data and has prompted the rapid growth of data science in medicine. Data science, which is the field of study dedicated to the principled extraction of knowledge from complex data, is particularly relevant in the critical care setting. The availability of large amounts of data in the ICU, the need for better evidence-based care, and the complexity of critical illness make the use of data science techniques and data-driven research particularly appealing to intensivists. Despite the increasing number of studies and publications in the field, thus far there have been few examples of data science projects that have resulted in successful implementations of data-driven systems in the ICU. However, given the expected growth in the field, intensivists should be familiar with the opportunities and challenges of big data and data science. The present article reviews the definitions, types of algorithms, applications, challenges, and future of big data and data science in critical care.
569
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018; 1:18. [PMID: 31304302 PMCID: PMC6550175 DOI: 10.1038/s41746-018-0029-1] [Citation(s) in RCA: 932] [Impact Index Per Article: 155.3] [Received: 01/26/2018] [Revised: 03/14/2018] [Accepted: 03/26/2018] [Indexed: 12/17/2022] Open
Abstract
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (area under the receiver operating characteristic curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
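The abstract reports discrimination as AUROC. As a rough illustration of what that number measures, the sketch below computes AUROC directly as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with tied scores earning half credit. It is a generic, quadratic-time illustration, not the authors' evaluation code, and it does not implement the frequency-weighted multi-label variant reported for discharge diagnoses; the labels and scores in the usage line are made up.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive outcome (e.g. in-hospital death), 0 otherwise.
    scores: model risk scores; higher means the outcome is considered more likely.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    credit = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                credit += 1.0  # correctly ordered pair
            elif p == n:
                credit += 0.5  # tied scores get half credit
    return credit / (len(positives) * len(negatives))


# Hypothetical example: 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation; for large cohorts, a sort-based implementation avoids the pairwise loop.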
Affiliation(s)
- Alvin Rajkomar
- Google Inc, Mountain View, CA USA
- University of California, San Francisco, San Francisco, CA USA
- Kai Chen
- Google Inc, Mountain View, CA USA
- Mimi Sun
- Google Inc, Mountain View, CA USA
- Yi Zhang
- Google Inc, Mountain View, CA USA
- Quoc Le
- Google Inc, Mountain View, CA USA
- De Wang
- Google Inc, Mountain View, CA USA
- Dana Ludwig
- University of California, San Francisco, San Francisco, CA USA
- Atul J. Butte
- University of California, San Francisco, San Francisco, CA USA
570
Kalinin AA, Higgins GA, Reamaroon N, Soroushmehr S, Allyn-Feuer A, Dinov ID, Najarian K, Athey BD. Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics 2018; 19:629-650. [PMID: 29697304 PMCID: PMC6022084 DOI: 10.2217/pgs-2018-0008] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 01/24/2018] [Accepted: 03/09/2018] [Indexed: 01/02/2023] Open
Abstract
This Perspective provides examples of current and future applications of deep learning in pharmacogenomics, including: identification of novel regulatory variants located in noncoding domains of the genome and their function as applied to pharmacoepigenomics; patient stratification from medical records; and the mechanistic prediction of drug response, targets and their interactions. Deep learning encapsulates a family of machine learning algorithms that has transformed many important subfields of artificial intelligence over the last decade, and has demonstrated breakthrough performance improvements on a wide range of tasks in biomedicine. We anticipate that in the future, deep learning will be widely used to predict personalized drug response and optimize medication selection and dosing, using knowledge extracted from large and complex molecular, epidemiological, clinical and demographic datasets.
Affiliation(s)
- Alexandr A Kalinin
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Statistics Online Computational Resource (SOCR), University of Michigan School of Nursing, Ann Arbor, MI 48109, USA
- Gerald A Higgins
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Narathip Reamaroon
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Sayedmohammadreza Soroushmehr
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Ari Allyn-Feuer
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Ivo D Dinov
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Statistics Online Computational Resource (SOCR), University of Michigan School of Nursing, Ann Arbor, MI 48109, USA
- Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
- Kayvan Najarian
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Brian D Athey
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
- Department of Internal Medicine, University of Michigan Health System, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI 48109, USA
571
Ping P, Hermjakob H, Polson JS, Benos PV, Wang W. Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine. Circ Res 2018; 122:1290-1301. [PMID: 29700073 PMCID: PMC6192708 DOI: 10.1161/circresaha.117.310967] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Abstract
In the digital age of cardiovascular medicine, the rate of biomedical discovery can be greatly accelerated by the guidance and resources required to unearth potential collections of knowledge. A unified computational platform leverages metadata to not only provide direction but also empower researchers to mine a wealth of biomedical information and forge novel mechanistic insights. This review takes the opportunity to present an overview of the cloud-based computational environment, including the functional roles of metadata, the architecture schema of indexing and search, and the practical scenarios of machine learning-supported molecular signature extraction. By introducing several established resources and state-of-the-art workflows, we share with our readers a broadly defined informatics framework to phenotype cardiovascular health and disease.
Affiliation(s)
- Peipei Ping
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- Department of Physiology (P.P., J.S.P.)
- Department of Medicine (P.P.)
- UCLA School of Medicine, Los Angeles, CA; Department of Computer Science, Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, CA (P.P., W.W.)
- Henning Hermjakob
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- Molecular Systems Cluster, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom (H.H.)
- Jennifer S Polson
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- Department of Physiology (P.P., J.S.P.)
- Panagiotis V Benos
- Departments of Computational & Systems Biology, School of Medicine, University of Pittsburgh, PA (P.V.B.)
- NIH BD2K Center of Excellence for Biomedical Computing at University of Pittsburgh (Center for Causal Discovery), PA (P.V.B.)
- Wei Wang
- From the NIH BD2K Center of Excellence for Biomedical Computing at UCLA (HeartBD2K), Los Angeles, CA (P.P., H.H., J.S.P., W.W.)
- UCLA School of Medicine, Los Angeles, CA; Department of Computer Science, Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, CA (P.P., W.W.)
572
Zhao C, Jiang J, Guan Y, Guo X, He B. EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning. Artif Intell Med 2018; 87:49-59. [PMID: 29691122 DOI: 10.1016/j.artmed.2018.03.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/22/2017] [Revised: 02/28/2018] [Accepted: 03/29/2018] [Indexed: 01/09/2023]
Abstract
OBJECTIVE Electronic medical records (EMRs) contain medical knowledge that can be used for clinical decision support (CDS). Our objective is to develop a general system that can extract and represent knowledge contained in EMRs to support three CDS tasks (test recommendation, initial diagnosis, and treatment plan recommendation) given the condition of a patient. METHODS We extracted four kinds of medical entities from records and constructed an EMR-based medical knowledge network (EMKN), in which nodes are entities and edges reflect their co-occurrence in a record. Three bipartite subgraphs (bigraphs) were extracted from the EMKN, one to support each task. One part of the bigraph was the given condition (e.g., symptoms), and the other was the condition to be inferred (e.g., diseases). Each bigraph was regarded as a Markov random field (MRF) to support the inference. We proposed three graph-based energy functions and three likelihood-based energy functions. Two of these functions are based on knowledge representation learning and can provide distributed representations of medical entities. Two EMR datasets and three metrics were utilized to evaluate the performance. RESULTS Overall, the evaluation results indicate that the proposed system outperformed the baseline methods. The distributed representation of medical entities does reflect similarity relationships at the knowledge level. CONCLUSION Combining EMKN and MRF is an effective approach for general medical knowledge representation and inference. Different tasks, however, require individually designed energy functions.
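The first step described in the Methods, building a network whose nodes are medical entities and whose edges count co-occurrence within a record, can be sketched in a few lines. The sketch assumes each record has already been reduced to a set of entity strings carrying a type prefix (a hypothetical convention: `sym:` for symptoms, `dis:` for diseases). The ranking function is a deliberately naive stand-in that sums edge weights across the symptom-disease bigraph; it is not the Markov random field inference or any of the energy functions proposed in the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def build_cooccurrence_network(records: Iterable[Set[str]]) -> Counter:
    """Edge weight = number of records in which both entities appear."""
    edges: Counter = Counter()
    for entities in records:
        # Sorting makes each undirected edge a canonical (a, b) tuple with a < b.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def rank_by_cooccurrence(edges: Counter, given: Set[str], candidates: Set[str]) -> List[Tuple[str, int]]:
    """Naive bigraph score: total co-occurrence weight linking each candidate to the given entities."""
    scores = {}
    for cand in candidates:
        scores[cand] = sum(edges.get(tuple(sorted((g, cand))), 0) for g in given)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, already-extracted records.
records = [
    {"sym:fever", "sym:cough", "dis:influenza"},
    {"sym:fever", "dis:influenza"},
    {"sym:cough", "dis:bronchitis"},
    {"sym:chest_pain", "dis:angina"},
]
network = build_cooccurrence_network(records)
print(rank_by_cooccurrence(network, {"sym:fever", "sym:cough"}, {"dis:influenza", "dis:bronchitis", "dis:angina"}))
```

Raw counts favour frequent entities, which is one reason a normalized or learned scoring function is usually preferred over a baseline like this.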
Affiliation(s)
- Chao Zhao
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
- Jingchi Jiang
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
- Yi Guan
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
- Xitong Guo
- School of Management, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
- Bin He
- School of Computer Science and Technology, Harbin, Heilongjiang 150001, China.
573
Che Z, St Sauver J, Liu H, Liu Y. Deep Learning Solutions for Classifying Patients on Opioid Use. AMIA Annu Symp Proc 2018; 2017:525-534. [PMID: 29854117 PMCID: PMC5977635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Opioid analgesics, commonly prescribed to relieve pain, have become especially prevalent in the US in recent years. However, increasing opioid misuse and abuse have caused many harmful consequences. Researchers and clinicians have attempted to discover the factors leading to long-term opioid use, dependence, and abuse, but previous work has explained only a limited number of cases. Motivated by recent successes of deep learning and the abundance of electronic health records, we apply state-of-the-art deep and recurrent neural network models to a dataset of more than one hundred thousand opioid users. Our models achieve robust and superior results in classifying opioid users and are able to extract key factors for different opioid user groups. This work also demonstrates how novel deep learning methods can be adopted for real-world health care problems.
Affiliation(s)
- Zhengping Che
- Department of Computer Science, University of Southern California, Los Angeles, CA
- Jennifer St Sauver
- Department of Computer Science, University of Southern California, Los Angeles, CA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Yan Liu
- Department of Computer Science, University of Southern California, Los Angeles, CA
574
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 835] [Impact Index Per Article: 139.2] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
- Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
- Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
- Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
- Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
- Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
- Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
- Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
- Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
- Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
575
Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin Cancer Res 2018; 24:1248-1259. [PMID: 28982688 PMCID: PMC6050171 DOI: 10.1158/1078-0432.ccr-17-0853] [Citation(s) in RCA: 523] [Impact Index Per Article: 87.2] [Received: 03/23/2017] [Revised: 06/18/2017] [Accepted: 10/02/2017] [Indexed: 02/07/2023]
Abstract
Identifying robust survival subgroups of hepatocellular carcinoma (HCC) will significantly improve patient care. Currently, efforts to integrate multi-omics data to explicitly predict HCC survival across multiple patient cohorts are lacking. To fill this gap, we present a deep learning (DL)-based model on HCC that robustly differentiates survival subpopulations of patients in six cohorts. We built the DL-based, survival-sensitive model on 360 HCC patients' data using RNA sequencing (RNA-Seq), miRNA sequencing (miRNA-Seq), and methylation data from The Cancer Genome Atlas (TCGA); it predicts prognosis as well as an alternative model in which genomics and clinical data are both considered. This DL-based model provides two optimal subgroups of patients with significant survival differences (P = 7.13e-6) and good model fitness [concordance index (C-index) = 0.68]. The more aggressive subtype is associated with frequent TP53 inactivation mutations, higher expression of stemness markers (KRT19 and EPCAM) and the tumor marker BIRC5, and activated Wnt and Akt signaling pathways. We validated this multi-omics model on five external datasets of various omics types: LIRI-JP cohort (n = 230, C-index = 0.75), NCI cohort (n = 221, C-index = 0.67), Chinese cohort (n = 166, C-index = 0.69), E-TABM-36 cohort (n = 40, C-index = 0.77), and Hawaiian cohort (n = 27, C-index = 0.82). This is the first study to employ DL to identify multi-omics features linked to the differential survival of patients with HCC. Given its robustness over multiple cohorts, we expect this workflow to be useful for predicting HCC prognosis. Clin Cancer Res; 24(6); 1248-59. ©2017 AACR.
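The C-index values quoted above measure how well predicted risk orders patients by survival time. A minimal pure-Python sketch of Harrell's concordance index follows, assuming right-censored data and counting a pair as comparable only when the earlier time is an observed event; pairs with tied times are skipped, which is one of several tie conventions in use. It is a generic illustration, not the authors' pipeline, and the toy cohort is invented.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs where the higher-risk patient fails first.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored.
    risk:   predicted risk score (higher = expected to fail sooner).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j was still event-free when i failed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored at time 6.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.2, 0.4]))  # 0.8: 4 of 5 comparable pairs are concordant
```

A C-index of 0.5 is no better than chance and 1.0 is perfect ordering, which is the scale on which the cohort values above (0.67 to 0.82) should be read.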
Affiliation(s)
- Olivier B Poirion
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Liangqun Lu
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii
- Lana X Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii
576
577
Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. J Am Med Inform Assoc 2018; 25:289-294. [PMID: 29040596 PMCID: PMC7282504 DOI: 10.1093/jamia/ocx110] [Received: 06/16/2017] [Revised: 08/07/2017] [Accepted: 09/06/2017]
Abstract
Electronic health record phenotyping is the use of raw electronic health record data to assert characterizations about patients. Researchers have been doing it since the beginning of biomedical informatics, under different names. Phenotyping will benefit from an increasing focus on fidelity, both in the sense of increasing richness, such as measured levels, degree or severity, timing, probability, or conceptual relationships, and in the sense of reducing bias. Research agendas should shift from merely improving binary assignment to studying and improving richer representations. The field is actively researching new temporal directions and abstract representations, including deep learning. The field would benefit from research in nonlinear dynamics, in combining mechanistic models with empirical data, including data assimilation, and in topology. The health care process produces substantial bias, and studying that bias explicitly rather than treating it as merely another source of noise would facilitate addressing it.
Affiliation(s)
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- David J Albers
- Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
578
Hertroijs DFL, Elissen AMJ, Brouwers MCGJ, Schaper NC, Köhler S, Popa MC, Asteriadis S, Hendriks SH, Bilo HJ, Ruwaard D. A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes. Diabetes Obes Metab 2018; 20:681-688. [PMID: 29095564 PMCID: PMC5836941 DOI: 10.1111/dom.13148] [Received: 07/18/2017] [Revised: 10/10/2017] [Accepted: 10/29/2017]
Abstract
AIM To identify, predict and validate distinct glycaemic trajectories among patients with newly diagnosed type 2 diabetes treated in primary care, as a first step towards more effective patient-centred care. METHODS We conducted a retrospective study in two cohorts, using routinely collected individual patient data from primary care practices obtained from two large Dutch diabetes patient registries. Participants included adult patients newly diagnosed with type 2 diabetes between January 2006 and December 2014 (development cohort, n = 10 528; validation cohort, n = 3777). Latent growth mixture modelling identified distinct glycaemic 5-year trajectories. Machine learning models were built to predict the trajectories using easily obtainable patient characteristics in daily clinical practice. RESULTS Three different glycaemic trajectories were identified: (1) stable, adequate glycaemic control (76.5% of patients); (2) improved glycaemic control (21.3% of patients); and (3) deteriorated glycaemic control (2.2% of patients). Similar trajectories could be discerned in the validation cohort. Body mass index and glycated haemoglobin and triglyceride levels were the most important predictors of trajectory membership. The predictive model, trained on the development cohort, had a receiver-operating characteristic area under the curve of 0.96 in the validation cohort, indicating excellent accuracy. CONCLUSIONS The developed model can effectively explain heterogeneity in future glycaemic response of patients with type 2 diabetes. It can therefore be used in clinical practice as a quick and easy tool to provide tailored diabetes care.
Affiliation(s)
- Dorijn F. L. Hertroijs
- Department of Health Services Research, Care and Public Health Research Institute, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Arianne M. J. Elissen
- Department of Health Services Research, Care and Public Health Research Institute, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Martijn C. G. J. Brouwers
- Department of Internal Medicine, Division of Endocrinology and Metabolic Diseases, Maastricht University Medical Centre, Maastricht, The Netherlands
- Nicolaas C. Schaper
- Department of Internal Medicine, Division of Endocrinology and Metabolic Diseases, Maastricht University Medical Centre, Maastricht, The Netherlands
- Sebastian Köhler
- Department of Psychiatry and Neuropsychology, School for Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Mirela C. Popa
- Department of Data Science and Knowledge Engineering, Faculty of Humanities and Sciences, Maastricht University, Maastricht, The Netherlands
- Stylianos Asteriadis
- Department of Data Science and Knowledge Engineering, Faculty of Humanities and Sciences, Maastricht University, Maastricht, The Netherlands
- Henk J. Bilo
- Diabetes Centre, Isala, Zwolle, The Netherlands
- Department of Internal Medicine, University Medical Centre Groningen and University of Groningen, Groningen, The Netherlands
- Dirk Ruwaard
- Department of Health Services Research, Care and Public Health Research Institute, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
579
Abstract
Luxia Zhang and colleagues discuss the development of big data in Chinese healthcare and the opportunities for its use in medical research.
Affiliation(s)
- Luxia Zhang
- Renal Division, Department of Medicine, Peking University First Hospital, Peking University Institute of Nephrology, Beijing, China
- Peking University, Center for Data Science in Health and Medicine, Beijing, China
- Haibo Wang
- Clinical Trial Unit, First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- China Standard Medical Information Research Center, Shenzhen, China
- Quanzheng Li
- MGH & BWH Center for Clinical Data Science, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Ming-Hui Zhao
- Renal Division, Department of Medicine, Peking University First Hospital, Peking University Institute of Nephrology, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Beijing, China
- Qi-Min Zhan
- Peking University, Health Science Center, Beijing, China
580
Dwyer DB, Falkai P, Koutsouleris N. Machine Learning Approaches for Clinical Psychology and Psychiatry. Annu Rev Clin Psychol 2018; 14:91-118. [PMID: 29401044 DOI: 10.1146/annurev-clinpsy-032816-045037]
Abstract
Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.
Affiliation(s)
- Dominic B Dwyer
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
- Peter Falkai
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
- Nikolaos Koutsouleris
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
581
Prosser C, Meyer W, Ellis J, Lee R. Evolutionary ARMS Race: Antimalarial Resistance Molecular Surveillance. Trends Parasitol 2018; 34:322-334. [PMID: 29396203 DOI: 10.1016/j.pt.2018.01.001] [Received: 11/20/2017] [Revised: 01/02/2018] [Accepted: 01/03/2018]
Abstract
Molecular surveillance of antimalarial drug resistance markers has become an important part of resistance detection and containment. In the current climate of multidrug resistance, including resistance to the global front-line drug artemisinin, there is a consensus to upscale molecular surveillance. The most salient limitation to current surveillance efforts is that skill and infrastructure requirements preclude many regions. This includes sub-Saharan Africa, where Plasmodium falciparum is responsible for most of the global malaria disease burden. New molecular and data technologies have emerged with an emphasis on accessibility. These may allow surveillance to be conducted in broad settings where it is most needed, including at the primary healthcare level in endemic countries, and extending to the village health worker.
Affiliation(s)
- Christiane Prosser
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Clinical School-Sydney Medical School, Marie Bashir Institute for Infectious Diseases and Biosecurity, University of Sydney, Sydney, NSW, Australia; Westmead Institute for Medical Research, Westmead, NSW, Australia
- Wieland Meyer
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Clinical School-Sydney Medical School, Marie Bashir Institute for Infectious Diseases and Biosecurity, University of Sydney, Sydney, NSW, Australia; Westmead Institute for Medical Research, Westmead, NSW, Australia
- John Ellis
- School of Life Sciences, University of Technology Sydney, NSW, Australia
- Rogan Lee
- Centre for Infectious Diseases and Microbiology Laboratory Services, Institute of Clinical Pathology & Medical Research, Westmead Hospital, Westmead, NSW, Australia
582
Cohort Description for MADDEC – Mass Data in Detection and Prevention of Serious Adverse Events in Cardiovascular Disease. IFMBE Proceedings 2018. [DOI: 10.1007/978-981-10-5122-7_278]
583
DeepDx: A Deep Learning Approach for Predicting the Likelihood and Severity of Symptoms Post Concussion. Brain Inform 2018. [DOI: 10.1007/978-3-030-05587-5_36]
584
Abstract
Machine learning (ML) is an active field of research in the information sciences and has already noticeably changed our everyday lives in recent years. As development progresses, the latest algorithms are able to take on increasingly complex tasks. In this mini-review, we describe some fundamentals of ML and use practice-oriented examples to show how it could find its way into clinical routine in the coming years.
Affiliation(s)
- Anton S Becker
- Institute of Diagnostic and Interventional Radiology, University Hospital Zurich
- Department of Health Sciences and Technology, ETH Zürich
- Christian Blüthgen
- Institute of Diagnostic and Interventional Radiology, University Hospital Zurich
- Urs Mühlematter
- Institute of Diagnostic and Interventional Radiology, University Hospital Zurich
- Andreas Boss
- Institute of Diagnostic and Interventional Radiology, University Hospital Zurich
585
Glicksberg BS, Miotto R, Johnson KW, Shameer K, Li L, Chen R, Dudley JT. Automated disease cohort selection using word embeddings from Electronic Health Records. Pac Symp Biocomput 2018; 23:145-156. [PMID: 29218877 PMCID: PMC5788312]
Abstract
Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHR). Similar to prospective study designs, high-quality EHR-based research requires rigorous selection criteria to designate case/control status particular to each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, so algorithms have been developed for relatively few diseases. Methodologies that automatically learn features from EHRs have been used for cohort selection as well. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper compares the performance of a state-of-the-art automated feature-learning method in extracting research-grade cohorts for five diseases against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results, with an average F-score of 0.57 and AUC-ROC of 0.98. However, results varied considerably between diseases, necessitating further investigation and/or phenotype-specific refinement of the approach before it can be readily deployed across all diseases.
Affiliation(s)
- Benjamin S Glicksberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl., New York, NY 10065, USA
- Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl., New York, NY 10065, USA
586
Abstract
Liver fibrosis is an important pathological precondition for hepatocellular carcinoma. The degree of hepatic fibrosis is positively correlated with liver cancer. Liver fibrosis is a series of pathological and physiological processes related to liver cell necrosis and degeneration after chronic liver injury, which finally leads to extracellular matrix and collagen deposition. The early detection and precise staging of fibrosis and cirrhosis are very important for early diagnosis and timely initiation of appropriate therapeutic regimens. The risk of severe liver fibrosis finally progressing to liver carcinoma is >50%. It is known that biopsy is the gold standard for the diagnosis and staging of liver fibrosis. However, this method has some limitations, such as the potential for pain, sampling variability, and low patient acceptance. Furthermore, the necessity of obtaining a tissue diagnosis of liver fibrosis remains controversial. An increasing number of reliable non-invasive approaches are now available that are widely applied in clinical practice, mostly in cases of viral hepatitis, resulting in a significantly decreased need for liver biopsy. In fact, the non-invasive detection and evaluation of liver cirrhosis now has good accuracy due to current serum markers, ultrasound imaging, and magnetic resonance imaging quantification techniques. A prominent advantage of the non-invasive detection and assessment of liver fibrosis is that liver fibrosis can be monitored repeatedly and easily in the same patient. Serum biomarkers have the advantages of high applicability (>95%) and good reproducibility. However, their results can be influenced by different patient conditions because none of these markers are liver-specific.
The most promising techniques appear to be transient elastography and magnetic resonance elastography because they provide reliable results for the detection of fibrosis in the advanced stages, and future developments promise to increase the reliability and accuracy of the staging of hepatic fibrosis. This article aims to describe the recent progress in the development of non-invasive assessment methods for the staging of liver fibrosis, with special emphasis on computer-aided quantitative and deep learning methods.
Affiliation(s)
- Chengxi Li
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Rentao Li
- Department of Hepatobiliary Surgery, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Wei Zhang
- Department of Hepatobiliary Surgery, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer; Key Laboratory of Cancer Prevention and Therapy, Tianjin; Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
587
Evaluating Mental Health Encounters in mTBI: Identifying Patient Subgroups and Recommending Personalized Treatments. Brain Inform 2018. [DOI: 10.1007/978-3-030-05587-5_35]
588
Leveraging uncertainty information from deep neural networks for disease detection. Sci Rep 2017; 7:17816. [PMID: 29259224 PMCID: PMC5736701 DOI: 10.1038/s41598-017-17876-z] [Received: 07/24/2017] [Accepted: 12/01/2017]
Abstract
Deep learning (DL) has revolutionized the field of computer vision and image processing. In medical imaging, algorithmic solutions based on DL have been shown to achieve high performance on tasks that previously required medical experts. However, DL-based solutions for disease detection have been proposed without methods to quantify and control their uncertainty in a decision. In contrast, a physician knows whether she is uncertain about a case and will consult more experienced colleagues if needed. Here we evaluate dropout-based Bayesian uncertainty measures for DL in diagnosing diabetic retinopathy (DR) from fundus images and show that they capture uncertainty better than straightforward alternatives. Furthermore, we show that uncertainty-informed decision referral can improve diagnostic performance. Experiments across different networks, tasks and datasets show robust generalization. Depending on network capacity and task/dataset difficulty, we surpass 85% sensitivity and 80% specificity as recommended by the NHS when referring 0−20% of the most uncertain decisions for further inspection. We analyse causes of uncertainty by relating intuitions from 2D visualizations to the high-dimensional image space. While uncertainty is sensitive to clinically relevant cases, sensitivity to unfamiliar data samples is task dependent, but can be rendered more robust.
589
Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models. J Biomed Inform 2017; 78:33-42. [PMID: 29196114 DOI: 10.1016/j.jbi.2017.11.015] [Received: 05/21/2017] [Revised: 11/01/2017] [Accepted: 11/28/2017]
Abstract
The widespread adoption of electronic medical records (EMRs) in healthcare has provided vast new amounts of data for statistical machine learning researchers in their efforts to model and predict patient health status, potentially enabling novel advances in treatment. In the case of sepsis, a debilitating, dysregulated host response to infection, extracting subtle, uncataloged clinical phenotypes from the EMR with statistical machine learning methods has the potential to impact patient diagnosis and treatment early in the course of their hospitalization. However, there are significant barriers that must be overcome to extract these insights from EMR data. First, EMR datasets consist of both static and dynamic observations of discrete and continuous-valued variables, many of which may be missing, precluding the application of standard multivariate analysis techniques. Second, clinical populations observed via EMRs and relevant to the study and management of conditions like sepsis are often heterogeneous; properly accounting for this heterogeneity is critical. Here, we describe an unsupervised, probabilistic framework called a composite mixture model that can simultaneously accommodate the wide variety of observation types found in EMR datasets, characterize heterogeneous clinical populations, and handle missing observations. We demonstrate the efficacy of our approach on a large-scale sepsis cohort, developing novel techniques built on our model-based clusters to track patient mortality risk over time and identify physiological trends and distinct subgroups of the dataset associated with elevated risk of mortality during hospitalization.
590
Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. J Biomed Inform 2017; 76:59-68. [PMID: 29113935 PMCID: PMC5716867 DOI: 10.1016/j.jbi.2017.11.003] [Received: 07/02/2017] [Revised: 10/28/2017] [Accepted: 11/04/2017]
Abstract
Determining the discrepancy between chronological and physiological age of patients is central to preventative and personalized care. Electronic medical records (EMR) provide rich information about the patient physiological state, but it is unclear whether such information can be predictive of chronological age. Here we present a deep learning model that uses vital signs and lab tests contained within the EMR of Mount Sinai Health System (MSHS) to predict chronological age. The model is trained on 377,686 EMR from patients of ages 18-85 years old. The discrepancy between the predicted and real chronological age is then used as a proxy to estimate physiological age. Overall, the model can predict the chronological age of patients with a standard deviation error of ∼7 years. The ages of the youngest and oldest patients were more accurately predicted, while patients of ages ranging between 40 and 60 years were the least accurately predicted. Patients with the largest discrepancy between their physiological and chronological age were further inspected. The patients predicted to be significantly older than their chronological age have higher systolic blood pressure, higher cholesterol, damaged liver, and anemia. In contrast, patients predicted to be younger than their chronological age have lower blood pressure and shorter stature among other indicators; both groups display lower weight than the population average. Using information from ∼10,000 patients from the entire cohort who have been also profiled with SNP arrays, genome-wide association study (GWAS) uncovers several novel genetic variants associated with aging. In particular, significant variants were mapped to genes known to be associated with inflammation, hypertension, lipid metabolism, height, and increased lifespan in mice. Several genes with missense mutations were identified as novel candidate aging genes. 
In conclusion, we demonstrate how EMR data can be used to assess overall health via a scale that is based on deviation from the patient's predicted chronological age.
Affiliation(s)
- Zichen Wang
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Li Li
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Benjamin S Glicksberg
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Ariel Israel
- Department of Family Medicine, Clalit Health Services, Jerusalem 90258, Israel
- Joel T Dudley
- Department of Genetics and Genomic Sciences, Institute of Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Avi Ma'ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
591
Electronic Health Record Driven Prediction for Gestational Diabetes Mellitus in Early Pregnancy. Sci Rep 2017; 7:16417. [PMID: 29180800 PMCID: PMC5703904 DOI: 10.1038/s41598-017-16665-y] [Received: 06/12/2017] [Accepted: 11/16/2017]
Abstract
Gestational diabetes mellitus (GDM) is conventionally confirmed with an oral glucose tolerance test (OGTT) at 24 to 28 weeks of gestation, but it is still uncertain whether it can be predicted in early pregnancy through secondary use of electronic health records (EHRs). To this end, the cost-sensitive hybrid model (CSHM) and five conventional machine learning methods are used to construct predictive models that capture the future risk of GDM from temporally aggregated EHRs. The experimental data come from a nested case-control study cohort of 33,935 pregnant women at West China Second Hospital. After data cleaning, the data set contains 4,378 cases and 50 attributes. After the most feasible method is selected, the cost parameter of CSHM is tuned to handle the imbalance of the dataset. In the experiment, 3,940 samples are used for training and the remaining 438 samples for testing. Although accuracy on positive samples is only barely acceptable (62.16%), the results suggest that the vast majority (98.4%) of predicted positive instances are true positives. To our knowledge, this is the first study to apply machine learning models with EHRs to predict GDM, which will facilitate personalized medicine in maternal health management in the future.
592
Lakhani P, Prater AB, Hutson RK, Andriole KP, Dreyer KJ, Morey J, Prevedello LM, Clark TJ, Geis JR, Itri JN, Hawkins CM. Machine Learning in Radiology: Applications Beyond Image Interpretation. J Am Coll Radiol 2017; 15:350-359. [PMID: 29158061 DOI: 10.1016/j.jacr.2017.09.044] [Received: 09/18/2017] [Revised: 09/21/2017] [Accepted: 09/30/2017]
Abstract
Much attention has been given to machine learning and its perceived impact in radiology, particularly in light of recent success with image classification in international competitions. However, machine learning is likely to impact radiology outside of image interpretation long before a fully functional "machine radiologist" is implemented in practice. Here, we provide an overview of machine learning, its application to radiology and other domains, and many use cases that do not involve image interpretation. We hope that a better understanding of these potential applications will help radiology practices prepare for the future and realize performance improvements and efficiency gains.
Affiliation(s)
- Paras Lakhani
- Department of Radiology, Thomas Jefferson University Hospital, Sidney Kimmel Jefferson Medical College, Philadelphia, Pennsylvania
- Adam B Prater
- Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia
- R Kent Hutson
- Radiology Alliance, Colorado Springs, Colorado; Medical Center Radiologists, Virginia Beach, Virginia
- Kathy P Andriole
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Keith J Dreyer
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Jose Morey
- I.B.M. Watson Research, Yorktown Heights, New York; Department of Radiology, University of Virginia, Charlottesville, Virginia; Medical Center Radiologists, Virginia Beach, Virginia
- Toshi J Clark
- University of Colorado Medical Center, Denver, Colorado
- Jason N Itri
- Department of Radiology, University of Virginia, Charlottesville, Virginia
- C Matthew Hawkins
- Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia
593
Beckmann E, Peyrou B, Gallay L, Vignaux JJ. [The potential of artificial intelligence in myology: a viewpoint from a non-robot]. Med Sci (Paris) 2017; 33 Hors série n°1:39-45. [PMID: 29139385 DOI: 10.1051/medsci/201733s108]
Affiliation(s)
- Eytan Beckmann
- Institut Dauphine d'Ostéopathie, Paris, France (www.osteoparis13.com); Cabinet d'Ostéopathie, 75013 Paris, France
- Laure Gallay
- Service de Médecine Interne, Hôpital Edouard Herriot, Lyon, France; INMG, CNRS UMR 5310-Inserm U1217, Université Lyon 1, France
- Jean-Jacques Vignaux
- Institut Dauphine d'Ostéopathie, Paris, France (www.osteoparis13.com); Cabinet d'Ostéopathie, 75013 Paris, France
594
Computational biology: deep learning. Emerg Top Life Sci 2017; 1:257-274. [PMID: 33525807 PMCID: PMC7289034 DOI: 10.1042/etls20160025] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 04/21/2017] [Revised: 09/13/2017] [Accepted: 09/18/2017] [Indexed: 02/06/2023]
Abstract
Deep learning is the trendiest tool in a computational biologist's toolbox. This exciting class of methods, based on artificial neural networks, quickly became popular due to its competitive performance in prediction problems. In pioneering early work, applying simple network architectures to abundant data already provided gains over traditional counterparts in functional genomics, image analysis, and medical diagnostics. Now, ideas for constructing and training networks and even off-the-shelf models have been adapted from the rapidly developing machine learning subfield to improve performance in a range of computational biology tasks. Here, we review some of these advances in the last 2 years.
595
Prediction of Adverse Events in Patients Undergoing Major Cardiovascular Procedures. IEEE J Biomed Health Inform 2017; 21:1719-1729. [DOI: 10.1109/jbhi.2017.2675340] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/06/2022]
596
Abstract
Radiomics, the high-throughput mining of quantitative image features from standard-of-care medical imaging, enables data to be extracted and applied within clinical decision-support systems to improve diagnostic, prognostic, and predictive accuracy, and it is gaining importance in cancer research. Radiomic analysis exploits sophisticated image analysis tools and the rapid development and validation of medical imaging data and image-based signatures for precision diagnosis and treatment, providing a powerful tool in modern medicine. Herein, we describe the process of radiomics, its pitfalls, challenges, and opportunities, and its capacity to improve clinical decision making, emphasizing its utility for patients with cancer. Currently, the field of radiomics lacks standardized evaluation of both the scientific integrity and the clinical relevance of the numerous radiomics investigations published as a result of the rapid growth of this area. Rigorous evaluation criteria and reporting guidelines need to be established for radiomics to mature as a discipline. We provide guidance for investigations to meet this urgent need in the field.
597
Ponomariov V, Chirila L, Apipie FM, Abate R, Rusu M, Wu Z, Liehn EA, Bucur I. Artificial Intelligence versus Doctors' Intelligence: A Glance on Machine Learning Benefaction in Electrocardiography. Discoveries (Craiova) 2017; 5:e76. [PMID: 32309594 PMCID: PMC6941587 DOI: 10.15190/d.2017.6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/19/2022]
Abstract
Computational machine learning, especially self-enhancing algorithms, has proven remarkably effective in many applications, including cardiovascular medicine. This review summarizes and cross-compares current machine learning algorithms applied to electrocardiogram (ECG) interpretation. In practice, continuous real-time monitoring of ECGs is still difficult to realize, and automated ECG interpretation using specific artificial intelligence algorithms is even more challenging. By collecting large datasets from one individual, computational approaches can ensure an efficient personalized treatment strategy, such as correct prediction of patient-specific disease progression, therapeutic success rate, and the limitations of certain interventions, thus reducing hospitalization costs and physicians' workload. Clearly, such aims can be achieved by a perfect symbiosis within a multidisciplinary team of clinicians, researchers, and computer scientists. In summary, continuous cross-examination between machine intelligence and human intelligence combines precision, rationale, and a high-throughput scientific engine within the challenging framework of big data science.
Affiliation(s)
- Victor Ponomariov
- Institute for Molecular Cardiovascular Research (IMCAR), RWTH Aachen University, Germany; Department of Cardiology, Pulmonology, Angiology and Intensive Care, University Hospital, RWTH Aachen University, Germany
- Florentina-Mihaela Apipie
- Applied Systems srl, Craiova, Romania; Faculty of Economic and Business Administration, Doctoral School of Economics, University of Craiova, Romania
- Raffaele Abate
- ECUORE LTD, London, England; School of Medicine, University of Catania, Italy
- Mihaela Rusu
- Institute for Molecular Cardiovascular Research, University Hospital, RWTH Aachen, Germany; IZKF, Aachen, RWTH Aachen, Germany
- Zhuojun Wu
- Institute for Molecular Cardiovascular Research (IMCAR), RWTH Aachen University, Germany; Applied Systems srl, Craiova, Romania
- Elisa A Liehn
- Institute for Molecular Cardiovascular Research (IMCAR), RWTH Aachen University, Germany; Department of Cardiology, Pulmonology, Angiology and Intensive Care, University Hospital, RWTH Aachen University, Germany; Human Genetic Laboratory, University of Medicine and Pharmacy, Craiova, Romania
598
Fuller D, Buote R, Stanley K. A glossary for big data in population and public health: discussion and commentary on terminology and research methods. J Epidemiol Community Health 2017; 71:1113-1117. [PMID: 28918390 DOI: 10.1136/jech-2017-209608] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/19/2017] [Revised: 08/15/2017] [Accepted: 08/15/2017] [Indexed: 11/03/2022]
Abstract
The volume and velocity of data are growing rapidly, and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data, which creates a barrier to applying big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods.
Affiliation(s)
- Daniel Fuller
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, Saint John's, Canada
- Richard Buote
- Division of Community Health and Humanities, Faculty of Medicine, Memorial University of Newfoundland, St John's, Canada
- Kevin Stanley
- Department of Computer Science, College of Arts and Science, University of Saskatchewan, Saskatoon, Canada
599
Scheurwegs E, Cule B, Luyckx K, Luyten L, Daelemans W. Selecting relevant features from the electronic health record for clinical code prediction. J Biomed Inform 2017; 74:92-103. [PMID: 28919106 DOI: 10.1016/j.jbi.2017.09.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/17/2017] [Revised: 09/11/2017] [Accepted: 09/12/2017] [Indexed: 11/25/2022]
Abstract
A multitude of information sources is present in the electronic health record (EHR), each of which can contain clues for automatically assigning diagnosis and procedure codes. These sources, however, overlap in the information they contain and differ in quality, which complicates the retrieval of these clues. Through feature selection, a denser representation with consistent quality and less information overlap can be obtained. We introduce and compare coverage-based feature selection methods based on confidence and information gain. These approaches were evaluated across seven medical specialties for ICD-9-CM code prediction (six at the Antwerp University Hospital and one in the MIMIC-III dataset) and two medical specialties for ICD-10-CM code prediction. Using confidence coverage to integrate all sources in an EHR yields a consistent improvement in F-measure (49.83% on average for diagnosis codes), both compared with the baseline (44.25%) and with using the best standalone source (44.41%). Confidence coverage creates a concise patient stay representation that is independent of a rigid framework such as UMLS and contains easily interpretable features, which gives it several advantages over the baseline setup, in which feature selection was limited to a filter removing features with fewer than five total occurrences in the training set. Prediction results improved consistently when multiple heterogeneous sources were used to predict clinical codes, while the number of features and the processing time were reduced.
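The abstract names information gain as one basis for feature selection and describes a baseline filter that drops features with fewer than five total occurrences in the training set. The Python sketch below illustrates only those two generic ideas; it is not the authors' confidence-coverage method, which the abstract does not describe in enough detail to reproduce. The function names, the binary-label setup (one target code assigned or not per patient stay), and the toy feature names are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """Information gain H(Y) - H(Y | X) of a binary feature X for labels Y."""
    n = len(labels)
    if n == 0:
        return 0.0
    with_f = [y for f, y in zip(feature_present, labels) if f]
    without_f = [y for f, y in zip(feature_present, labels) if not f]
    conditional = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def frequency_filter(stays, min_count=5):
    """Baseline-style filter: keep features occurring at least min_count times in total."""
    counts = Counter(feature for stay in stays for feature in stay)
    return {feature for feature, count in counts.items() if count >= min_count}


def rank_by_information_gain(stays, labels, vocabulary, k):
    """Return the k features with the highest information gain (ties broken alphabetically)."""
    scores = {}
    for feature in vocabulary:
        present = [feature in stay for stay in stays]
        scores[feature] = information_gain(present, labels)
    return sorted(vocabulary, key=lambda f: (-scores[f], f))[:k]


if __name__ == "__main__":
    # Hypothetical toy data: each patient stay is a list of extracted features,
    # and labels mark whether one target diagnosis code was assigned (1) or not (0).
    stays = [["troponin_high", "chest_pain"], ["troponin_high"], ["chest_pain"], ["fracture"]]
    labels = [1, 1, 0, 0]
    vocabulary = {"troponin_high", "chest_pain", "fracture"}
    print(rank_by_information_gain(stays, labels, vocabulary, k=2))  # prints ['troponin_high', 'fracture']
```

In this toy example, "chest_pain" scores zero because it appears equally often with and without the code, whereas "troponin_high" separates the two groups perfectly. Scoring each feature independently against one code at a time is a simple filter-style choice; the per-code top-k cutoff is an assumption of this sketch, not a parameter reported in the paper.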
Affiliation(s)
- Elyne Scheurwegs
- University of Antwerp, Advanced Database Research and Modelling Research Group (ADReM), Middelheimlaan 1, B-2020 Antwerp, Belgium; University of Antwerp, Computational Linguistics and Psycholinguistics (CLiPS) Research Center, Lange Winkelstraat 40-42, B-2000 Antwerp, Belgium
- Boris Cule
- University of Antwerp, Advanced Database Research and Modelling Research Group (ADReM), Middelheimlaan 1, B-2020 Antwerp, Belgium
- Kim Luyckx
- Antwerp University Hospital, ICT Department, Wilrijkstraat 10, B-2650 Edegem, Belgium
- Léon Luyten
- Antwerp University Hospital, Medical Information Department, Wilrijkstraat 10, B-2650 Edegem, Belgium
- Walter Daelemans
- University of Antwerp, Computational Linguistics and Psycholinguistics (CLiPS) Research Center, Lange Winkelstraat 40-42, B-2000 Antwerp, Belgium
600
Schlegel DR, Ficheur G. Secondary Use of Patient Data: Review of the Literature Published in 2016. Yearb Med Inform 2017; 26:68-71. [PMID: 29063536 DOI: 10.15265/iy-2017-032] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]
Abstract
Objectives: To summarize recent research and emerging trends in the area of secondary use of healthcare data, and to present the best papers published in this field, selected to appear in the 2017 edition of the IMIA Yearbook. Methods: A literature review of articles published in 2016 and related to secondary use of healthcare data was performed using two bibliographic databases. From this search, 941 papers were identified. The section editors independently reviewed the papers for relevance and impact, resulting in a consensus list of 14 candidate best papers. External reviewers examined each of the candidate best papers, and the final selection was made by the editorial board of the Yearbook. Results: From the 941 retrieved papers, the selection process resulted in four best papers. These papers discuss data quality concerns, issues in preserving the privacy of patients in shared datasets, and methods of decision support when consuming large amounts of raw electronic health record (EHR) data. Conclusion: In 2016, significant effort was put into the development of new systems that aim to reduce the need for substantial human interpretation and pre-processing of healthcare data, though this remains an emerging area of research. The value of temporal relationships between data received considerable study, as did effective information sharing while preserving patient privacy.