1
Miller-Fleming TW, Allos A, Gantz E, Yu D, Isaacs DA, Mathews CA, Scharf JM, Davis LK. Developing a phenotype risk score for tic disorders in a large, clinical biobank. Transl Psychiatry 2024; 14:311. [PMID: 39069519 DOI: 10.1038/s41398-024-03011-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 06/28/2024] [Accepted: 07/04/2024] [Indexed: 07/30/2024]
Abstract
Tics are a common feature of early-onset neurodevelopmental disorders, characterized by involuntary and repetitive movements or sounds. Although tics affect up to 2% of children and have a genetic contribution, their underlying causes remain poorly understood. In this study, we leverage dense phenotype information to identify features (i.e., symptoms and comorbid diagnoses) of tic disorders within the context of a clinical biobank. Using de-identified electronic health records (EHRs), we identified individuals with tic disorder diagnosis codes. We performed a phenome-wide association study (PheWAS) to identify the EHR features enriched in tic cases versus controls (n = 1406 and 7030, respectively) and found highly comorbid neuropsychiatric phenotypes, including obsessive-compulsive disorder, attention-deficit/hyperactivity disorder, autism spectrum disorder, and anxiety (p < 7.396 × 10⁻⁵). These features (among others) were then used to generate a phenotype risk score (PheRS) for tic disorder, which was applied across an independent set of 90,051 individuals. A gold standard set of tic disorder cases identified by an EHR algorithm and confirmed by clinician chart review was then used to validate the tic disorder PheRS; the tic disorder PheRS was significantly higher among clinician-validated tic cases versus non-cases (p = 4.787 × 10⁻¹⁵¹; β = 1.68; SE = 0.06). Our findings provide support for the use of large-scale medical databases to better understand phenotypically complex and underdiagnosed conditions, such as tic disorders.
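The abstract above describes building a phenotype risk score from PheWAS-enriched features but does not say how the features were weighted. The sketch below assumes one common formulation, in which each feature is weighted by the log of its inverse prevalence and a patient's score is the sum of the weights of the features present in their record. The phenotype labels, prevalence values, and function names are hypothetical and are not taken from the study.

```python
import math

def phers_weights(prevalence):
    """Map each phenotype to log(1 / prevalence); rarer features weigh more.

    Assumption: log-inverse-prevalence weighting. The abstract does not
    state which weighting the authors used.
    """
    return {code: math.log(1.0 / p) for code, p in prevalence.items() if 0 < p <= 1}

def phenotype_risk_score(patient_codes, weights):
    """Sum the weights of the scored phenotypes present in one patient's record."""
    return sum(weights[c] for c in set(patient_codes) if c in weights)

# Hypothetical prevalences of PheWAS-enriched features in the scoring population.
prevalence = {"OCD": 0.01, "ADHD": 0.05, "autism": 0.02, "anxiety": 0.10}
weights = phers_weights(prevalence)

# "fracture" is not a scored feature, so it contributes nothing.
score_a = phenotype_risk_score(["OCD", "ADHD", "fracture"], weights)
score_b = phenotype_risk_score(["anxiety"], weights)
```

Under this weighting, a record carrying several rare enriched features scores higher than one carrying a single common feature, which is the property a validation against chart-reviewed cases would then test.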
Affiliation(s)
- Tyne W Miller-Fleming
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
- Annmarie Allos
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Cognitive Science, Dartmouth College, Hanover, NH, USA
- Emily Gantz
  - Department of Pediatric Neurology, Children's Hospital of Alabama, Birmingham, AL, USA
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN, USA
- Dongmei Yu
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David A Isaacs
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN, USA
- Carol A Mathews
  - Department of Psychiatry, Genetics Institute, Center for OCD, Anxiety and Related Disorders, University of Florida, Gainesville, FL, USA
- Jeremiah M Scharf
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lea K Davis
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA.

2
Moynihan D, Monaco S, Ting TW, Narasimhalu K, Hsieh J, Kam S, Lim JY, Lim WK, Davila S, Bylstra Y, Balakrishnan ID, Heng M, Chia E, Yeo KK, Goh BK, Gupta R, Tan T, Baynam G, Jamuar SS. Cluster analysis and visualisation of electronic health records data to identify undiagnosed patients with rare genetic diseases. Sci Rep 2024; 14:5056. [PMID: 38424111 PMCID: PMC10904843 DOI: 10.1038/s41598-024-55424-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 02/23/2024] [Indexed: 03/02/2024]
Abstract
Rare genetic diseases affect 5-8% of the population but are often undiagnosed or misdiagnosed. Electronic health records (EHR) contain large amounts of data, which provide opportunities for analysing and mining. Data mining, in the form of cluster analysis and visualisation, was performed on a database containing deidentified health records of 1.28 million patients across 3 major hospitals in Singapore, in a bid to improve the diagnostic process for patients who are living with an undiagnosed rare disease, specifically focusing on Fabry Disease and Familial Hypercholesterolaemia (FH). On a baseline of 4 patients, we identified 2 additional patients with potential diagnosis of Fabry disease, suggesting a potential 50% increase in diagnosis. Similarly, we identified > 12,000 individuals who fulfil the clinical and laboratory criteria for FH but had not been diagnosed previously. This proof-of-concept study showed that it is possible to perform mining on EHR data albeit with some challenges and limitations.
Affiliation(s)
- Teck Wah Ting
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Kaavya Narasimhalu
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
  - Department of Neurology, National Neuroscience Institute (Singapore General Hospital), Singapore, Singapore
- Jenny Hsieh
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
  - Department of Internal Medicine, Singapore General Hospital, Singapore, Singapore
- Sylvia Kam
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Jiin Ying Lim
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Weng Khong Lim
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
  - Cancer & Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
  - Laboratory of Genome Variation Analytics, Genome Institute of Singapore, Singapore, Singapore
- Sonia Davila
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
- Yasmin Bylstra
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore
- Iswaree Devi Balakrishnan
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
  - National Heart Centre Singapore, Singapore, Singapore
- Mark Heng
  - SingHealth Office of Insights and Analytics, Singapore, Singapore
- Elian Chia
  - SingHealth Office of Insights and Analytics, Singapore, Singapore
- Bee Keow Goh
  - Data Analytics Office, KK Women's and Children's Hospital, Singapore, Singapore
- Tele Tan
  - Curtin University, Perth, Australia
- Gareth Baynam
  - Rare Care Centre, Perth Children's Hospital, Perth, WA, Australia
  - Western Australian Register of Developmental Anomalies, Perth, WA, Australia
- Saumya Shekhar Jamuar
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore, 229899, Singapore.
  - SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore.
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Singapore.

3
Shahjahan, Dey JK, Dey SK. Translational bioinformatics approach to combat cardiovascular disease and cancers. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2024; 139:221-261. [PMID: 38448136 DOI: 10.1016/bs.apcsb.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Bioinformatics is an interdisciplinary science that draws on biology, chemistry, physics, statistics, mathematics, and computer science to answer complicated physiological problems. Its key aim is to store, analyze, organize, and retrieve essential information about the genome, proteome, transcriptome, and metabolome, as well as about organisms, in order to investigate biological systems and their dynamics. The outcome of a bioinformatics analysis depends on the type, quantity, and quality of the raw data provided and on the algorithm used to analyze them. Despite the availability of several approved medicines, cardiovascular disorders (CVDs) and cancers remain the two leading causes of human deaths. Understanding the unknown aspects of both of these non-communicable disorders is essential to discovering new pathways, finding new drug targets, and eventually developing newer drugs to combat them successfully. Because all of these goals involve complex investigation and handling of various types of macromolecules and small molecules of the human body, bioinformatics plays a key role in such processes. Results from such investigations have direct human application, and we therefore call this field translational bioinformatics. This book chapter deals with the diverse scope and applications of translational bioinformatics in finding cures for, diagnosing, and understanding the mechanisms of CVDs and cancers. Developing complex algorithms, whether short or long, to address such problems is very common in translational bioinformatics. Structure-based drug discovery and AI-guided invention of novel antibodies, achieved with very high accuracy and speed and at considerably low cost, are some of the astonishing features of translational bioinformatics and its applications in the fields of CVDs and cancers.
Affiliation(s)
- Shahjahan
  - Laboratory for Structural Biology of Membrane Proteins, Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi, India
- Joy Kumar Dey
  - Central Council for Research in Homoeopathy, Ministry of Ayush, Govt. of India, New Delhi, Delhi, India
- Sanjay Kumar Dey
  - Laboratory for Structural Biology of Membrane Proteins, Dr. B.R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi, India.

4
Getzen E, Tan AL, Brat G, Omenn GS, Strasser Z, Long Q, Holmes JH, Mowery D. Leveraging informative missing data to learn about acute respiratory distress syndrome and mortality in long-term hospitalized COVID-19 patients throughout the years of the pandemic. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:942-950. [PMID: 38222425 PMCID: PMC10785926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Electronic health records (EHRs) contain a wealth of information that can be used to further precision health. One particular data element in EHRs that is not only under-utilized but oftentimes unaccounted for is missing data. However, missingness can provide valuable information about comorbidities and best practices for monitoring patients, which could save lives and reduce burden on the healthcare system. We characterize patterns of missing data in laboratory measurements collected at the University of Pennsylvania Hospital System from long-term COVID-19 patients and focus on the changes in these patterns between 2020 and 2021. We investigate how these patterns are associated with comorbidities such as acute respiratory distress syndrome (ARDS), and 90-day mortality in ARDS patients. This work displays how knowledge and experience can change the way clinicians and hospitals manage a novel disease. It can also provide insight into best practices when it comes to patient monitoring to improve outcomes.
Affiliation(s)
- Emily Getzen
  - Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
- Amelia Lm Tan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Gabriel Brat
  - Beth Israel Deaconess Medical Center, Boston, MA
- Qi Long
  - Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
- John H Holmes
  - Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
- Danielle Mowery
  - Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA

5
Cheung ATM, Kurland DB, Neifert S, Mandelberg N, Nasir-Moin M, Laufer I, Pacione D, Lau D, Frempong-Boadu AK, Kondziolka D, Golfinos JG, Oermann EK. Developing an Automated Registry (Autoregistry) of Spine Surgery Using Natural Language Processing and Health System Scale Databases. Neurosurgery 2023; 93:1228-1234. [PMID: 37345933 DOI: 10.1227/neu.0000000000002568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 04/25/2023] [Indexed: 06/23/2023]
Abstract
BACKGROUND AND OBJECTIVES: Clinical registries are critical for modern surgery and underpin outcomes research, device monitoring, and trial development. However, existing approaches to registry construction are labor-intensive, costly, and prone to manual error. Natural language processing techniques combined with electronic health record (EHR) data sets can theoretically automate the construction and maintenance of registries. Our aim was to automate the generation of a spine surgery registry at an academic medical center using regular expression (regex) classifiers developed by neurosurgeons to combine domain expertise with interpretable algorithms.
METHODS: We used a Hadoop data lake consisting of all the information generated by an academic medical center. Using this database and structured query language queries, we retrieved every operative note written in the department of neurosurgery since our transition to EHR. Notes were parsed using regex classifiers and compared with a random subset of 100 manually reviewed notes.
RESULTS: A total of 31 502 operative cases were downloaded and processed using regex classifiers. The codebase required 5 days of development, 3 weeks of validation, and less than 1 hour for the software to generate the autoregistry. Regex classifiers had an average accuracy of 98.86% at identifying both spinal procedures and the relevant vertebral levels, and it correctly identified the entire list of defined surgical procedures in 89% of patients. We were able to identify patients who required additional operations within 30 days to monitor outcomes and quality metrics.
CONCLUSION: This study demonstrates the feasibility of automatically generating a spine registry using the EHR and an interpretable, customizable natural language processing algorithm which may reduce pitfalls associated with manual registry development and facilitate rapid clinical research.
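To illustrate what a regex classifier for operative notes might look like, here is a minimal sketch. The patterns, procedure names, and the sample note are all hypothetical; the abstract does not publish the authors' expressions, and a production classifier would need far broader coverage and validation against manually reviewed notes.

```python
import re

# Hypothetical pattern for vertebral levels such as "L4", "L4-5", or "C5-C6".
LEVEL = re.compile(r"\b([CTLS])(\d{1,2})(?:\s*[-–]\s*([CTLS])?(\d{1,2}))?\b")

# Hypothetical procedure patterns, one interpretable regex per procedure.
PROCEDURES = {
    "laminectomy": re.compile(r"\blaminectom(?:y|ies)\b", re.IGNORECASE),
    "fusion": re.compile(r"\b(?:fusion|arthrodesis)\b", re.IGNORECASE),
    "discectomy": re.compile(r"\b(?:micro)?dis[ck]ectom(?:y|ies)\b", re.IGNORECASE),
}

def classify_note(text):
    """Return the procedures and vertebral levels mentioned in one note."""
    procedures = sorted(name for name, pat in PROCEDURES.items() if pat.search(text))
    levels = []
    for match in LEVEL.finditer(text):
        region, start, end_region, end = match.groups()
        label = f"{region}{start}"
        if end:
            # "L4-5" omits the second region letter; reuse the first one.
            label += f"-{end_region or region}{end}"
        levels.append(label)
    return {"procedures": procedures, "levels": levels}

note = "Procedure: L4-5 laminectomy and posterior fusion L4-L5. Prior C5-C6 discectomy."
result = classify_note(note)
```

Keeping one small pattern per concept is what makes this style of classifier easy for clinicians to read and adjust, at the cost of missing phrasings nobody anticipated.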
Affiliation(s)
- David B Kurland
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Sean Neifert
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Mustafa Nasir-Moin
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Ilya Laufer
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Donato Pacione
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Darryl Lau
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Douglas Kondziolka
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- John G Golfinos
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
- Eric Karl Oermann
  - Department of Neurosurgery, NYU Langone Health, New York, New York, USA
  - Department of Radiology, NYU Langone Health, New York, New York, USA
  - Center for Data Science, New York University, New York, New York, USA

6
Flint J. The genetic basis of major depressive disorder. Mol Psychiatry 2023; 28:2254-2265. [PMID: 36702864 PMCID: PMC10611584 DOI: 10.1038/s41380-023-01957-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 06/18/2022] [Revised: 12/30/2022] [Accepted: 01/11/2023] [Indexed: 01/27/2023]
Abstract
The genetic dissection of major depressive disorder (MDD) ranks as one of the success stories of psychiatric genetics, with genome-wide association studies (GWAS) identifying 178 genetic risk loci and proposing more than 200 candidate genes. However, the GWAS results derive from the analysis of cohorts in which most cases are diagnosed by minimal phenotyping, a method that has low specificity. I review data indicating that there is a large genetic component unique to MDD that remains inaccessible to minimal phenotyping strategies and that the majority of genetic risk loci identified with minimal phenotyping approaches are unlikely to be MDD risk loci. I show that inventive uses of biobank data, novel imputation methods, combined with more interviewer diagnosed cases, can identify loci that contribute to the episodic severe shifts of mood, and neurovegetative and cognitive changes that are central to MDD. Furthermore, new theories about the nature and causes of MDD, drawing upon advances in neuroscience and psychology, can provide handles on how best to interpret and exploit genetic mapping results.
Affiliation(s)
- Jonathan Flint
  - Department of Psychiatry and Biobehavioral Sciences, Billy and Audrey Wilder Endowed Chair in Psychiatry and Neuroscience, Center for Neurobehavioral Genetics, 695 Charles E. Young Drive South, 3357B Gonda, Box 951761, Los Angeles, CA, 90095-1761, USA.

7
Tan ALM, Getzen EJ, Hutch MR, Strasser ZH, Gutiérrez-Sacristán A, Le TT, Dagliati A, Morris M, Hanauer DA, Moal B, Bonzel CL, Yuan W, Chiudinelli L, Das P, Zhang HG, Aronow BJ, Avillach P, Brat GA, Cai T, Hong C, La Cava WG, Hooi Will Loh H, Luo Y, Murphy SN, Yuan Hgiam K, Omenn GS, Patel LP, Jebathilagam Samayamuthu M, Shriver ER, Shakeri Hossein Abad Z, Tan BWL, Visweswaran S, Wang X, Weber GM, Xia Z, Verdy B, Long Q, Mowery DL, Holmes JH. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform 2023; 139:104306. [PMID: 36738870 PMCID: PMC10849195 DOI: 10.1016/j.jbi.2023.104306] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/17/2022] [Revised: 01/21/2023] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinician's concerns or worries for possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients.
METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern.
RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors.
CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and help researchers identify which sites are better poised to study particular questions.
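As a rough illustration of correlating labs by their missingness indicators, the sketch below turns each lab into a 0/1 "missing" vector and computes a Pearson correlation between two such vectors. The lab names and values are invented, and the study's actual pipeline (sites, time windows, clustering method) is not described in enough detail here to reproduce; this only shows the basic idea.

```python
import math

def missing_indicators(rows, labs):
    """Return {lab: [1 if the value is missing else 0, one entry per row]}."""
    return {lab: [1 if row.get(lab) is None else 0 for row in rows] for lab in labs}

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a lab that is always or never missing
    return cov / math.sqrt(var_x * var_y)

# Hypothetical patient-day rows; None marks a lab that was not measured.
rows = [
    {"crp": 12.0, "d_dimer": 0.8, "creatinine": 1.1},
    {"crp": None, "d_dimer": None, "creatinine": 0.9},
    {"crp": 30.5, "d_dimer": 1.9, "creatinine": None},
    {"crp": None, "d_dimer": None, "creatinine": 1.0},
]
labs = ["crp", "d_dimer", "creatinine"]
indicators = missing_indicators(rows, labs)

r_same = pearson(indicators["crp"], indicators["d_dimer"])
r_diff = pearson(indicators["crp"], indicators["creatinine"])
```

Labs that tend to be ordered together show strongly positive indicator correlations, which is the kind of signal that could then feed a clustering step.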
Affiliation(s)
- Emily J Getzen
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Trang T Le
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Priam Das
  - Harvard Medical School, Cambridge, MA, USA
- Bruce J Aronow
  - Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Tianxi Cai
  - Harvard Medical School, Cambridge, MA, USA
- Chuan Hong
  - Harvard Medical School, Cambridge, MA, USA; Duke University, Durham, NC, USA
- William G La Cava
  - Harvard Medical School, Cambridge, MA, USA; Boston Children's Hospital, Boston, MA, USA
- Yuan Luo
  - Northwestern University, Chicago, IL, USA
- Lav P Patel
  - University of Kansas Medical Center, United States
- Emily R Shriver
  - University of Pennsylvania Health System, Philadelphia, PA, USA
- Xuan Wang
  - Harvard Medical School, Cambridge, MA, USA
- Zongqi Xia
  - University of Pittsburgh, Pittsburgh, PA, USA
- Qi Long
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Danielle L Mowery
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John H Holmes
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA

8
Miller-Fleming TW, Allos A, Gantz E, Yu D, Isaacs DA, Mathews CA, Scharf JM, Davis LK. Developing a Phenotype Risk Score for Tic Disorders in a Large, Clinical Biobank. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.21.23286253. [PMID: 36865201 PMCID: PMC9980249 DOI: 10.1101/2023.02.21.23286253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Importance: Tics are a common feature of early-onset neurodevelopmental disorders, characterized by involuntary and repetitive movements or sounds. Despite affecting up to 2% of young children and having a genetic contribution, the underlying causes remain poorly understood, likely due to the complex phenotypic and genetic heterogeneity among affected individuals.
Objective: In this study, we leverage dense phenotype information from electronic health records to identify the disease features associated with tic disorders within the context of a clinical biobank. These disease features are then used to generate a phenotype risk score for tic disorder.
Design: Using de-identified electronic health records from a tertiary care center, we extracted individuals with tic disorder diagnosis codes. We performed a phenome-wide association study to identify the features enriched in tic cases versus controls (N=1,406 and 7,030; respectively). These disease features were then used to generate a phenotype risk score for tic disorder, which was applied across an independent set of 90,051 individuals. A previously curated set of tic disorder cases from an electronic health record algorithm followed by clinician chart review was used to validate the tic disorder phenotype risk score.
Main Outcomes and Measures: Phenotypic patterns associated with a tic disorder diagnosis in the electronic health record.
Results: Our tic disorder phenome-wide association study revealed 69 significantly associated phenotypes, predominantly neuropsychiatric conditions, including obsessive compulsive disorder, attention-deficit hyperactivity disorder, autism, and anxiety. The phenotype risk score constructed from these 69 phenotypes in an independent population was significantly higher among clinician-validated tic cases versus non-cases.
Conclusions and Relevance: Our findings provide support for the use of large-scale medical databases to better understand phenotypically complex diseases, such as tic disorders. The tic disorder phenotype risk score provides a quantitative measure of disease risk that can be leveraged for the assignment of individuals in case-control studies or for additional downstream analyses.
Affiliation(s)
- Tyne W. Miller-Fleming
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, TN, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Annmarie Allos
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, TN, USA
  - Department of Cognitive Science, Dartmouth College, Hanover, NH, USA
- Emily Gantz
  - Department of Pediatric Neurology, Children’s Hospital of Alabama, Birmingham, AL, USA
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children’s Hospital at Vanderbilt, Nashville, TN, USA
- Dongmei Yu
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David A. Isaacs
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pediatrics, Monroe Carell Jr. Children’s Hospital at Vanderbilt, Nashville, TN, USA
- Carol A. Mathews
  - Department of Psychiatry, Genetics Institute, Center for OCD, Anxiety and Related Disorders, University of Florida, Gainesville, FL, USA
- Jeremiah M. Scharf
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lea K. Davis
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, TN, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, TN, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, USA

9
Gao XR, Chiariglione M, Qin K, Nuytemans K, Scharre DW, Li YJ, Martin ER. Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction. Sci Rep 2023; 13:450. [PMID: 36624143 PMCID: PMC9829871 DOI: 10.1038/s41598-023-27551-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/05/2022] [Accepted: 01/04/2023] [Indexed: 01/11/2023]
Abstract
Alzheimer's disease (AD) is the most common late-onset neurodegenerative disorder. Identifying individuals at increased risk of developing AD is important for early intervention. Using data from the Alzheimer Disease Genetics Consortium, we constructed polygenic risk scores (PRSs) for AD and age-at-onset (AAO) of AD for the UK Biobank participants. We then built machine learning (ML) models for predicting development of AD, and explored feature importance among PRSs, conventional risk factors, and ICD-10 codes from electronic health records, a total of > 11,000 features using the UK Biobank dataset. We used eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP), which provided superior ML performance as well as aided ML model explanation. For participants age 40 and older, the area under the curve for AD was 0.88. For subjects of age 65 and older (late-onset AD), PRSs were the most important predictors. This is the first observation that PRSs constructed from the AD risk and AAO play more important roles than age in predicting AD. The ML model also identified important predictors from EHR, including urinary tract infection, syncope and collapse, chest pain, disorientation and hypercholesterolemia, for developing AD. Our ML model improved the accuracy of AD risk prediction by efficiently exploring numerous predictors and identified novel feature patterns.
Affiliation(s)
- Xiaoyi Raymond Gao
  - Department of Ophthalmology and Visual Sciences, The Ohio State University, Columbus, OH, USA.
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
  - Division of Human Genetics, The Ohio State University, Columbus, OH, USA.
  - Ohio State University Physicians Inc., Columbus, OH, USA.
- Marion Chiariglione
  - Department of Ophthalmology and Visual Sciences, The Ohio State University, Columbus, OH, USA
- Ke Qin
  - Department of Ophthalmology and Visual Sciences, The Ohio State University, Columbus, OH, USA
- Karen Nuytemans
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA
  - Dr. John T. MacDonald Foundation Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL, USA
- Douglas W Scharre
  - Department of Neurology, The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Yi-Ju Li
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
  - Duke Molecular Physiology Institute, Durham, NC, USA
- Eden R Martin
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA
  - Dr. John T. MacDonald Foundation Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL, USA

10
Ahmad S, Ashktorab H, Brim H, Housseau F. Inflammation, microbiome and colorectal cancer disparity in African-Americans: Are there bugs in the genetics? World J Gastroenterol 2022; 28:2782-2801. [PMID: 35978869 PMCID: PMC9280725 DOI: 10.3748/wjg.v28.i25.2782] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/28/2021] [Revised: 01/27/2022] [Accepted: 05/28/2022] [Indexed: 02/06/2023]
Abstract
Dysregulated interactions between host inflammation and gut microbiota over the course of life increase the risk of colorectal cancer (CRC). While environmental factors and socio-economic realities of race remain predominant contributors to CRC disparities in African-Americans (AAs), this review focuses on the biological mediators of CRC disparity, namely the under-appreciated influence of inherited ancestral genetic regulation on mucosal innate immunity and its interaction with the microbiome. There remains a poor understanding of mechanisms linking immune-related genetic polymorphisms and microbiome diversity that could influence chronic inflammation and exacerbate CRC disparities in AAs. A better understanding of the relationship between host genetics, bacteria, and CRC pathogenesis will improve the prediction of cancer risk across race/ethnicity groups overall.
Affiliation(s)
- Sami Ahmad
- Department of Oncology, Johns Hopkins University, Baltimore, MD 21231, United States
| | - Hassan Ashktorab
- Department of Medicine, Howard University, Washington, DC 20060, United States
| | - Hassan Brim
- Department of Pathology, Howard University, Washington, DC 20060, United States
| | - Franck Housseau
- Department of Oncology, Johns Hopkins University, Baltimore, MD 21231, United States
|
11
|
Koscielniak N, Piatt G, Friedman C, Vinson A, Richesson R, Tucker C. Development of a standards-based phenotype model for gross motor function to support learning health systems in pediatric rehabilitation. Learn Health Syst 2022; 6:e10266. [PMID: 35036550 PMCID: PMC8753308 DOI: 10.1002/lrh2.10266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/25/2020] [Revised: 02/19/2021] [Accepted: 03/29/2021] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION Research and continuous quality improvement in pediatric rehabilitation settings require standardized data and a systematic approach to use these data. METHODS We systematically examined pediatric data concepts from a pediatric learning network to determine capacity for capturing gross motor function (GMF) for children with Cerebral Palsy (CP) as a demonstration for enabling infrastructure for research and quality improvement activities of a learning health system (LHS). We used an iterative approach to construct phenotype models of GMF from standardized data element concepts based on case definitions from the Gross Motor Function Classification System (GMFCS). Data concepts were selected using a theory and expert-informed process and resulted in the construction of four phenotype models of GMF: an overall model and three classes corresponding to deviations in GMF for CP populations. RESULTS Sixty-five data element concepts were identified for the overall GMF phenotype model. The 65 data elements correspond to 20 variables and logic statements that instantiate membership into one of three clinically meaningful classes of GMF. Data element concepts and variables are organized into five domains relevant to modeling GMF: Neurologic Function, Mobility Performance, Activity Performance, Motor Performance, and Device Use. CONCLUSION Our experience provides an approach for organizations to leverage existing data for care improvement and research in other conditions. This is the first consensus-based and theory-driven specification of data elements and logic to support identification and labeling of GMF in patients for measuring improvements in care or the impact of new treatments. More research is needed to validate this phenotype model and the extent that these data differentiate between classes of GMF to support various LHS activities.
Affiliation(s)
- Nikolas Koscielniak
- Clinical and Translational Science Institute, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
| | - Gretchen Piatt
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Charles Friedman
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Alexandra Vinson
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Rachel Richesson
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Carole Tucker
- Department of Health and Rehabilitation Sciences, Temple University, Philadelphia, Pennsylvania, USA
|
12
|
Binkheder S, Asiri MA, Altowayan KW, Alshehri TM, Alzarie MF, Aldekhyyel RN, Almaghlouth IA, Almulhem JA. Real-World Evidence of COVID-19 Patients' Data Quality in the Electronic Health Records. Healthcare (Basel) 2021; 9:1648. [PMID: 34946374 PMCID: PMC8701465 DOI: 10.3390/healthcare9121648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Revised: 11/18/2021] [Accepted: 11/25/2021] [Indexed: 11/19/2022] Open
Abstract
Despite the importance of electronic health records data, less attention has been given to data quality. This study aimed to evaluate the quality of COVID-19 patients' records and their readiness for secondary use. We conducted a retrospective chart review study of all COVID-19 inpatients in an academic healthcare hospital for the year 2020, which were identified using ICD-10 codes and case definition guidelines. COVID-19 signs and symptoms were higher in unstructured clinical notes than in structured coded data. COVID-19 cases were categorized as 218 (66.46%) "confirmed cases", 10 (3.05%) "probable cases", 9 (2.74%) "suspected cases", and 91 (27.74%) "no sufficient evidence". The identification of "probable cases" and "suspected cases" was more challenging than "confirmed cases" where laboratory confirmation was sufficient. The accuracy of the COVID-19 case identification was higher in laboratory tests than in ICD-10 codes. When validating using laboratory results, we found that ICD-10 codes were inaccurately assigned to 238 (72.56%) patients' records. "No sufficient evidence" records might indicate inaccurate and incomplete EHR data. Data quality evaluation should be incorporated to ensure patient safety and data readiness for secondary use research and predictive analytics. We encourage educational and training efforts to motivate healthcare providers regarding the importance of accurate documentation at the point-of-care.
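As a quick consistency check on the figures in this abstract, the following minimal Python sketch recomputes the reported percentages from the reported counts. It assumes that the four case categories together make up the whole cohort, which the abstract implies but does not state outright; the variable and function names are illustrative only and do not come from the paper.

```python
# Counts exactly as reported in the abstract.
case_counts = {
    "confirmed cases": 218,
    "probable cases": 10,
    "suspected cases": 9,
    "no sufficient evidence": 91,
}

# Assumption: the four categories partition the full inpatient cohort.
total = sum(case_counts.values())


def percent(count, denominator):
    """Return count as a percentage of denominator, rounded to 2 decimals."""
    return round(100 * count / denominator, 2)


# Share of each category in the cohort.
shares = {label: percent(n, total) for label, n in case_counts.items()}

# Records whose ICD-10 codes were judged inaccurate against laboratory results.
miscoded_share = percent(238, total)

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"inaccurately coded records: {miscoded_share}%")
```

Under that assumption the cohort size works out to 328 records, and each recomputed share agrees with the percentage quoted in the abstract.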
Affiliation(s)
- Samar Binkheder
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Mohammed Ahmed Asiri
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Khaled Waleed Altowayan
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Turki Mohammed Alshehri
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Mashhour Faleh Alzarie
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Raniah N. Aldekhyyel
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Ibrahim A. Almaghlouth
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
| | - Jwaher A. Almulhem
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
|
13
|
Jun I, Rich SN, Chen Z, Bian J, Prosperi M. Challenges in replicating secondary analysis of electronic health records data with multiple computable phenotypes: A case study on methicillin-resistant Staphylococcus aureus bacteremia infections. Int J Med Inform 2021; 153:104531. [PMID: 34332468 PMCID: PMC8451470 DOI: 10.1016/j.ijmedinf.2021.104531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Revised: 06/03/2021] [Accepted: 06/24/2021] [Indexed: 11/18/2022]
Abstract
BACKGROUND Replication of prediction modeling using electronic health records (EHR) is challenging because of the necessity to compute phenotypes including study cohort, outcomes, and covariates. However, some phenotypes may not be easily replicated across EHR data sources due to a variety of reasons such as the lack of gold standard definitions and documentation variations across systems, which may lead to measurement error and potential bias. Methicillin-resistant Staphylococcus aureus (MRSA) infections are responsible for high mortality worldwide. With limited treatment options for the infection, the ability to predict MRSA outcome is of interest. However, replicating these MRSA outcome prediction models using EHR data is problematic due to the lack of well-defined computable phenotypes for many of the predictors as well as study inclusion and outcome criteria. OBJECTIVE In this study, we aimed to evaluate a prediction model for 30-day mortality after MRSA bacteremia infection diagnosis with reduced vancomycin susceptibility (MRSA-RVS) considering multiple computable phenotypes using EHR data. METHODS We used EHR data from a large academic health center in the United States to replicate the original study conducted in Taiwan. We derived multiple computable phenotypes of risk factors and predictors used in the original study, reported stratified descriptive statistics, and assessed the performance of the prediction model. RESULTS In our replication study, it was possible to (re)compute most of the original variables. Nevertheless, for certain variables, their computable phenotypes can only be approximated by proxy with structured EHR data items, especially the composite clinical indices such as the Pitt bacteremia score. Even the computable phenotype for the outcome variable was subject to variation on the basis of the admission/discharge windows. The replicated prediction model exhibited only a mild discriminatory ability. CONCLUSION Despite the rich information in EHR data, replication of prediction models involving complex predictors is still challenging, often due to the limited availability of validated computable phenotypes. On the other hand, it is often possible to derive proxy computable phenotypes that can be further validated and calibrated.
Affiliation(s)
- Inyoung Jun
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
| | - Shannan N Rich
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
| | - Zhaoyi Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Mattia Prosperi
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA.
|
14
|
Holmes B, Chitale D, Loving J, Tran M, Subramanian V, Berry A, Rioth M, Warrier R, Brown T. Customizable Natural Language Processing Biomarker Extraction Tool. JCO Clin Cancer Inform 2021; 5:833-841. [PMID: 34406803 DOI: 10.1200/cci.21.00017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
PURPOSE Natural language processing (NLP) in pathology reports to extract biomarker information is an ongoing area of research. MetaMap is a natural language processing tool developed and funded by the National Library of Medicine to map biomedical text to the Unified Medical Language System Metathesaurus by applying specific tags to clinically relevant terms. Although results are useful without additional postprocessing, these tags lack important contextual information. METHODS Our novel method takes terminology-driven semantic tags and incorporates those into a semantic frame that is task-specific to add necessary context to MetaMap. We use important contextual information to capture biomarker results to support Community Health System's use of Precision Medicine treatments for patients with cancer. For each biomarker, the name, type, numeric quantifiers, non-numeric qualifiers, and the time frame are extracted. These fields then associate biomarkers with their context in the pathology report such as test type, probe intensity, copy-number changes, and even failed results. A selection of 6,713 relevant reports contained the following standard-of-care biomarkers for metastatic breast cancer: breast cancer gene 1 and 2, estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and programmed death-ligand 1. RESULTS The method was tested on pathology reports from the internal pathology laboratory at Henry Ford Health System. A certified tumor registrar reviewed 400 tests, which showed > 95% accuracy for all extracted biomarker types. CONCLUSION Using this new method, it is possible to extract high-quality, contextual biomarker information, and this represents a significant advance in biomarker extraction.
|
15
|
Datta A, Flynn NR, Barnette DA, Woeltje KF, Miller GP, Swamidass SJ. Machine learning liver-injuring drug interactions with non-steroidal anti-inflammatory drugs (NSAIDs) from a retrospective electronic health record (EHR) cohort. PLoS Comput Biol 2021; 17:e1009053. [PMID: 34228716 PMCID: PMC8284671 DOI: 10.1371/journal.pcbi.1009053] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 12/18/2020] [Revised: 07/16/2021] [Accepted: 05/08/2021] [Indexed: 01/14/2023] Open
Abstract
Drug-drug interactions account for up to 30% of adverse drug reactions. Increasing prevalence of electronic health records (EHRs) offers a unique opportunity to build machine learning algorithms to identify drug-drug interactions that drive adverse events. In this study, we investigated hospitalization data to study drug interactions with non-steroidal anti-inflammatory drugs (NSAIDs) that result in drug-induced liver injury (DILI). We propose a logistic regression based machine learning algorithm that unearths several known interactions from an EHR dataset of about 400,000 hospitalizations. Our proposed modeling framework is successful in detecting 87.5% of the positive controls, which are defined by drugs known to interact with diclofenac causing an increased risk of DILI, and correctly ranks aggregate risk of DILI for eight commonly prescribed NSAIDs. We found that our modeling framework is particularly successful in inferring associations of drug-drug interactions from relatively small EHR datasets. Furthermore, we have identified a novel and potentially hepatotoxic interaction that might occur during concomitant use of meloxicam and esomeprazole, which are commonly prescribed together to allay NSAID-induced gastrointestinal (GI) bleeding. Empirically, we validate our approach against prior methods for signal detection on EHR datasets, in which our proposed approach outperforms all the compared methods across most metrics, such as area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC).
Affiliation(s)
- Arghya Datta
- Department of Computer Science and Engineering, Washington University in Saint Louis, Saint Louis, Missouri, United States of America
| | - Noah R. Flynn
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, Missouri, United States of America
| | - Dustyn A. Barnette
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
| | - Keith F. Woeltje
- Department of Internal Medicine, Washington University School of Medicine, Saint Louis, Missouri, United States of America
- Center for Clinical Excellence at BJC HealthCare, Saint Louis, Missouri, United States of America
| | - Grover P. Miller
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
| | - S. Joshua Swamidass
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, Missouri, United States of America
|
16
|
Moldwin A, Demner-Fushman D, Goodwin TR. Empirical Findings on the Role of Structured Data, Unstructured Data, and their Combination for Automatic Clinical Phenotyping. AMIA Joint Summits on Translational Science Proceedings 2021; 2021:445-454. [PMID: 34457160 PMCID: PMC8378600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The objective of this study is to explore the role of structured and unstructured data for clinical phenotyping by determining which types of clinical phenotypes are best identified using unstructured data (e.g., clinical notes), structured data (e.g., laboratory values, vital signs), or their combination across 172 clinical phenotypes. Specifically, we used laboratory and chart measurements as well as clinical notes from the MIMIC-III critical care database and trained an LSTM using features extracted from each type of data to determine which categories of phenotypes were best identified by structured data, unstructured data, or both. We observed that textual features on their own outperformed structured features for 145 (84%) of phenotypes, and that Doc2Vec was the most effective representation of unstructured data for all phenotypes. When evaluating the impact of adding textual features to systems previously relying only on structured features, we found a statistically significant (p < 0.05) increase in phenotyping performance for 51 phenotypes (primarily involving the circulatory system, injury, and poisoning), one phenotype for which textual features degraded performance (diabetes without complications), and no statistically significant change in performance with the remaining 120 phenotypes. We provide analysis on which phenotypes are best identified by each type of data and guidance on which data sources to consider for future research on phenotype identification.
Affiliation(s)
- Asher Moldwin
- U.S. National Library of Medicine, Bethesda, MD, USA
|
17
|
Joshi A, Rienks M, Theofilatos K, Mayr M. Systems biology in cardiovascular disease: a multiomics approach. Nat Rev Cardiol 2021; 18:313-330. [PMID: 33340009 DOI: 10.1038/s41569-020-00477-1] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Accepted: 11/02/2020] [Indexed: 12/13/2022]
Abstract
Omics techniques generate large, multidimensional data that are amenable to analysis by new informatics approaches alongside conventional statistical methods. Systems theories, including network analysis and machine learning, are well placed for analysing these data but must be applied with an understanding of the relevant biological and computational theories. Through applying these techniques to omics data, systems biology addresses the problems posed by the complex organization of biological processes. In this Review, we describe the techniques and sources of omics data, outline network theory, and highlight exemplars of novel approaches that combine gene regulatory and co-expression networks, proteomics, metabolomics, lipidomics and phenomics with informatics techniques to provide new insights into cardiovascular disease. The use of systems approaches will become necessary to integrate data from more than one omic technique. Although understanding the interactions between different omics data requires increasingly complex concepts and methods, we argue that hypothesis-driven investigations and independent validation must still accompany these novel systems biology approaches to realize their full potential.
Affiliation(s)
- Abhishek Joshi
- King's British Heart Foundation Centre, King's College London, London, UK
- Bart's Heart Centre, St. Bartholomew's Hospital, London, UK
| | - Marieke Rienks
- King's British Heart Foundation Centre, King's College London, London, UK
| | | | - Manuel Mayr
- King's British Heart Foundation Centre, King's College London, London, UK.
|
18
|
Zhao Y, Fu S, Bielinski SJ, Decker PA, Chamberlain AM, Roger VL, Liu H, Larson NB. Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation. J Med Internet Res 2021; 23:e22951. [PMID: 33683212 PMCID: PMC7985804 DOI: 10.2196/22951] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Revised: 08/25/2020] [Accepted: 01/20/2021] [Indexed: 11/29/2022] Open
Abstract
Background Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. Methods The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) stratified sampled from a population in Olmsted County, Minnesota (N=74,314). Results Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample. Conclusions We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining type of stroke. 
The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
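The predictive values quoted in this abstract follow directly from the reported counts (43/50 and 96/100). The minimal Python sketch below recomputes them and adds a Wilson score interval for the positive predictive value. The abstract does not say which interval method the authors used, so the choice of the Wilson interval here is an assumption, and the function name is illustrative only.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts as reported in the abstract for the general population sample.
ppv = 43 / 50    # true positives / all predicted positives
npv = 96 / 100   # true negatives / all predicted negatives

# Assumed interval method; the source does not name one.
ppv_low, ppv_high = wilson_interval(43, 50)

print(f"PPV = {ppv:.2f}, approx. 95% CI {ppv_low:.2f} to {ppv_high:.2f}")
print(f"NPV = {npv:.2f}")
```

Rounded to two decimals, this interval comes out at 0.74 to 0.93, which is consistent with the interval quoted in the abstract. That agreement suggests, but does not confirm, that a score-type interval was used.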
Affiliation(s)
- Yiqing Zhao
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Suzette J Bielinski
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Paul A Decker
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Alanna M Chamberlain
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Veronique L Roger
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Nicholas B Larson
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
|
19
|
Nguyen T, Zhang T, Fox G, Zeng S, Cao N, Pan C, Chen JY. Linking clinotypes to phenotypes and genotypes from laboratory test results in comprehensive physical exams. BMC Med Inform Decis Mak 2021; 21:51. [PMID: 33627109 PMCID: PMC7903607 DOI: 10.1186/s12911-021-01387-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In this work, we aimed to demonstrate how to utilize lab test results and other clinical information to support precision medicine research and clinical decisions on complex diseases, with the support of electronic medical record facilities. We defined "clinotypes" as clinical information that could be observed and measured objectively using biomedical instruments. From well-known 'omic' problem definitions, we defined problems using clinotype information, including stratifying patients (identifying sub-cohorts of interest for future studies), mining significant associations between clinotypes and specific phenotypes (diseases), and discovering potential linkages between clinotype and genomic information. We solved these problems by integrating public omic databases and applying advanced machine learning and visual analytic techniques on two-year health exam records from a large population of healthy southern Chinese individuals (size n = 91,354). When developing the solution, we carefully addressed the missing information, imbalance and non-uniform data annotation issues. RESULTS We organized the techniques and solutions to address the problems and issues above into the CPA framework (Clinotype Prediction and Association-finding). At the data preprocessing step, we handled the missing value issue with predicted accuracy of 0.760. We curated 12,635 clinotype-gene associations. We found 147 associations between 147 chronic disease phenotypes and clinotypes, which improved the disease predictive performance to AUC (average) of 0.967. We mined 182 significant clinotype-clinotype associations among 69 clinotypes. CONCLUSIONS Our results showed strong potential connectivity between the omics information and the clinical lab test information. The results further emphasized the need to utilize and integrate the clinical information, especially the lab test results, in future PheWAS and omic studies. Furthermore, it showed that the clinotype information could initiate an alternative research direction and serve as an independent field of data to support the well-known 'phenome' and 'genome' research.
Affiliation(s)
- Thanh Nguyen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Tongbin Zhang
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
- Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Zhejiang, China
| | - Geoffrey Fox
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
| | - Sisi Zeng
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
| | - Ni Cao
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
| | - Chuandi Pan
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
- Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Zhejiang, China
| | - Jake Y Chen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
|
20
|
Borrell LN, Elhawary JR, Fuentes-Afflick E, Witonsky J, Bhakta N, Wu AHB, Bibbins-Domingo K, Rodríguez-Santana JR, Lenoir MA, Gavin JR, Kittles RA, Zaitlen NA, Wilkes DS, Powe NR, Ziv E, Burchard EG. Race and Genetic Ancestry in Medicine - A Time for Reckoning with Racism. N Engl J Med 2021; 384:474-480. [PMID: 33406325 PMCID: PMC8979367 DOI: 10.1056/nejmms2029562] [Citation(s) in RCA: 367] [Impact Index Per Article: 122.3] [Indexed: 11/19/2022]
Affiliation(s)
- Luisa N Borrell
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Jennifer R Elhawary
| | - Elena Fuentes-Afflick
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Jonathan Witonsky
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Nirav Bhakta
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Alan H B Wu
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Kirsten Bibbins-Domingo
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - José R Rodríguez-Santana
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Michael A Lenoir
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - James R Gavin
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Rick A Kittles
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Noah A Zaitlen
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - David S Wilkes
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Neil R Powe
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Elad Ziv
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
| | - Esteban G Burchard
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
|
21
|
Borrell LN, Elhawary JR, Fuentes-Afflick E, Witonsky J, Bhakta N, Wu AHB, Bibbins-Domingo K, Rodríguez-Santana JR, Lenoir MA, Gavin JR, Kittles RA, Zaitlen NA, Wilkes DS, Powe NR, Ziv E, Burchard EG. Race and Genetic Ancestry in Medicine - A Time for Reckoning with Racism. N Engl J Med 2021. [PMID: 33406325 DOI: 10.1056/NEJMms2029562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/01/2023]
Affiliation(s)
- Luisa N Borrell
- From the Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York (L.N.B.); the Departments of Medicine (J.R.E., J.W., N.B., N.R.P., E.Z., E.G.B.), Pediatrics (E.F.-A., J.W.), Laboratory Medicine (A.H.B.W.), and Epidemiology and Biostatistics (K.B.-D.), Priscilla Chan and Mark Zuckerberg San Francisco General Hospital (K.B.-D., N.R.P.), the Division of General Internal Medicine and the Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center (E.Z.), and the Department of Bioengineering and Therapeutic Sciences (E.G.B.), University of California, San Francisco, San Francisco, Bay Area Pediatrics, Oakland (M.A.L.), the Department of Population Sciences, City of Hope Comprehensive Cancer Center, Duarte (R.A.K.), and the Department of Neurogenetics, University of California, Los Angeles, Los Angeles (N.A.Z.) - all in California; the Centro de Neumología Pediátrica, San Juan, PR (J.R.R.-S.); Emory University School of Medicine, Atlanta (J.R.G.); and the School of Medicine, University of Virginia, Charlottesville (D.S.W.)
- Jennifer R Elhawary
- Elena Fuentes-Afflick
- Jonathan Witonsky
- Nirav Bhakta
- Alan H B Wu
- Kirsten Bibbins-Domingo
- José R Rodríguez-Santana
- Michael A Lenoir
- James R Gavin
- Rick A Kittles
- Noah A Zaitlen
- David S Wilkes
- Neil R Powe
- Elad Ziv
- Esteban G Burchard
|
22
|
Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021; 85:138-144. [PMID: 33461032 DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022] Open
Abstract
As automated data extraction and natural language processing (NLP) are rapidly evolving, improving healthcare delivery by harnessing large data is garnering great interest. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.
Affiliation(s)
- Barbara M Decker
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States.
| | - Chloé E Hill
- Department of Neurology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States
| | - Steven N Baldassano
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| | - Pouya Khankhanian
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
|
23
|
Raspa M, Paquin RS, Brown DS, Andrews S, Edwards A, Moultrie R, Wagner L, Frisch M, Turner-Brown L, Wheeler AC. Preferences for Accessing Electronic Health Records for Research Purposes: Views of Parents Who Have a Child With a Known or Suspected Genetic Condition. Value Health 2020; 23:1639-1652. [PMID: 33248520 PMCID: PMC7701359 DOI: 10.1016/j.jval.2020.06.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Revised: 05/04/2020] [Accepted: 09/04/2020] [Indexed: 06/12/2023]
Abstract
OBJECTIVES The purpose of this study was to examine parental preferences for researchers accessing their child's electronic health record across 3 groups: those with a child with (1) a known genetic condition (fragile X syndrome [FXS]), (2) a suspected genetic condition (autism spectrum disorder [ASD]), and (3) no known genetic condition (typically developing). METHODS After extensive formative work, a discrete choice experiment was designed consisting of 5 attributes, each with 2 or 3 levels, including (1) type of researcher, (2) the use of personally identifiable information, (3) the use of sensitive information, (4) personal importance of research, and (5) return of results. Stratified mixed logit and latent class conditional logit models were examined. RESULTS Parents of children with FXS or ASD had relatively higher preferences for research conducted by nonprofits than parents of typically developing children. Parents of children with ASD also preferred research using non-identifiable and nonsensitive information. Parents of children with FXS or ASD also had preferences for research that was personally important and returned either summary or individual results. Although a few child and family characteristics were related to preferences, they did not overall define the subgroups of parents. CONCLUSIONS Although electronic health record preference research has been conducted with the general public, this is the first study to examine the opinions of parents who have a child with a known or suspected genetic condition. These parents were open to studies using their child's electronic health record because they may have more to gain from this type of research.
Affiliation(s)
| | | | - Derek S Brown
- Brown School, Washington University, St. Louis, MO, USA
| | - Sara Andrews
- RTI International, Research Triangle Park, NC, USA
| | - Anne Edwards
- RTI International, Research Triangle Park, NC, USA
| | | | - Laura Wagner
- RTI International, Research Triangle Park, NC, USA
| | - MaryKate Frisch
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
24
|
Díaz-Santiago E, Jabato FM, Rojano E, Seoane P, Pazos F, Perkins JR, Ranea JAG. Phenotype-genotype comorbidity analysis of patients with rare disorders provides insight into their pathological and molecular bases. PLoS Genet 2020; 16:e1009054. [PMID: 33001999 PMCID: PMC7553355 DOI: 10.1371/journal.pgen.1009054] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/12/2019] [Revised: 10/13/2020] [Accepted: 08/16/2020] [Indexed: 12/15/2022] Open
Abstract
Genetic and molecular analysis of rare disease is made difficult by the small numbers of affected patients. Phenotypic comorbidity analysis can help rectify this by combining information from individuals with similar phenotypes and looking for overlap in terms of shared genes and underlying functional systems. However, few studies have combined comorbidity analysis with genomic data. We present a computational approach that connects patient phenotypes based on phenotypic co-occurrence and uses genomic information related to the patient mutations to assign genes to the phenotypes, which are used to detect enriched functional systems. These phenotypes are clustered using network analysis to obtain functionally coherent phenotype clusters. We applied the approach to the DECIPHER database, containing phenotypic and genomic information for thousands of patients with heterogeneous rare disorders and copy number variants. Validity was demonstrated through overlap with known diseases, co-mention within the biomedical literature, semantic similarity measures, and patient cluster membership. These connected pairs formed multiple phenotype clusters, showing functional coherence, and mapped to genes and systems involved in similar pathological processes. Examples include claudin genes from the 22q11 genomic region associated with a cluster of phenotypes related to DiGeorge syndrome and genes related to the GO term anterior/posterior pattern specification associated with abnormal development. The clusters generated can help with the diagnosis of rare diseases, by suggesting additional phenotypes for a given patient and potential underlying functional systems. Other tools to find causal genes based on phenotype were also investigated. The approach has been implemented as a workflow, named PhenCo, which can be adapted to any set of patients for which phenomic and genomic data are available.
Full details of the analysis, including the clusters formed, their constituent functional systems and underlying genes are given. Code to implement the workflow is available from GitHub. Although rare diseases each affect a small number of people, taken together they affect millions. Better diagnosis and understanding of the underlying mechanisms are needed. By combining phenotypic data for many rare disease patients, we can build clusters of comorbid phenotypes that tend to co-occur. By using genomic information, we can supplement these clusters and look for related genes and functional systems, such as pathways and molecular mechanisms. We applied such an approach to thousands of rare disease patients from the DECIPHER resources. We were able to detect hundreds of pairs of comorbid phenotypes, and use them to build tens of phenotype clusters. By mapping genes to these phenotypes, based on data from the same patients, we were able to detect related genes and functional systems, such as genes mapping to the 22q11 genomic region underlying a cluster of phenotypes related to DiGeorge syndrome. To ensure that these clusters made sensible predictions, results were validated using literature co-mention, overlap with known disease and semantic similarity measures. These comorbidity patterns, along with their underlying molecular systems, can give important insights into disease mechanisms; moreover, they can be used to direct differential diagnosis of rare disease patients.
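The idea of connecting phenotypes by co-occurrence across patients can be illustrated with a small sketch. It is not the PhenCo workflow: the abstract does not name the association measure, so a one-sided hypergeometric test is assumed here, and the function names and toy patient data are hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, n_total, n_a, n_b):
    """P(X >= k), where X is the number of patients carrying both phenotypes
    if phenotype B were distributed independently of phenotype A."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_b)


def cooccurring_pairs(patients):
    """Map each co-occurring phenotype pair to (shared patient count, p-value).

    `patients` maps a patient id to the set of phenotype labels recorded for it.
    """
    n_total = len(patients)
    carriers = {}
    for patient_id, phenotypes in patients.items():
        for phenotype in phenotypes:
            carriers.setdefault(phenotype, set()).add(patient_id)

    results = {}
    for a, b in combinations(sorted(carriers), 2):
        shared = len(carriers[a] & carriers[b])
        if shared:  # pairs that never co-occur are skipped
            p_value = hypergeom_tail(shared, n_total, len(carriers[a]), len(carriers[b]))
            results[(a, b)] = (shared, p_value)
    return results


# Hypothetical toy cohort, for illustration only.
toy_patients = {
    "p1": {"seizure", "microcephaly"},
    "p2": {"seizure", "microcephaly"},
    "p3": {"seizure"},
    "p4": {"cleft palate"},
}
pairs = cooccurring_pairs(toy_patients)
```

In a real analysis the resulting pairs would be filtered by a multiple-testing-corrected threshold before being passed to network clustering.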
Affiliation(s)
- Elena Díaz-Santiago
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Fernando M. Jabato
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | | | - James R. Perkins
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- The Biomedical Research Institute of Malaga (IBIMA), Malaga, Spain
| | - Juan A. G. Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- The Biomedical Research Institute of Malaga (IBIMA), Malaga, Spain
|
25
|
Abstract
BACKGROUND Data mining technology used in the field of medicine has been widely studied by scholars all over the world. But there is little research on medical data mining (MDM) from the perspectives of bibliometrics and visualization, and the research topics and development trends in this field are still unclear. METHODS This paper applied bibliometric visualization software tools, VOSviewer 1.6.10 and CiteSpace V, to study the citation characteristics, international cooperation, author cooperation, and geographical distribution of MDM. RESULTS A total of 1575 documents were obtained, and the most frequent document type is article (1376). SHAN NH is the most productive author, with 12 publications, and the article by Gillies (750 citations) is the most cited paper. The most productive country and institution in MDM are the USA (559) and US FDA (35), respectively. The Journal of Biomedical Informatics, Expert Systems with Applications and Journal of Medical Systems are the most productive journals, which reflects the nature of the research, and the keywords "classification (790)" and "system (576)" have the greatest strength. The hot topics in MDM are drug discovery, medical imaging, vaccine safety, and so on. The 3 frontier topics are reporting system, precision medicine, and inflammation, and would be the foci of future research. CONCLUSION The present study provides a panoramic view of data mining methods applied in medicine by visualization and bibliometrics. Analysis of authors, journals, institutions, and countries could provide reference for researchers who are new to the field in different ways. Researchers may also consider the emerging trends when deciding the direction of their study.
Affiliation(s)
- Yuanzhang Hu
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan
| | - Zeyun Yu
- College of Acupuncture and TuiNa, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Xiaoen Cheng
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan
| | - Yue Luo
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan
| | - Chuanbiao Wen
- School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan
|
26
|
Deverka PA, Douglas MP, Phillips KA. Use of Real-World Evidence in US Payer Coverage Decision-Making for Next-Generation Sequencing-Based Tests: Challenges, Opportunities, and Potential Solutions. Value Health 2020; 23:540-550. [PMID: 32389218 PMCID: PMC7219085 DOI: 10.1016/j.jval.2020.02.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/05/2019] [Revised: 01/26/2020] [Accepted: 02/02/2020] [Indexed: 05/05/2023]
Abstract
OBJECTIVES Given the potential of real-world evidence (RWE) to inform understanding of the risk-benefit profile of next-generation sequencing (NGS)-based testing, we undertook a study to describe the current landscape of whether and how payers use RWE as part of their coverage decision making and potential solutions for overcoming barriers. METHODS We performed a scoping literature review of existing RWE evidentiary frameworks for evaluating new technologies and identified barriers to clinical integration and evidence gaps for NGS. We synthesized findings as potential solutions for improving the relevance and utility of RWE for payer decision-making. RESULTS Payers require evidence of clinical utility to inform coverage decisions, yet we found a relatively small number of published RWE studies, and these are predominately focused on oncology, pharmacogenomics, and perinatal/pediatric testing. We identified 3 categories of innovation that may help address the current undersupply of RWE studies for NGS: (1) increasing use of RWE to inform outcomes-based contracting for new technologies, (2) precision medicine initiatives that integrate clinical and genomic data and enable data sharing, and (3) Food and Drug Administration reforms to encourage the use of RWE. Potential solutions include development of data and evidence review standards, payer engagement in RWE study design, use of incentives and partnerships to lower the barriers to RWE generation, education of payers and providers concerning the use of RWE and NGS, and frameworks for conducting outcomes-based contracting for NGS. CONCLUSIONS We provide numerous suggestions to overcome the data, methodologic, infrastructure, and policy challenges constraining greater integration of RWE in assessments of NGS.
Affiliation(s)
| | - Michael P Douglas
- Center for Translational and Policy Research on Personalized Medicine, Department of Clinical Pharmacy, University of California at San Francisco, San Francisco, CA, USA
| | - Kathryn A Phillips
- Center for Translational and Policy Research on Personalized Medicine, Department of Clinical Pharmacy, University of California at San Francisco, San Francisco, CA, USA; Philip R. Lee Institute for Health Policy, University of California, San Francisco, San Francisco, CA, USA; Helen Diller Family Comprehensive Cancer Center, University of California at San Francisco, San Francisco, CA, USA
|
27
|
Lee KH, Kim HJ, Kim YJ, Kim JH, Song EY. Extracting Structured Genotype Information from Free-Text HLA Reports Using a Rule-Based Approach. J Korean Med Sci 2020; 35:e78. [PMID: 32233158 PMCID: PMC7105511 DOI: 10.3346/jkms.2020.35.e78] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/16/2019] [Accepted: 01/29/2020] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Human leukocyte antigen (HLA) typing is important for transplant patients to prevent a severe mismatch reaction, and the result can also support the diagnosis of various diseases or prediction of drug side effects. However, such secondary applications of HLA typing results are limited because they are typically provided in free-text format or PDFs on electronic medical records. We here propose a method to convert HLA genotype information stored in an unstructured format into a reusable structured format by extracting serotype/allele information. METHODS We queried HLA typing reports from the clinical data warehouse of Seoul National University Hospital (SUPPREME) from 2000 to 2018 as a rule-development data set (64,024 reports) and from the most recent year (6,181 reports) as a test set. We used a rule-based natural language approach using a Python regex function to extract the 1) number of patients in the report, 2) clinical characteristics such as indication of the HLA testing, and 3) precise HLA genotypes. The performance of the rules and codes was evaluated by comparison between the extracted results from the test set and a validation set generated by manual curation. RESULTS Among 11,287 reports for the development set and 1,107 for the test set describing HLA typing for a single patient, iterative rule generation developed 124 extracting rules and 8 cleaning rules for HLA genotypes. Application of these rules extracted HLA genotypes with 0.892-0.999 precision and 0.795-0.998 recall for the five HLA genes. The precision and recall of the extracting rules for the number of patients in a report were 0.997 and 0.994 and those for the clinical variable extraction were 0.997 and 0.992, respectively. All extracted HLA alleles and serotypes were transformed according to formal HLA nomenclature by the cleaning rules. CONCLUSION The rule-based HLA genotype extraction method shows reliable accuracy. We believe that a significant number of patients would benefit if this under-used genetic information were returned to them.
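As a minimal illustration of the rule-based approach this abstract describes, the sketch below uses Python's `re` module to pull HLA alleles out of a free-text string and scores the result against a hand-curated set. The single pattern, the sample report text, and the function names are hypothetical and far simpler than the 124 extracting rules the authors report.

```python
import re

# Hypothetical rule: optional "HLA-" prefix, a gene name, "*", then one to four
# colon-separated numeric fields (e.g. A*02:01). Real reports need many more rules.
HLA_ALLELE = re.compile(r"\b(?:HLA-)?(A|B|C|DRB1|DQB1)\*(\d{2,3}(?::\d{2,3}){0,3})")


def extract_alleles(report: str) -> set[str]:
    """Return normalized allele strings such as 'A*02:01' found in free text."""
    return {f"{gene}*{fields}" for gene, fields in HLA_ALLELE.findall(report)}


def precision_recall(extracted: set[str], curated: set[str]) -> tuple[float, float]:
    """Score extracted alleles against a manually curated reference set."""
    true_positives = len(extracted & curated)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Made-up report text and curated reference, for illustration only.
report_text = "HLA-A*02:01, A*24:02; HLA-B*15:01 / B*40:01; DRB1*04:05"
curated_set = {"A*02:01", "A*24:02", "B*15:01", "B*40:01", "DRB1*04:05", "DQB1*03:01"}

found = extract_alleles(report_text)
precision, recall = precision_recall(found, curated_set)
```

Here every extracted allele is in the curated set, while one curated allele is absent from the text, so precision and recall differ; that asymmetry is why the abstract reports both measures.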
Affiliation(s)
- Kye Hwa Lee
- Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea.
| | - Hyo Jung Kim
- Division of Biomedical Informatics, Seoul National University Biomedical Informatics and Systems Biomedical Informatics Research Center, Seoul National University College of Medicine, Seoul, Korea
| | - Yi Jun Kim
- Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
| | - Ju Han Kim
- Division of Biomedical Informatics, Seoul National University Biomedical Informatics and Systems Biomedical Informatics Research Center, Seoul National University College of Medicine, Seoul, Korea
| | - Eun Young Song
- Department of Laboratory Medicine, Seoul National University College of Medicine, Seoul, Korea.
|
28
|
Kirk IK, Simon C, Banasik K, Holm PC, Haue AD, Jensen PB, Juhl Jensen L, Rodríguez CL, Pedersen MK, Eriksson R, Andersen HU, Almdal T, Bork-Jensen J, Grarup N, Borch-Johnsen K, Pedersen O, Pociot F, Hansen T, Bergholdt R, Rossing P, Brunak S. Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining. eLife 2019; 8:44941. [PMID: 31818369 PMCID: PMC6904221 DOI: 10.7554/elife.44941] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/17/2019] [Accepted: 11/16/2019] [Indexed: 12/13/2022] Open
Abstract
Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations to genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. Text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark) so the resulting stratification is not driven by ethnic differences, but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations to single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.
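For readers unfamiliar with Markov clustering, the following is a rough sketch of the generic expansion/inflation scheme on a small similarity graph. The abstract does not give the authors' parameters or similarity measure, so the defaults, the function name, and the toy graph here are all assumptions.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, iterations=50, tol=1e-6):
    """Cluster graph nodes with a basic Markov clustering loop.

    `adjacency` is a square, symmetric, non-negative similarity matrix.
    Returns a sorted list of clusters, each a tuple of node indices.
    """
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    m = m / m.sum(axis=0, keepdims=True)  # make columns sum to 1
    for _ in range(iterations):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: random-walk flow spreads
        m = m ** inflation                        # inflation: strong flows are boosted
        m = m / m.sum(axis=0, keepdims=True)      # re-normalize columns
        if np.allclose(m, previous, atol=tol):
            break
    clusters = set()
    for row in m:  # each non-empty row lists the nodes drawn to one attractor
        members = tuple(int(i) for i in np.flatnonzero(row > tol))
        if members:
            clusters.add(members)
    return sorted(clusters)


# Toy graph (assumed): two disconnected triangles, nodes 0-2 and nodes 3-5.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
toy_clusters = markov_cluster(toy_graph)
```

A higher inflation value generally yields more, smaller clusters, which is the main tuning knob when a minimum cluster size (such as the 50 individuals mentioned above) is imposed afterwards.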
Affiliation(s)
- Isa Kristina Kirk
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Christian Simon
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Karina Banasik
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Peter Christoffer Holm
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Amalie Dahl Haue
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Peter Bjødstrup Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark; Odense Patient Data Explorative Network (OPEN), Odense University Hospital, Odense, Denmark
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Cristina Leal Rodríguez
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Mette Krogh Pedersen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Robert Eriksson
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | | | - Thomas Almdal
- Steno Diabetes Center Copenhagen, Gentofte, Denmark; Department of Endocrinology, Rigshospitalet, Copenhagen, Denmark
| | - Jette Bork-Jensen
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | - Niels Grarup
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | | | - Oluf Pedersen
- Steno Diabetes Center Copenhagen, Gentofte, Denmark; Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | - Flemming Pociot
- Steno Diabetes Center Copenhagen, Gentofte, Denmark; Department of Clinical Medicine, Herlev-Gentofte Hospital, Herlev, Denmark
| | - Torben Hansen
- Steno Diabetes Center Copenhagen, Gentofte, Denmark; Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | | | - Peter Rossing
- Steno Diabetes Center Copenhagen, Gentofte, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark; Center for Biological Sequence Analysis, Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark
|
29
|
De Lillo A, De Angelis F, Di Girolamo M, Luigetti M, Frusconi S, Manfellotto D, Fuciarelli M, Polimanti R. Phenome-wide association study of TTR and RBP4 genes in 361,194 individuals reveals novel insights in the genetics of hereditary and wildtype transthyretin amyloidoses. Hum Genet 2019; 138:1331-1340. [DOI: 10.1007/s00439-019-02078-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/12/2019] [Accepted: 10/22/2019] [Indexed: 12/30/2022]
|
30
|
Mate S, Bürkle T, Kapsner LA, Toddenroth D, Kampf MO, Sedlmayr M, Castellanos I, Prokosch HU, Kraus S. A method for the graphical modeling of relative temporal constraints. J Biomed Inform 2019; 100:103314. [PMID: 31629921 DOI: 10.1016/j.jbi.2019.103314] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/16/2019] [Revised: 08/13/2019] [Accepted: 10/14/2019] [Indexed: 02/06/2023]
Abstract
Searching for patient cohorts in electronic patient data often requires the definition of temporal constraints between the selection criteria. However, beyond a certain degree of temporal complexity, the non-graphical, form-based approaches implemented in current translational research platforms may be limited when modeling such constraints. In our opinion, there is a need for an easily accessible and implementable, fully graphical method for creating temporal queries. We aim to respond to this challenge with a new graphical notation. Based on Allen's time interval algebra, it allows for modeling temporal queries by arranging simple horizontal bars depicting symbolic time intervals. To make our approach applicable to complex temporal patterns, we apply two extensions: with duration intervals, we enable the inference about relative temporal distances between patient events, and with time interval modifiers, we support counting and excluding patient events, as well as constraining numeric values. We describe how to generate database queries from this notation. We provide a prototypical implementation, consisting of a temporal query modeling frontend and an experimental backend that connects to an i2b2 system. We evaluate our modeling approach on the MIMIC-III database to demonstrate that it can be used for modeling typical temporal phenotyping queries.
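Allen's time interval algebra, on which the notation above is based, distinguishes thirteen possible relations between two intervals. The sketch below classifies a pair of intervals into one of them. It is only an illustration of the algebra, not the authors' query generator; the `Interval` type, the relation labels, and the example values are assumptions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float  # assumed to satisfy start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two intervals share more than a single point.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


# Hypothetical example: a medication given entirely within a hospital stay
# (times in hours since admission).
hospital_stay = Interval(0, 72)
medication = Interval(6, 30)
relation = allen_relation(medication, hospital_stay)
```

A temporal phenotyping query such as "medication during the stay" then reduces to filtering event pairs by the returned label.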
Affiliation(s)
- Sebastian Mate
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany.
| | - Thomas Bürkle
- Bern University of Applied Sciences, Biel, Switzerland
| | - Lorenz A Kapsner
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
| | - Dennis Toddenroth
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Marvin O Kampf
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
| | - Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
| | - Ixchel Castellanos
- Department of Anesthesiology, Universitätsklinikum Erlangen, Erlangen, Germany
| | - Hans-Ulrich Prokosch
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany; Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Stefan Kraus
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
|
31
|
Bennett TD, Callahan TJ, Feinstein JA, Ghosh D, Lakhani SA, Spaeder MC, Szefler SJ, Kahn MG. Data Science for Child Health. J Pediatr 2019; 208:12-22. [PMID: 30686480 PMCID: PMC6486872 DOI: 10.1016/j.jpeds.2018.12.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/22/2018] [Revised: 12/11/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Tellen D Bennett
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; CU Data Science to Patient Value (D2V), University of Colorado School of Medicine, Aurora, CO; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO.
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- James A Feinstein
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO
- Debashis Ghosh
- CU Data Science to Patient Value (D2V), University of Colorado School of Medicine, Aurora, CO; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- Saquib A Lakhani
- Pediatric Genomics Discovery Program, Department of Pediatrics, Yale University School of Medicine, New Haven, CT
- Michael C Spaeder
- Pediatric Critical Care, University of Virginia School of Medicine, Charlottesville, VA
- Stanley J Szefler
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO
- Michael G Kahn
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
|
32
|
Data electronically extracted from the electronic health record require validation. J Perinatol 2019; 39:468-474. [PMID: 30679823 DOI: 10.1038/s41372-018-0311-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/20/2018] [Revised: 12/07/2018] [Accepted: 12/23/2018] [Indexed: 11/09/2022]
Abstract
OBJECTIVES Determine sources of error in electronically extracted data from electronic health records. STUDY DESIGN Categorical and continuous variables related to early-onset neonatal hypoglycemia were preselected and electronically extracted from records of 100 randomly selected neonates within 3479 births with laboratory-proven early-onset hypoglycemia. Extraction language was written by an information technologist and data were validated by blinded manual chart review. The kappa coefficient was used to assess categorical variables and percent validity to assess continuous variables. RESULTS 8/23 (35%) categorical variables had acceptable kappa (0.81-1); 5/23 (22%) had fair-to-slight agreement, kappa < 0.40. Notably, "hypoglycemia" had poor agreement, kappa 0.16. In contrast, 6/8 continuous variables had validity ≥ 94%. After correcting extraction language, 6/9 variables were corrected and inter-rater validation improved. However, "hypoglycemia" was not corrected, remaining an issue. CONCLUSIONS Data extraction without validation procedures, especially categorical variables using International Classification of Diseases-9 (ICD-9) codes, often results in incorrect data identification. Electronically extracted data must incorporate built-in validating processes.
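The agreement statistic used in this abstract can be illustrated with a minimal sketch of Cohen's kappa, assuming two parallel lists of categorical labels (one from electronic extraction, one from manual chart review). The function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of records where both sources give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both sources use a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data, NOT from the study: 10 records, 8 in agreement.
extracted = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
chart_review = ["yes", "no", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
kappa = cohens_kappa(extracted, chart_review)
print(f"kappa = {kappa:.2f}")
```

On this toy input the raw agreement is 80%, yet kappa is noticeably lower because part of that agreement is expected by chance, which is why the abstract reports kappa rather than percent agreement for categorical variables.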
|
33
|
Ellis RJ, Wang Z, Genes N, Ma’ayan A. Predicting opioid dependence from electronic health records with machine learning. BioData Min 2019; 12:3. [PMID: 30728857 PMCID: PMC6352440 DOI: 10.1186/s13040-019-0193-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 10/27/2018] [Accepted: 01/22/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND The opioid epidemic in the United States is averaging over 100 deaths per day due to overdose. The effectiveness of opioids as pain treatments, and the drug-seeking behavior of opioid addicts, lead physicians in the United States to issue over 200 million opioid prescriptions every year. To better understand the biomedical profile of opioid-dependent patients, we analyzed information from electronic health records (EHR) including lab tests, vital signs, medical procedures, prescriptions, and other data from millions of patients to predict opioid substance dependence. RESULTS We trained a machine learning model to classify patients by likelihood of having a diagnosis of substance dependence using EHR data from patients diagnosed with substance dependence, along with control patients with no history of substance-related conditions, matched by age, gender, and status of HIV, hepatitis C, and sickle cell disease. The top machine learning classifier using all features achieved a mean area under the receiver operating characteristic (AUROC) curve of ~92%, and analysis of the model uncovered associations between basic clinical factors and substance dependence. Additionally, diagnoses, prescriptions, and procedures prior to the diagnoses of substance dependence were analyzed to elucidate the clinical profile of substance-dependent patients, relative to controls. CONCLUSIONS The predictive model may hold utility for identifying patients at risk of developing dependence, risk of overdose, and opioid-seeking patients that report other symptoms in their visits to the emergency room.
Affiliation(s)
- Randall J. Ellis
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Zichen Wang
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Nicholas Genes
- Department of Emergency Medicine, Mount Sinai Hospital, New York, NY 10029 USA
- Avi Ma’ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
|
34
|
Pacheco JA, Rasmussen LV, Kiefer RC, Campion TR, Speltz P, Carroll RJ, Stallings SC, Mo H, Ahuja M, Jiang G, LaRose ER, Peissig PL, Shang N, Benoit B, Gainer VS, Borthwick K, Jackson KL, Sharma A, Wu AY, Kho AN, Roden DM, Pathak J, Denny JC, Thompson WK. A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments. J Am Med Inform Assoc 2018; 25:1540-1546. [PMID: 30124903 PMCID: PMC6213083 DOI: 10.1093/jamia/ocy101] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/04/2018] [Revised: 06/13/2018] [Accepted: 07/10/2018] [Indexed: 12/12/2022] Open
Abstract
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
Affiliation(s)
- Jennifer A Pacheco
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Luke V Rasmussen
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Richard C Kiefer
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Thomas R Campion
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Peter Speltz
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Sarah C Stallings
- Meharry-Vanderbilt Alliance, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Huan Mo
- Department of Pathology, Loma Linda University Health, Loma Linda, California, USA
- Monika Ahuja
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Eric R LaRose
- Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
- Peggy L Peissig
- Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
- Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Barbara Benoit
- Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
- Vivian S Gainer
- Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
- Kenneth Borthwick
- Henry Hood Center for Health Research, Geisinger, Danville, Pennsylvania, USA
- Kathryn L Jackson
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Ambrish Sharma
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Andy Yizhou Wu
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Abel N Kho
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jyotishman Pathak
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- William K Thompson
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
|
35
|
Wong J, Horwitz MM, Zhou L, Toh S. Using machine learning to identify health outcomes from electronic health record data. Curr Epidemiol Rep 2018; 5:331-342. [PMID: 30555773 DOI: 10.1007/s40471-018-0165-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/18/2022]
Abstract
Purpose of review Electronic health records (EHRs) contain valuable data for identifying health outcomes, but these data also present numerous challenges when creating computable phenotyping algorithms. Machine learning methods could help with some of these challenges. In this review, we discuss four common scenarios that researchers may find helpful for thinking critically about when and for what tasks machine learning may be used to identify health outcomes from EHR data. Recent findings We first consider the conditions in which machine learning may be especially useful with respect to two dimensions of a health outcome: 1) the characteristics of its diagnostic criteria, and 2) the format in which its diagnostic data are usually stored within EHR systems. In the first dimension, we propose that for health outcomes with diagnostic criteria involving many clinical factors, vague definitions, or subjective interpretations, machine learning may be useful for modeling the complex diagnostic decision-making process from a vector of clinical inputs to identify individuals with the health outcome. In the second dimension, we propose that for health outcomes where diagnostic information is largely stored in unstructured formats such as free text or images, machine learning may be useful for extracting and structuring this information as part of a natural language processing system or an image recognition task. We then consider these two dimensions jointly to define four common scenarios of health outcomes. For each scenario, we discuss the potential uses for machine learning - first assuming accurate and complete EHR data and then relaxing these assumptions to accommodate the limitations of real-world EHR systems. We illustrate these four scenarios using concrete examples and describe how recent studies have used machine learning to identify these health outcomes from EHR data. 
Summary Machine learning has great potential to improve the accuracy and efficiency of health outcome identification from EHR systems, especially under certain conditions. To promote the use of machine learning in EHR-based phenotyping tasks, future work should prioritize efforts to increase the transportability of machine learning algorithms for use in multi-site settings.
Affiliation(s)
- Jenna Wong
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA
- Mara Murray Horwitz
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA
- Li Zhou
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA
|
36
|
A study paradigm integrating prospective epidemiologic cohorts and electronic health records to identify disease biomarkers. Nat Commun 2018; 9:3522. [PMID: 30166544 PMCID: PMC6117367 DOI: 10.1038/s41467-018-05624-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/21/2017] [Accepted: 07/13/2018] [Indexed: 01/05/2023] Open
Abstract
Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations. Biomarker identification requires prohibitively large cohorts with gene expression and phenotype data. The approach introduced here learns polygenic predictors of expression from genetic and expression data, used to infer biomarker levels in patients with genetic and disease information.
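The contrast this abstract draws between a Bonferroni threshold and an FDR-based threshold can be sketched as follows. This is an illustration only: it assumes the FDR procedure is Benjamini-Hochberg (the abstract does not name one), a significance level of 0.05, and made-up p-values; the function names are hypothetical.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject a test when its p-value is at most alpha divided by the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose sorted p-value is at most (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values, NOT from the study.
toy_pvals = [0.001, 0.008, 0.012, 0.03, 0.2]
n_bonferroni = sum(bonferroni_reject(toy_pvals))
n_bh = sum(benjamini_hochberg_reject(toy_pvals))
print(n_bonferroni, n_bh)
```

On the toy input the FDR procedure rejects more hypotheses than Bonferroni, which mirrors the abstract's statement that the FDR-based threshold reveals more associations at the cost of a weaker error guarantee.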
|
37
|
Robinson JR, Wei WQ, Roden DM, Denny JC. Defining Phenotypes from Clinical Data to Drive Genomic Research. Annu Rev Biomed Data Sci 2018; 1:69-92. [PMID: 34109303 DOI: 10.1146/annurev-biodatasci-080917-013335] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/16/2022]
Abstract
The rise in available longitudinal patient information in electronic health records (EHRs) and their coupling to DNA biobanks has resulted in a dramatic increase in genomic research using EHR data for phenotypic information. EHRs have the benefit of providing a deep and broad data source of health-related phenotypes, including drug response traits, expanding the phenome available to researchers for discovery. The earliest efforts at repurposing EHR data for research involved manual chart review of limited numbers of patients but now typically involve applications of rule-based and machine learning algorithms operating on sometimes huge corpora for both genome-wide and phenome-wide approaches. We highlight here the current methods, impact, challenges, and opportunities for repurposing clinical data to define patient phenotypes for genomics discovery. Use of EHR data has proven a powerful method for elucidation of genomic influences on diseases, traits, and drug-response phenotypes and will continue to have increasing applications in large cohort studies.
Affiliation(s)
- Jamie R Robinson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of General Surgery, Vanderbilt University Medical Center, Nashville, TN
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Pharmacology, Vanderbilt University Medical Center
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
|
38
|
Edinger T, Demner-Fushman D, Cohen AM, Bedrick S, Hersh W. Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval. AMIA Annu Symp Proc 2018; 2017:660-669. [PMID: 29854131 PMCID: PMC5977655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Objective: Secondary use of electronic health record (EHR) data is enabled by accurate and complete retrieval of the relevant patient cohort, which requires searching both structured and unstructured data. Clinical text poses difficulties to searching, although chart notes incorporate structure that may facilitate accurate retrieval. Methods: We developed rules identifying clinical document sections, which can be indexed in search engines that allow faceted searches, such as Lucene or Essie, an NLM search engine. We developed 22 clinical cohorts and two queries for each cohort, one utilizing section headings and the other searching the whole document. We manually evaluated a subset of retrieved documents to compare query performance. Results: Querying by section had lower recall than whole-document queries (0.83 vs 0.95), higher precision (0.73 vs 0.54), and higher F1 (0.78 vs 0.69). Conclusion: This evaluation suggests that searching specific sections may improve precision under certain conditions and often with loss of recall.
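The relationship among the precision, recall, and F1 figures quoted in this abstract can be checked with a short sketch. It assumes F1 is the harmonic mean of precision and recall computed from the reported averages; the paper may instead have averaged per-query F1 scores, so the match below is a consistency check rather than a reproduction of the authors' calculation. The function name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
section_f1 = f1_score(precision=0.73, recall=0.83)    # section-based queries
whole_doc_f1 = f1_score(precision=0.54, recall=0.95)  # whole-document queries
print(round(section_f1, 2), round(whole_doc_f1, 2))
```

Under that assumption the section-based queries trade recall for precision and still come out ahead on F1, which is the trade-off the conclusion describes.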
Affiliation(s)
- Tracy Edinger
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Dina Demner-Fushman
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aaron M Cohen
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Steven Bedrick
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- William Hersh
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
|
39
|
Garcelon N, Neuraz A, Salomon R, Faour H, Benoit V, Delapalme A, Munnich A, Burgun A, Rance B. A clinician friendly data warehouse oriented toward narrative reports: Dr. Warehouse. J Biomed Inform 2018; 80:52-63. [DOI: 10.1016/j.jbi.2018.02.019] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 09/21/2017] [Revised: 02/22/2018] [Accepted: 02/28/2018] [Indexed: 01/26/2023]
|
40
|
Nedungadi P, Iyer A, Gutjahr G, Bhaskar J, Pillai AB. Data-Driven Methods for Advancing Precision Oncology. Curr Pharmacol Rep 2018; 4:145-156. [PMID: 33520605 PMCID: PMC7845924 DOI: 10.1007/s40495-018-0127-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
PURPOSE OF REVIEW This article discusses the advances, methods, challenges, and future directions of data-driven methods in advancing precision oncology for biomedical research, drug discovery, clinical research, and practice. RECENT FINDINGS Precision oncology provides individually tailored cancer treatment by considering an individual's genetic makeup, clinical, environmental, social, and lifestyle information. Challenges include voluminous, heterogeneous, and disparate data generated by different technologies with multiple modalities such as Omics, electronic health records, clinical registries and repositories, medical imaging, demographics, wearables, and sensors. Statistical and machine learning methods have been continuously adapting to the ever-increasing size and complexity of data. Precision Oncology supportive analytics have improved turnaround time in biomarker discovery and time-to-application of new and repurposed drugs. Precision oncology additionally seeks to identify target patient populations based on genomic alterations that are sensitive or resistant to conventional or experimental treatments. Predictive models have been developed for cancer progression and survivorship, drug sensitivity and resistance, and identification of the most suitable combination treatments for individual patient scenarios. In the future, clinical decision support systems need to be revamped to better incorporate knowledge from precision oncology, thus enabling clinical practitioners to provide precision cancer care. SUMMARY Open Omics datasets, machine learning algorithms, and predictive models have enabled the advancement of precision oncology. Clinical decision support systems with integrated electronic health record and Omics data are needed to provide data-driven recommendations to assist clinicians in disease prevention, early identification, and individualized treatment. 
Additionally, as cancer is a constantly evolving disorder, clinical decision systems will need to be continually updated based on more recent knowledge and datasets.
Affiliation(s)
- Prema Nedungadi
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Department of Computer Science, School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Akshay Iyer
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Georg Gutjahr
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Jasmine Bhaskar
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Department of Computer Science, School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Asha B. Pillai
- Division of Pediatric Hematology/Oncology, Departments of Pediatrics and Microbiology and Immunology, University of Miami Miller School of Medicine, Miami, FL, USA
|
41
|
Rai J, Kaushik K. Reduction of Animal Sacrifice in Biomedical Science & Research through Alternative Design of Animal Experiments. Saudi Pharm J 2018; 26:896-902. [PMID: 30202234 PMCID: PMC6128677 DOI: 10.1016/j.jsps.2018.03.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/27/2017] [Accepted: 03/10/2018] [Indexed: 12/15/2022] Open
Abstract
Various emerging techniques can replace experiments that require animal sacrifice or products of animal sacrifice. In many instances these techniques provide better reproducibility and control of parameters than experiments involving animals or animal products. Their use can avoid the question of animal sacrifice during experiments and, consequently, the need for ethical approval. In silico simulation, informatics, 3D cell culture models, and organs-on-chips are innovative technologies that can reduce the number of animals sacrificed. Scientists have developed innovative culture procedures and animal-friendly affinity reagents that are free from products of animal sacrifice. Direct investigation of the human body for treatment and further research, together with the electronic health record, also helps reduce animal sacrifice in biomedical investigations. These techniques and research strategies can be more cost-effective and more relevant to human health. Some medical blunders have also been reported after successful testing of drugs in animal models. Hence, the reliability of animal experiments in the context of human health is questionable. Alternatives to animal experiments help reduce the number of animals required for research to a certain extent but cannot eliminate the need for animals in research completely. Wise use of animals in teaching and research is expected, and the importance of animal experimentation for future developments in life science cannot be ignored.
Affiliation(s)
- Jagdish Rai
- Institute of Forensic Science & Criminology, Panjab University, Chandigarh 160014, India
- Kuldeep Kaushik
- Department of Zoology, Dev Samaj College for Women, Firozpur City, Punjab 152002, India
|
42
|
White KD, Abe R, Ardern-Jones M, Beachkofsky T, Bouchard C, Carleton B, Chodosh J, Cibotti R, Davis R, Denny JC, Dodiuk-Gad RP, Ergen EN, Goldman JL, Holmes JH, Hung SI, Lacouture ME, Lehloenya RJ, Mallal S, Manolio TA, Micheletti RG, Mitchell CM, Mockenhaupt M, Ostrov DA, Pavlos R, Pirmohamed M, Pope E, Redwood A, Rosenbach M, Rosenblum MD, Roujeau JC, Saavedra AP, Saeed HN, Struewing JP, Sueki H, Sukasem C, Sung C, Trubiano JA, Weintraub J, Wheatley LM, Williams KB, Worley B, Chung WH, Shear NH, Phillips EJ. SJS/TEN 2017: Building Multidisciplinary Networks to Drive Science and Translation. J Allergy Clin Immunol Pract 2018; 6:38-69. [PMID: 29310768 PMCID: PMC5857362 DOI: 10.1016/j.jaip.2017.11.023] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Received: 09/22/2017] [Revised: 11/20/2017] [Accepted: 11/21/2017] [Indexed: 12/17/2022]
Abstract
Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN) is a life-threatening, immunologically mediated, and usually drug-induced disease with a high burden to individuals, their families, and society with an annual incidence of 1 to 5 per 1,000,000. To effect significant reduction in short- and long-term morbidity and mortality, and advance clinical care and research, coordination of multiple medical, surgical, behavioral, and basic scientific disciplines is required. On March 2, 2017, an investigator-driven meeting was held immediately before the American Academy of Dermatology Annual meeting for the central purpose of assembling, for the first time in the United States, clinicians and scientists from multiple disciplines involved in SJS/TEN clinical care and basic science research. As a product of this meeting, this article summarizes the current state of knowledge and expert opinion related to SJS/TEN covering a broad spectrum of topics including epidemiology and pharmacogenomic networks; clinical management and complications; special populations such as pediatrics, the elderly, and pregnant women; regulatory issues and the electronic health record; new agents that cause SJS/TEN; pharmacogenomics and immunopathogenesis; and the patient perspective. Goals include the maintenance of a durable and productive multidisciplinary network that will significantly further scientific progress and translation into prevention, early diagnosis, and management of SJS/TEN.
Affiliation(s)
- Katie D White
- Vanderbilt University Medical Center, Nashville, Tenn
- Riichiro Abe
- Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Thomas Beachkofsky
- Wilford Hall Ambulatory Surgical Center, Lackland Air Force Base, San Antonio, Texas
- Bruce Carleton
- University of British Columbia, Vancouver, British Columbia, Canada; B.C. Children's Hospital, British Columbia, Vancouver, British Columbia, Canada
- James Chodosh
- Massachusetts Eye and Ear, Harvard Medical School, Boston, Mass
- Ricardo Cibotti
- National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Md
- Robert Davis
- University of Tennessee Health Sciences, Memphis, Tenn
- Roni P Dodiuk-Gad
- Emek Medical Center, Technion-Institute of Technology, Afula, Israel; Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Ontario, Canada
- James H Holmes
- Wake Forest Baptist Medical Center, Winston-Salem, NC; Wake Forest University School of Medicine, Winston-Salem, NC
- Simon Mallal
- Vanderbilt University Medical Center, Nashville, Tenn; Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia
- Teri A Manolio
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Md; F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Md
- Maja Mockenhaupt
- Medical Center and Medical Faculty-University of Freiburg, Freiburg, Germany
- Rebecca Pavlos
- Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia
- Elena Pope
- University of Toronto, Toronto, Ontario, Canada; Hospital for Sick Children, Toronto, Ontario, Canada
- Alec Redwood
- Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia
- Hajirah N Saeed
- Massachusetts Eye and Ear, Harvard Medical School, Boston, Mass
- Jeffery P Struewing
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Md
- Cynthia Sung
- Duke-NUS Medical School, Singapore, Singapore; Health Sciences Authority, Singapore, Singapore
- Jason A Trubiano
- Austin Health, Heidelberg, Victoria, Australia; University of Melbourne, Melbourne, Victoria, Australia
- Lisa M Wheatley
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Md
- Neil H Shear
- Vanderbilt University Medical Center, Nashville, Tenn
- Elizabeth J Phillips
- Vanderbilt University Medical Center, Nashville, Tenn; Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia.
|
43
|
Garcelon N, Neuraz A, Benoit V, Salomon R, Burgun A. Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse. J Am Med Inform Assoc 2017; 24:607-613. [PMID: 28339516 DOI: 10.1093/jamia/ocw144] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 01/25/2016] [Accepted: 08/31/2016] [Indexed: 12/19/2022] Open
Abstract
Objective: The repurposing of electronic health records (EHRs) can improve clinical and genetic research for rare diseases. However, significant information in rare disease EHRs is embedded in the narrative reports, which contain many negated clinical signs and family medical history. This paper presents a method to detect family history and negation in narrative reports and evaluates its impact on selecting populations from a clinical data warehouse (CDW). Materials and Methods: We developed a pipeline to process 1.6 million reports from multiple sources. This pipeline is part of the load process of the Necker Hospital CDW. Results: We identified patients with "Lupus and diarrhea," "Crohn's and diabetes," and "NPHP1" from the CDW. The overall precision, recall, specificity, and F-measure were 0.85, 0.98, 0.93, and 0.91, respectively. Conclusion: The proposed method generates a highly accurate identification of cases from a CDW of rare disease EHRs.
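As a quick consistency check on the metrics reported in this abstract, the F-measure can be recomputed from the stated precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract above.
reported_precision = 0.85
reported_recall = 0.98

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.91
```

Under that assumption the recomputed value agrees with the reported F-measure of 0.91 to two decimal places.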
Affiliation(s)
- Nicolas Garcelon
- Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France
- INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Antoine Neuraz
- Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France
- INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Vincent Benoit
- Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France
- Rémi Salomon
- Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France
- Service de Néphrologie Pédiatrique, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France
- Anita Burgun
- INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France
44
Al Kawam A, Sen A, Datta A, Dickey N. Understanding the Bioinformatics Challenges of Integrating Genomics into Healthcare. IEEE J Biomed Health Inform 2017; 22:1672-1683. [PMID: 29990071 DOI: 10.1109/jbhi.2017.2778263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Genomic data is paving the way towards personalized healthcare. By unveiling genetic disease-contributing factors, genomic data can aid in the detection, diagnosis, and treatment of a wide range of complex diseases. Integrating genomic data into healthcare is riddled with a wide range of challenges spanning social, ethical, legal, educational, economic, and technical aspects. Bioinformatics is a core integration aspect presenting an overwhelming number of unaddressed challenges. In this paper we tackle the fundamental bioinformatics integration concerns including: genomic data generation, storage, representation, and utilization in conjunction with clinical data. We divide the bioinformatics challenges into a series of seven intertwined integration aspects spanning the areas of informatics, knowledge management, and communication. For each aspect, we provide a detailed discussion of the current research directions, outstanding challenges, and possible resolutions. This paper seeks to help narrow the gap between the genomic applications, which are being predominantly utilized in research settings, and the clinical adoption of these applications.
45
Wang L, Damrauer SM, Zhang H, Zhang AX, Xiao R, Moore JH, Chen J. Phenotype validation in electronic health records based genetic association studies. Genet Epidemiol 2017; 41:790-800. [PMID: 29023970 DOI: 10.1002/gepi.22080] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/19/2016] [Revised: 06/30/2017] [Accepted: 08/01/2017] [Indexed: 12/13/2022]
Abstract
The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold standard phenotype data obtained by chart review. In this work, we propose a genotype stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
Affiliation(s)
- Lu Wang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Scott M Damrauer
- Division of Vascular Surgery and Endovascular Therapy, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania, United States of America
- Hong Zhang
- Institute of Biostatistics, Fudan University, Shanghai, P.R. China
- Alan X Zhang
- Sidwell Friends School, Washington, DC, United States of America
- Rui Xiao
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jason H Moore
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
46
Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. Yearb Med Inform 2017; 26:38-52. [PMID: 28480475 PMCID: PMC6239225 DOI: 10.15265/iy-2017-007] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 05/08/2017] [Indexed: 12/30/2022]
Abstract
Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potentials for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.
Affiliation(s)
- S. M. Meystre
- Medical University of South Carolina, Charleston, SC, USA
- C. Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Switzerland
- T. Bürkle
- University of Applied Sciences, Bern, Switzerland
- G. Tognola
- Institute of Electronics, Computer and Telecommunication Engineering, Italian Natl. Research Council IEIIT-CNR, Milan, Italy
- A. Budrionis
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- C. U. Lehmann
- Departments of Biomedical Informatics and Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
47
Lin FPY, Pokorny A, Teng C, Epstein RJ. TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records. Sci Rep 2017; 7:6918. [PMID: 28761061 PMCID: PMC5537364 DOI: 10.1038/s41598-017-07111-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/03/2016] [Accepted: 06/21/2017] [Indexed: 12/13/2022]
Abstract
Vast amounts of clinically relevant text-based variables lie undiscovered and unexploited in electronic medical records (EMR). To exploit this untapped resource, and thus facilitate the discovery of informative covariates from unstructured clinical narratives, we have built a novel computational pipeline termed Text-based Exploratory Pattern Analyser for Prognosticator and Associator discovery (TEPAPA). This pipeline combines semantic-free natural language processing (NLP), regular expression induction, and statistical association testing to identify conserved text patterns associated with outcome variables of clinical interest. When we applied TEPAPA to a cohort of head and neck squamous cell carcinoma patients, plausible concepts known to be correlated with human papilloma virus (HPV) status were identified from the EMR text, including site of primary disease, tumour stage, pathologic characteristics, and treatment modalities. Similarly, correlates of other variables (including gender, nodal status, recurrent disease, smoking and alcohol status) were also reliably recovered. Using highly-associated patterns as covariates, a patient's HPV status was classifiable using a bootstrap analysis with a mean area under the ROC curve of 0.861, suggesting its predictive utility in supporting EMR-based phenotyping tasks. These data support using this integrative approach to efficiently identify disease-associated factors from unstructured EMR narratives, and thus to efficiently generate testable hypotheses.
Affiliation(s)
- Frank Po-Yen Lin
- Department of Oncology, St Vincent's Hospital & The Kinghorn Cancer Centre, Darlinghurst, NSW, Australia
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Adrian Pokorny
- Department of Oncology, St Vincent's Hospital & The Kinghorn Cancer Centre, Darlinghurst, NSW, Australia
- Christina Teng
- Department of Medical Oncology, Liverpool Hospital, Liverpool, Sydney, NSW, Australia
- Richard J Epstein
- Department of Oncology, St Vincent's Hospital & The Kinghorn Cancer Centre, Darlinghurst, NSW, Australia
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
48
Montvida O, Arandjelović O, Reiner E, Paul SK. Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records. ACTA ACUST UNITED AC 2017. [DOI: 10.2174/1875036201709010001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/19/2023]
Abstract
Background:
Electronic Medical Records (EMRs) from primary/ambulatory care systems present a new and promising source of information for conducting clinical and translational research.
Objectives:
To address the methodological and computational challenges in order to extract reliable medication information from raw data which is often complex, incomplete and erroneous. To assess whether the use of specific chaining fields of medication information may additionally improve the data quality.
Methods:
Guided by a range of challenges associated with missing and internally inconsistent data, we introduce two methods for the robust extraction of patient-level medication data. The first method relies on chaining fields to estimate the duration of treatment (“chaining”), while the second disregards chaining fields and relies on the chronology of records (“continuous”). The Centricity EMR database was used to estimate treatment duration with both methods for two widely prescribed drugs among type 2 diabetes patients: insulin and glucagon-like peptide-1 receptor agonists.
Results:
At the individual patient level, the “chaining” approach could identify treatment alterations longitudinally and produced more robust estimates of treatment duration for individual drugs, while the “continuous” method was unable to capture those dynamics. At the population level, both methods produced similar estimates of average treatment duration; however, notable differences were observed at the individual patient level.
Conclusion:
The proposed algorithms explicitly identify and handle longitudinal erroneous or missing entries and estimate treatment duration with specific drug(s) of interest, which makes them a valuable tool for future EMR based clinical and pharmaco-epidemiological studies. To improve accuracy of real-world based studies, implementing chaining fields of medication information is recommended.
49
Clifton DA, Niehaus KE, Charlton P, Colopy GW. Health Informatics via Machine Learning for the Clinical Management of Patients. Yearb Med Inform 2017; 10:38-43. [PMID: 26293849 DOI: 10.15265/iy-2015-014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 01/19/2023]
Abstract
OBJECTIVES: To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. METHODS: We reviewed literature from 2010-2015 from databases such as PubMed, IEEE Xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. RESULTS: Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. CONCLUSIONS: Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale.
Affiliation(s)
- D A Clifton
- David A. Clifton, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
50
Wei WQ, Bastarache LA, Carroll RJ, Marlo JE, Osterman TJ, Gamazon ER, Cox NJ, Roden DM, Denny JC. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS One 2017; 12:e0175508. [PMID: 28686612 PMCID: PMC5501393 DOI: 10.1371/journal.pone.0175508] [Citation(s) in RCA: 214] [Impact Index Per Article: 30.6] [Received: 09/13/2016] [Accepted: 03/27/2017] [Indexed: 12/20/2022]
Abstract
OBJECTIVE: To compare three groupings of Electronic Health Record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three tested coding systems were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated "phecodes" designed to facilitate phenome-wide association studies (PheWAS) in EHRs. METHODS AND MATERIALS: We selected 100 disease phenotypes and compared the ability of each coding system to accurately represent them without performing additional groupings. The 100 phenotypes included 25 randomly chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated the performance of each coding system to replicate known associations for 440 SNP-phenotype pairs. RESULTS: Out of the 100 tested clinical phenotypes, phecodes exactly matched 83, compared to 53 for ICD-9-CM and 32 for CCS. ICD-9-CM codes were typically too detailed (requiring custom groupings) while CCS codes were often not granular enough. Among 440 tested known SNP-phenotype associations, use of phecodes replicated 153 SNP-phenotype pairs compared to 143 for ICD-9-CM and 139 for CCS. Phecodes also generally produced stronger odds ratios and lower p-values for known associations than ICD-9-CM and CCS. Finally, evaluation of several SNPs via PheWAS identified novel potential signals, some seen only when using the phecode approach. Among them, rs7318369 in PEPD was associated with gastrointestinal hemorrhage. CONCLUSION: Our results suggest that the phecode groupings better align with clinical diseases mentioned in clinical practice or for genomic studies. ICD-9-CM, CCS, and phecode groupings all worked for PheWAS-type studies, though the phecode groupings produced superior results.
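To put the counts in this abstract on a common scale, they can be converted to proportions. The sketch below uses only the numbers stated in the abstract (100 phenotypes, 440 SNP-phenotype pairs, and the per-system counts); the variable and function names are illustrative and not taken from the paper.

```python
# Counts reported in the abstract above.
TOTAL_PHENOTYPES = 100
TOTAL_SNP_PAIRS = 440

exact_matches = {"phecode": 83, "ICD-9-CM": 53, "CCS": 32}
replicated_pairs = {"phecode": 153, "ICD-9-CM": 143, "CCS": 139}


def proportions(counts: dict, total: int) -> dict:
    """Convert raw counts into fractions of the given total."""
    return {name: count / total for name, count in counts.items()}


match_rate = proportions(exact_matches, TOTAL_PHENOTYPES)
replication_rate = proportions(replicated_pairs, TOTAL_SNP_PAIRS)

for system in exact_matches:
    print(
        f"{system}: exact match {match_rate[system]:.1%}, "
        f"replication {replication_rate[system]:.1%}"
    )
```

Expressed this way, the gap between coding systems is much wider for exact phenotype matching than for replication of known SNP-phenotype associations.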
Affiliation(s)
- Wei-Qi Wei
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Lisa A. Bastarache
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Robert J. Carroll
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Joy E. Marlo
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Travis J. Osterman
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Departments of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Eric R. Gamazon
- Vanderbilt Genetic Institute and the Division of Genetic Medicine, Vanderbilt University, Nashville, TN, United States of America
- Department of Clinical Epidemiology, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Department of Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Department of Psychiatry, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Nancy J. Cox
- Vanderbilt Genetic Institute and the Division of Genetic Medicine, Vanderbilt University, Nashville, TN, United States of America
- Dan M. Roden
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Departments of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Department of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Joshua C. Denny
- Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Departments of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America