1
Bergquist T, Loomba J, Pfaff E, Xia F, Zhao Z, Zhu Y, Mitchell E, Bhattacharya B, Shetty G, Munia T, Delong G, Tariq A, Butzin-Dozier Z, Ji Y, Li H, Coyle J, Shi S, Philips RV, Mertens A, Pirracchio R, van der Laan M, Colford JM, Hubbard A, Gao J, Chen G, Velingker N, Li Z, Wu Y, Stein A, Huang J, Dai Z, Long Q, Naik M, Holmes J, Mowery D, Wong E, Parekh R, Getzen E, Hightower J, Blase J. Crowd-sourced machine learning prediction of long COVID using data from the National COVID Cohort Collaborative. EBioMedicine 2024; 108:105333. [PMID: 39321500 PMCID: PMC11462169 DOI: 10.1016/j.ebiom.2024.105333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 08/17/2024] [Accepted: 08/29/2024] [Indexed: 09/27/2024]
Abstract
BACKGROUND: While many patients seem to recover from SARS-CoV-2 infections, many others report experiencing SARS-CoV-2 symptoms for weeks or months after their acute COVID-19 ends, and some develop new symptoms weeks after infection. These long-term effects are called post-acute sequelae of SARS-CoV-2 (PASC) or, more commonly, Long COVID. The overall prevalence of Long COVID is currently unknown, and tools are needed to help identify patients at risk of developing Long COVID.
METHODS: A working group of the Rapid Acceleration of Diagnostics-radical (RADx-rad) program, composed of individuals from various NIH institutes and centers, in collaboration with REsearching COVID to Enhance Recovery (RECOVER), developed and organized the Long COVID Computational Challenge (L3C), a community challenge aimed at incentivizing the broader scientific community to develop interpretable and accurate methods for identifying patients at risk of developing Long COVID. From August 2022 to December 2022, participants developed Long COVID risk prediction algorithms using the National COVID Cohort Collaborative (N3C) data enclave, a harmonized data repository from over 75 healthcare institutions across the United States (U.S.).
FINDINGS: Over the course of the challenge, 74 teams designed and built 35 Long COVID prediction models using the N3C data enclave. The top 10 teams all scored above 0.80 area under the receiver operating characteristic curve (AUROC), with the highest scoring model achieving a mean AUROC of 0.895. The top submission included a visualization dashboard that built timelines for each patient, updating the patient's risk of developing Long COVID in response to clinical events.
INTERPRETATION: As a result of L3C, federal reviewers identified multiple machine learning models that can be used to identify patients at risk of developing Long COVID. Many of the teams used approaches in their submissions that can be applied to future clinical prediction questions.
FUNDING: Research reported in this RADx® Rad publication was supported by the National Institutes of Health. Timothy Bergquist, Johanna Loomba, and Emily Pfaff were supported by Axle Subcontract: NCATS-STSS-P00438.
Affiliation(s)
- Emily Pfaff
  - University of North Carolina at Chapel Hill, Durham, NC, USA
- Yitan Zhu
  - University of Chicago, Chicago, IL, USA
- Yunwen Ji
  - University of California Berkeley, Berkeley, CA, USA
- Haodong Li
  - University of California Berkeley, Berkeley, CA, USA
- Jeremy Coyle
  - University of California Berkeley, Berkeley, CA, USA
- Seraphina Shi
  - University of California Berkeley, Berkeley, CA, USA
- Alan Hubbard
  - University of California Berkeley, Berkeley, CA, USA
- Jifan Gao
  - University of Wisconsin-Madison, Madison, WI, USA
- Guanhua Chen
  - University of Wisconsin-Madison, Madison, WI, USA
- Ziyang Li
  - University of Pennsylvania, Philadelphia, PA, USA
- Yinjun Wu
  - University of Pennsylvania, Philadelphia, PA, USA
- Adam Stein
  - University of Pennsylvania, Philadelphia, PA, USA
- Jiani Huang
  - University of Pennsylvania, Philadelphia, PA, USA
- Zongyu Dai
  - University of Pennsylvania, Philadelphia, PA, USA
- Qi Long
  - University of Pennsylvania, Philadelphia, PA, USA
- Mayur Naik
  - University of Pennsylvania, Philadelphia, PA, USA
- John Holmes
  - University of Pennsylvania, Philadelphia, PA, USA
- Eric Wong
  - University of Pennsylvania, Philadelphia, PA, USA
- Ravi Parekh
  - University of Pennsylvania, Philadelphia, PA, USA
- Emily Getzen
  - University of Pennsylvania, Philadelphia, PA, USA
2
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024]
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors.
RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic.
CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
3
Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokiene A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023; 31:35-44. [PMID: 37604111 PMCID: PMC10746301 DOI: 10.1093/jamia/ocad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/05/2023] [Accepted: 08/08/2023] [Indexed: 08/23/2023]
Abstract
OBJECTIVE: Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question.
MATERIALS AND METHODS: Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams spread over 3 continents and 10 countries, generated 25 accurate models, all trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected over a 1-year observation of a large health system.
RESULTS: The top performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort.
DISCUSSION: Post hoc analysis after the challenge revealed that models differ in accuracy on subpopulations, delineated by race or gender, even when they are trained on the same data.
CONCLUSION: This is the largest community challenge focused on the evaluation of state-of-the-art machine learning methods in a healthcare system performed to date, revealing both opportunities and pitfalls of clinical AI.
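The two metrics reported above, and the post hoc subgroup comparison, can be illustrated with a minimal sketch. It is not the challenge's scoring code: the function names and toy data are invented for illustration, AUROC is computed by the pairwise (Mann-Whitney) definition, and average precision is assumed as the estimator of the area under the precision-recall curve, which the abstract does not specify.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears.

    Assumed estimator of the area under the precision-recall curve; tied
    scores are not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


def auroc_by_group(labels, scores, groups):
    """AUROC computed separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


# Toy data, invented for illustration only.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
groups = ["a", "a", "a", "b", "b", "b"]

overall_auroc = auroc(labels, scores)
overall_ap = average_precision(labels, scores)
per_group = auroc_by_group(labels, scores, groups)
```

In this toy example the pooled AUROC is higher than the AUROC within either subgroup, which is one reason per-subgroup evaluation can tell a different story from a single pooled score.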
Affiliation(s)
- Timothy Bergquist
  - Sage Bionetworks, Seattle, WA, United States
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Yao Yan
  - Sage Bionetworks, Seattle, WA, United States
  - Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
- Thomas Yu
  - Sage Bionetworks, Seattle, WA, United States
- Justin Prosser
  - Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
- Jifan Gao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Guanhua Chen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Łukasz Charzewski
  - Proacta, Warsaw, Poland
  - Division of Biophysics, University of Warsaw, Warsaw, Poland
- Ivan Brugere
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Renata Retkute
  - Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Alisa Prusokiene
  - Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Augustinas Prusokas
  - Department of Life Sciences, Imperial College London, London, United Kingdom
- Yonghwa Choi
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Sanghoon Lee
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Junseok Choe
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Inggeol Lee
  - Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sunkyu Kim
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
  - Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Justin Guinney
  - Sage Bionetworks, Seattle, WA, United States
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
4
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
  - Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
5
Warmerdam R, Lanting P, Deelen P, Franke L. Idéfix: identifying accidental sample mix-ups in biobanks using polygenic scores. Bioinformatics 2021; 38:1059-1066. [PMID: 34792549 PMCID: PMC8796367 DOI: 10.1093/bioinformatics/btab783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Revised: 10/07/2021] [Accepted: 11/15/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Identifying sample mix-ups in biobanks is essential to allow the repurposing of genetic data for clinical pharmacogenetics. Pharmacogenetic advice based on the genetic information of another individual is potentially harmful. Existing methods for identifying mix-ups are limited to datasets in which additional omics data (e.g. gene expression) is available. Cohorts lacking such data can only use sex, which can reveal only half of the mix-ups. Here, we describe Idéfix, a method for the identification of accidental sample mix-ups in biobanks using polygenic scores.
RESULTS: In the Lifelines population-based biobank, we calculated polygenic scores (PGSs) for 25 traits for 32 786 participants. We then applied Idéfix to compare the actual phenotypes to PGSs, and to use the relative discordance that is expected for mix-ups, compared to correct samples. In a simulation, using induced mix-ups, Idéfix reaches an AUC of 0.90 using 25 polygenic scores and sex. This is a substantial improvement over using only sex, which has an AUC of 0.75. Subsequent simulations present Idéfix's potential in varying datasets with more powerful PGSs. This suggests its performance will likely improve when more highly powered GWASs for commonly measured traits become available. Idéfix can be used to identify a set of high-quality participants for whom it is very unlikely that they reflect sample mix-ups, and for these participants we can use genetic data for clinical purposes, such as pharmacogenetic profiles. For instance, in Lifelines, we can select 34.4% of participants, reducing the sample mix-up rate from 0.15% to 0.01%.
AVAILABILITY AND IMPLEMENTATION: Idéfix is freely available at https://github.com/molgenis/systemsgenetics/wiki/Idefix. The individual-level data that support the findings were obtained from the Lifelines biobank under project application number ov16_0365. Data is made available upon reasonable request submitted to the LifeLines Research office (research@lifelines.nl, https://www.lifelines.nl/researcher/how-to-apply/apply-here).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
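To put the quoted Lifelines figures in absolute terms, the short calculation below converts the rates into expected counts. It is a back-of-the-envelope sketch, not part of Idéfix: it assumes that the 0.15% mix-up rate applies to all 32 786 participants and that the 34.4% selection is taken from that same cohort, neither of which the abstract states explicitly.

```python
# Figures quoted in the abstract.
n_participants = 32_786
baseline_rate = 0.0015      # 0.15% mix-up rate before selection
selected_fraction = 0.344   # 34.4% of participants retained
selected_rate = 0.0001      # 0.01% mix-up rate among those retained

# Assumption: both rates are proportions of the cohort sizes computed here.
n_selected = round(n_participants * selected_fraction)
expected_mixups_all = n_participants * baseline_rate
expected_mixups_selected = n_selected * selected_rate
fold_reduction = baseline_rate / selected_rate

print(f"selected participants: {n_selected}")
print(f"expected mix-ups, full cohort: {expected_mixups_all:.1f}")
print(f"expected mix-ups, selected set: {expected_mixups_selected:.1f}")
print(f"fold reduction in mix-up rate: {fold_reduction:.0f}x")
```

Under these assumptions, roughly 49 expected mix-ups in the full cohort correspond to about one expected mix-up among the roughly 11 000 selected participants.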
Affiliation(s)
- Robert Warmerdam
  - Department of Genetics, University Medical Center Groningen, University of Groningen, 9700AB Groningen, The Netherlands
- Pauline Lanting
  - Department of Genetics, University Medical Center Groningen, University of Groningen, 9700AB Groningen, The Netherlands
- Patrick Deelen
  - Department of Genetics, University Medical Center Groningen, University of Groningen, 9700AB Groningen, The Netherlands
  - Department of Genetics, University Medical Center Utrecht, 3508GA Utrecht, The Netherlands
6
Douville NJ, Kheterpal S, Engoren M, Mathis M, Mashour GA, Hornsby WE, Willer CJ, Douville CB. Genetic mutations associated with susceptibility to perioperative complications in a longitudinal biorepository with integrated genomic and electronic health records. Br J Anaesth 2020; 125:986-994. [PMID: 32891412 DOI: 10.1016/j.bja.2020.08.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/27/2020] [Revised: 07/06/2020] [Accepted: 08/05/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND: Existing genetic information can be leveraged to identify patients with susceptibilities to conditions that might impact their perioperative care, but clinicians generally have limited exposure and are not trained to contextualise this information. We identified patients with genetic susceptibilities to anaesthetic complications using a perioperative biorepository and characterised the concordance with existing diagnoses.
METHODS: Adult patients undergoing surgery within Michigan Medicine from 2012 to 2017 were consented for genotyping. Genotypes were integrated with the electronic health record (EHR). We retrospectively characterised frequencies of variants associated with butyrylcholinesterase deficiency, factor V Leiden, and malignant hyperthermia, three pharmacogenetic factors with perioperative implications. We calculated the percentage homozygous and heterozygous for each that had been diagnosed previously and searched for EHR findings consistent with a predisposition.
RESULTS: Analysis of genetic data revealed that 25 out of 40 769 (0.1%) patients were homozygous and 1918 (4.7%) were heterozygous for mutations associated with butyrylcholinesterase deficiency. Of the homozygous individuals, 14 (56%) carried a pre-existing diagnosis. For factor V Leiden, 29 (0.1%) were homozygous and 2153 (5.3%) heterozygous. Of the homozygous individuals, three (10%) were diagnosed by EHR-derived phenotype and six (21%) by clinician review. Malignant hyperthermia was assessed in a subset of patients. We detected two patients with associated mutations. Neither carried clinical diagnoses.
CONCLUSIONS: We identified patients with genetic susceptibility to perioperative complications using an open source script designed for clinician use. We validated this application in a retrospective analysis for three conditions with well-characterised inheritance, and showed that not all genetic susceptibilities were documented in the EHR.
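The percentages in the results can be reproduced from the reported counts. The sketch below is only an arithmetic check using a hypothetical helper `pct`; it assumes the cohort-level percentages use 40 769 as the denominator and the diagnosis percentages use the number of homozygous individuals.

```python
def pct(count, total, digits=0):
    """Percentage of `count` in `total`, rounded to `digits` decimals."""
    return round(100 * count / total, digits)


cohort = 40_769  # genotyped patients, as reported

# Butyrylcholinesterase deficiency
bche_hom = pct(25, cohort, 1)      # reported as 0.1%
bche_het = pct(1918, cohort, 1)    # reported as 4.7%
bche_diagnosed = pct(14, 25)       # reported as 56%

# Factor V Leiden
fvl_hom = pct(29, cohort, 1)       # reported as 0.1%
fvl_het = pct(2153, cohort, 1)     # reported as 5.3%
fvl_ehr = pct(3, 29)               # reported as 10%
fvl_review = pct(6, 29)            # reported as 21%
```

Note that both homozygous frequencies round up to 0.1% from roughly 0.06% and 0.07%, so the one-decimal figures overstate how common these genotypes were in the cohort.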
Affiliation(s)
- Nicholas J Douville
  - Department of Anesthesiology, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Sachin Kheterpal
  - Department of Anesthesiology, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Milo Engoren
  - Department of Anesthesiology, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Michael Mathis
  - Department of Anesthesiology, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- George A Mashour
  - Department of Anesthesiology, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Whitney E Hornsby
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Cristen J Willer
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Christopher B Douville
  - Ludwig Center for Cancer Genetics and Therapeutics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Sidney Kimmel Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
7
Bergquist T, Yan Y, Schaffter T, Yu T, Pejaver V, Hammarlund N, Prosser J, Guinney J, Mooney S. Piloting a model-to-data approach to enable predictive analytics in health care through patient mortality prediction. J Am Med Inform Assoc 2020; 27:1393-1400. [PMID: 32638010 PMCID: PMC7526463 DOI: 10.1093/jamia/ocaa083] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/11/2019] [Revised: 04/16/2020] [Accepted: 05/06/2020] [Indexed: 02/06/2023]
Abstract
OBJECTIVE: The development of predictive models for clinical application requires the availability of electronic health record (EHR) data, which is complicated by patient privacy concerns. We showcase the "Model to Data" (MTD) approach as a new mechanism to make private clinical data available for the development of predictive models. Under this framework, we eliminate researchers' direct interaction with patient data by delivering containerized models to the EHR data.
MATERIALS AND METHODS: We operationalize the MTD framework using the Synapse collaboration platform and an on-premises secure computing environment at the University of Washington hosting EHR data. Containerized mortality prediction models developed by a model developer were delivered to the University of Washington via Synapse, where the models were trained and evaluated. Model performance metrics were returned to the model developer.
RESULTS: The model developer was able to develop 3 mortality prediction models under the MTD framework using simple demographic features (area under the receiver-operating characteristic curve [AUROC], 0.693), demographics and 5 common chronic diseases (AUROC, 0.861), and the 1000 most common features from the EHR's condition/procedure/drug domains (AUROC, 0.921).
DISCUSSION: We demonstrate the feasibility of the MTD framework to facilitate the development of predictive models on private EHR data, enabled by common data models and containerization software. We identify challenges that both the model developer and the health system information technology group encountered and propose future efforts to improve implementation.
CONCLUSIONS: The MTD framework lowers the barrier of access to EHR data and can accelerate the development and evaluation of clinical prediction models.
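The control flow described above (the model travels to the data, and only performance metrics travel back) can be sketched in a few lines. This is a toy stand-in, not the Synapse or University of Washington implementation: `DataHost`, `submitted_train_fn`, the `age` feature, the accuracy metric, and the records are all hypothetical, and a plain Python function takes the place of a containerized model.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Predictor = Callable[[Record], float]
TrainFn = Callable[[List[Record], List[int]], Predictor]


def submitted_train_fn(records: List[Record], labels: List[int]) -> Predictor:
    """Hypothetical developer-side model: risk grows linearly with age.

    The labels are ignored here; a real submission would fit to them.
    """
    oldest = max(r["age"] for r in records)
    return lambda record: min(record["age"] / oldest, 1.0)


class DataHost:
    """Stands in for the secure environment that holds the private records."""

    def __init__(self, train, train_labels, test, test_labels):
        self._train, self._train_labels = train, train_labels
        self._test, self._test_labels = test, test_labels

    def run(self, train_fn: TrainFn) -> Dict[str, float]:
        """Train and evaluate a submitted model, returning metrics only."""
        predictor = train_fn(self._train, self._train_labels)
        scores = [predictor(r) for r in self._test]
        correct = sum(
            (s >= 0.5) == bool(y) for s, y in zip(scores, self._test_labels)
        )
        # Records and per-patient scores never leave this method.
        return {"n_test": float(len(scores)), "accuracy": correct / len(scores)}


# Toy private data, invented for illustration only.
host = DataHost(
    train=[{"age": 30.0}, {"age": 80.0}],
    train_labels=[0, 1],
    test=[{"age": 40.0}, {"age": 72.0}, {"age": 20.0}],
    test_labels=[0, 1, 0],
)
metrics = host.run(submitted_train_fn)
```

The design point is the return type of `run`: the developer sees an aggregate metrics dictionary and nothing derived from individual records.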
Affiliation(s)
- Timothy Bergquist
  - Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Yao Yan
  - Molecular Engineering and Sciences Institute, University of Washington, Seattle, Washington, USA
- Thomas Yu
  - Sage Bionetworks, Seattle, Washington, USA
- Vikas Pejaver
  - Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Noah Hammarlund
  - Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Justin Prosser
  - Institute for Translational Health Sciences, University of Washington, Seattle, Washington, USA
- Justin Guinney
  - Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
  - Sage Bionetworks, Seattle, Washington, USA
- Sean Mooney
  - Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
8
Carraro M, Monzon AM, Chiricosta L, Reggiani F, Aspromonte MC, Bellini M, Pagel K, Jiang Y, Radivojac P, Kundu K, Pal LR, Yin Y, Limongelli I, Andreoletti G, Moult J, Wilson SJ, Katsonis P, Lichtarge O, Chen J, Wang Y, Hu Z, Brenner SE, Ferrari C, Murgia A, Tosatto SC, Leonardi E. Assessment of patient clinical descriptions and pathogenic variants from gene panel sequences in the CAGI-5 intellectual disability challenge. Hum Mutat 2019; 40:1330-1345. [PMID: 31144778 PMCID: PMC7341177 DOI: 10.1002/humu.23823] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 05/27/2019] [Indexed: 12/15/2022]
Abstract
The Critical Assessment of Genome Interpretation-5 intellectual disability challenge asked participants to use computational methods to predict patient clinical phenotypes and the causal variant(s) based on an analysis of their gene panel sequence data. Sequence data for 74 genes associated with intellectual disability (ID) and/or autism spectrum disorders (ASD) from a cohort of 150 patients with a range of neurodevelopmental manifestations (i.e. ID, autism, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) were made available for this challenge. For each patient, predictors had to report the causative variants and which of the seven phenotypes were present. Since neurodevelopmental disorders are characterized by strong comorbidity, tested individuals often present more than one pathological condition. Considering the overall clinical manifestation of each patient, the correct phenotype was predicted by at least one group for 93 individuals (62%). ID and ASD were the best predicted among the seven phenotypic traits. Also, causative or potentially pathogenic variants were predicted correctly by at least one group. However, the prediction of the correct causative variant seems to be insufficient to predict the correct phenotype. In some cases, the correct prediction was supported by rare or common variants in genes different from the causative one.
Affiliation(s)
- Marco Carraro
  - Department of Biomedical Sciences, University of Padua, Padua, Italy
- Luigi Chiricosta
  - Department of Biomedical Sciences, University of Padua, Padua, Italy
- Francesco Reggiani
  - Department of Biomedical Sciences, University of Padua, Padua, Italy
  - Department of Information Engineering, University of Padua, Padua, Italy
- Mariagrazia Bellini
  - Department of Woman and Child Health, University of Padua, Padua, Italy
  - Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padova, Italy
- Kymberleigh Pagel
  - Khoury College of Computer and Information Sciences, Northeastern University, 440, Huntington Avenue, Boston, MA 02115, USA
- Yuxiang Jiang
  - Khoury College of Computer and Information Sciences, Northeastern University, 440, Huntington Avenue, Boston, MA 02115, USA
- Predrag Radivojac
  - Khoury College of Computer and Information Sciences, Northeastern University, 440, Huntington Avenue, Boston, MA 02115, USA
- Kunal Kundu
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- Lipika R. Pal
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Yizhou Yin
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- Gaia Andreoletti
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Stephen J. Wilson
  - Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX 77030, USA
- Panagiotis Katsonis
  - Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX 77030, USA
- Olivier Lichtarge
  - Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX 77030, USA
- Jingqi Chen
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Yaqiong Wang
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Steven E. Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Carlo Ferrari
  - Department of Information Engineering, University of Padua, Padua, Italy
- Alessandra Murgia
  - Department of Woman and Child Health, University of Padua, Padua, Italy
  - Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padova, Italy
- Silvio C.E. Tosatto
  - Department of Biomedical Sciences, University of Padua, Padua, Italy
  - CNR Institute of Neuroscience, Padua, Italy
- Emanuela Leonardi
  - Department of Woman and Child Health, University of Padua, Padua, Italy
  - Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padova, Italy
9
Kasak L, Hunter JM, Udani R, Bakolitsa C, Hu Z, Adhikari AN, Babbi G, Casadio R, Gough J, Guerrero RF, Jiang Y, Joseph T, Katsonis P, Kotte S, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Moult J, Pal LR, Poitras J, Radivojac P, Rao A, Sivadasan N, Sunderam U, VG S, Yin Y, Zaucha J, Brenner SE, Meyn MS. CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases. Hum Mutat 2019; 40:1373-1391. [PMID: 31322791 PMCID: PMC7318886 DOI: 10.1002/humu.23874] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/29/2019] [Revised: 07/15/2019] [Accepted: 07/15/2019] [Indexed: 01/02/2023]
Abstract
Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective) and, separately, to an individual patient. Performance in matching genomes to categories was no better than random, but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.
Affiliation(s)
- Laura Kasak
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
- Jesse M. Hunter
  - Department of Pediatrics and Wisconsin State Lab of Hygiene, University of Wisconsin Madison, WI, USA
- Rupa Udani
  - Department of Pediatrics and Wisconsin State Lab of Hygiene, University of Wisconsin Madison, WI, USA
- Constantina Bakolitsa
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Aashish N. Adhikari
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Julian Gough
  - Department of Computer Science, University of Bristol, Bristol, UK
- Yuxiang Jiang
  - Department of Computer Science, Indiana University, IN, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Kunal Kundu
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Biochemistry & Molecular Biology, Department of Pharmacology, Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, WA, USA
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, MD, USA
- Lipika R. Pal
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, MA, USA
- Yizhou Yin
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, USA
- Jan Zaucha
  - Department of Computer Science, University of Bristol, Bristol, UK
- Steven E. Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- M. Stephen Meyn
  - Center for Human Genomics and Precision Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
  - Department of Paediatrics, The Hospital for Sick Children, Toronto, Canada
10
Hoskins RA, Repo S, Barsky D, Andreoletti G, Moult J, Brenner SE. Reports from CAGI: The Critical Assessment of Genome Interpretation. Hum Mutat 2017; 38:1039-1041. [PMID: 28817245 PMCID: PMC5606199 DOI: 10.1002/humu.23290] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/07/2017] [Accepted: 07/08/2017] [Indexed: 12/20/2022]
Affiliation(s)
- Roger A Hoskins
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Susanna Repo
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Daniel Barsky
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Gaia Andreoletti
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742
- Steven E. Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA