1. Shyr C, Hu Y, Bastarache L, Cheng A, Hamid R, Harris P, Xu H. Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models. Journal of Healthcare Informatics Research 2024; 8:438-461. [PMID: 38681753 PMCID: PMC11052982 DOI: 10.1007/s41666-023-00155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/24/2023] [Accepted: 11/13/2023] [Indexed: 05/01/2024]
Abstract
Purpose Phenotyping is critical for informing rare disease diagnosis and treatment, but disease phenotypes are often embedded in unstructured text. While natural language processing (NLP) can automate extraction, a major bottleneck is developing annotated corpora. Recently, prompt learning with large language models (LLMs) has been shown to yield generalizable results with no annotated samples (zero-shot) or only a few (few-shot), but no prior work has explored this for rare diseases. Our work is the first to study prompt learning for identifying and extracting rare disease phenotypes in the zero- and few-shot settings. Methods We compared the performance of prompt learning with ChatGPT and fine-tuning with BioClinicalBERT. We engineered novel prompts for ChatGPT to identify and extract rare diseases and their phenotypes (e.g., diseases, symptoms, and signs), established a benchmark for evaluating its performance, and conducted an in-depth error analysis. Results Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.610 in the zero- and few-shot settings, respectively). However, ChatGPT achieved higher accuracy for rare diseases and signs in the one-shot setting (F1 of 0.778 and 0.725). Conversational, sentence-based prompts generally achieved higher accuracy than structured lists. Conclusion Prompt learning using ChatGPT has the potential to match or outperform fine-tuning BioClinicalBERT at extracting rare diseases and signs with just one annotated sample. Given its accessibility, ChatGPT could be leveraged to extract these entities without relying on a large, annotated corpus. While LLMs can support rare disease phenotyping, researchers should critically evaluate model outputs to ensure phenotyping accuracy.
Affiliation(s)
- Cathy Shyr
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Yan Hu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77225 USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Alex Cheng
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Rizwan Hamid
  - Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN 37203 USA
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203 USA
  - Department of Biomedical Engineering, Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN 37203 USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, New Haven, CT 06510 USA
2. Curic E, Ewans L, Pysar R, Taylan F, Botto LD, Nordgren A, Gahl W, Palmer EE. International Undiagnosed Diseases Programs (UDPs): components and outcomes. Orphanet J Rare Dis 2023; 18:348. [PMID: 37946247 PMCID: PMC10633944 DOI: 10.1186/s13023-023-02966-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 10/30/2023] [Indexed: 11/12/2023] Open
Abstract
Over the last 15 years, Undiagnosed Diseases Programs have emerged to address the significant number of individuals with suspected but undiagnosed rare genetic diseases, integrating research and clinical care to optimize diagnostic outcomes. This narrative review summarizes the published literature surrounding Undiagnosed Diseases Programs worldwide, including thirteen studies that evaluate outcomes and two commentary papers. Commonalities in the diagnostic and research process of Undiagnosed Diseases Programs are explored through an appraisal of available literature. This exploration allowed for an assessment of the strengths and limitations of each of the six common steps, namely enrollment, comprehensive clinical phenotyping, research diagnostics, data sharing and matchmaking, results, and follow-up. Current literature highlights the potential utility of Undiagnosed Diseases Programs in research diagnostics. Since participants have often had extensive previous genetic studies, research pipelines allow for diagnostic approaches beyond exome or whole genome sequencing, through reanalysis using research-grade bioinformatics tools and multi-omics technologies. The overall diagnostic yield is presented by study, since different selection criteria at enrollment and reporting processes make comparisons challenging and not particularly informative. Nonetheless, diagnostic yield in an undiagnosed cohort reflects the potential of an Undiagnosed Diseases Program. Further comparisons and exploration of the outcomes of Undiagnosed Diseases Programs worldwide will allow for the development and improvement of the diagnostic and research process and in turn improve the value and utility of an Undiagnosed Diseases Program.
Affiliation(s)
- Ela Curic
  - Discipline of Paediatrics and Child Health, Faculty of Medicine and Health, School of Clinical Medicine, University of New South Wales, Bright Alliance Building, Level 8, Randwick, NSW, Australia
- Lisa Ewans
  - Discipline of Paediatrics and Child Health, Faculty of Medicine and Health, School of Clinical Medicine, University of New South Wales, Bright Alliance Building, Level 8, Randwick, NSW, Australia
  - Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Ryan Pysar
  - Discipline of Paediatrics and Child Health, Faculty of Medicine and Health, School of Clinical Medicine, University of New South Wales, Bright Alliance Building, Level 8, Randwick, NSW, Australia
  - Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia
  - Department of Clinical Genetics, The Children's Hospital at Westmead, Westmead, NSW, Australia
- Fulya Taylan
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
- Lorenzo D Botto
  - Division of Medical Genetics, Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA
- Ann Nordgren
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
  - Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
  - Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Gothenburg, Sweden
- William Gahl
  - Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Elizabeth Emma Palmer
  - Discipline of Paediatrics and Child Health, Faculty of Medicine and Health, School of Clinical Medicine, University of New South Wales, Bright Alliance Building, Level 8, Randwick, NSW, Australia
  - Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia
3. Montano C, Cassini T, Ziegler SG, Boehm M, Nicoli ER, Mindell JA, Soldatos AG, Manoli I, Wolfe L, Macnamara EF, Malicdan MCV, Adams DR, Tifft CJ, Toro C, Gahl WA. Diagnosis and discovery: Insights from the NIH Undiagnosed Diseases Program. J Inherit Metab Dis 2022; 45:907-918. [PMID: 35490291 DOI: 10.1002/jimd.12506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 04/14/2022] [Accepted: 04/27/2022] [Indexed: 11/11/2022]
Abstract
Living with an undiagnosed medical condition places a tremendous burden on patients, their families, and their healthcare providers. The Undiagnosed Diseases Program (UDP) was established at the National Institutes of Health (NIH) in 2008 with the primary goals of providing a diagnosis for patients with mysterious conditions and advancing medical knowledge about rare and common diseases. The program reviews applications from referring clinicians for cases that are considered undiagnosed despite a thorough evaluation. Those that are accepted receive clinical evaluations involving deep phenotyping and genetic testing that includes exome and genomic sequencing. Selected candidate gene variants are evaluated by collaborators using functional assays. Since its inception, the UDP has received more than 4500 applications and has completed evaluations on nearly 1300 individuals. Here we present six cases that exemplify the discovery of novel disease mechanisms, the importance of deep phenotyping for rare diseases, and how genetic diagnoses have led to appropriate treatment. The creation of the Undiagnosed Diseases Network (UDN) in 2014 has substantially increased the number of patients evaluated and allowed for greater opportunities for data sharing. Expansion to the Undiagnosed Diseases Network International (UDNI) has the possibility to extend this reach even farther. Together, networks of undiagnosed diseases programs are powerful tools to advance our knowledge of pathophysiology, accelerate accurate diagnoses, and improve patient care for patients with rare conditions.
Affiliation(s)
- Carolina Montano
  - Medical Genetics & Genomic Medicine Training Program, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
- Thomas Cassini
  - Medical Genetics & Genomic Medicine Training Program, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
- Shira G Ziegler
  - Departments of Pediatrics and Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Manfred Boehm
  - Laboratory of Cardiovascular Regenerative Medicine, National Heart, Lung, and Blood Institute (NHLBI), NIH, Bethesda, Maryland, USA
- Elena-Raluca Nicoli
  - Glycosphingolipid and Glycoprotein Disorders Unit, Medical Genetics Branch, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
- Joseph A Mindell
  - Membrane Transport Biophysics Section, National Institute of Neurological Disorders and Stroke (NINDS), NIH, Bethesda, Maryland, USA
- Ariane G Soldatos
  - Office of the Clinical Director, National Institute of Neurological Disorders and Stroke (NINDS), NIH, Bethesda, Maryland, USA
- Irini Manoli
  - Organic Acid Research Section, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
- Lynne Wolfe
  - Office of the Clinical Director, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
- Ellen F Macnamara
  - NIH Undiagnosed Diseases Program, Common Fund, NIH, Bethesda, Maryland, USA
- David R Adams
  - Office of the Clinical Director, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
  - NIH Undiagnosed Diseases Program, Common Fund, NIH, Bethesda, Maryland, USA
- Cynthia J Tifft
  - Glycosphingolipid and Glycoprotein Disorders Unit, Medical Genetics Branch, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
  - Office of the Clinical Director, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
  - NIH Undiagnosed Diseases Program, Common Fund, NIH, Bethesda, Maryland, USA
- Camilo Toro
  - NIH Undiagnosed Diseases Program, Common Fund, NIH, Bethesda, Maryland, USA
- William A Gahl
  - Office of the Clinical Director, National Human Genome Research Institute (NHGRI), NIH, Bethesda, Maryland, USA
  - NIH Undiagnosed Diseases Program, Common Fund, NIH, Bethesda, Maryland, USA
4. Xiao C, Koziura M, Cope H, Spillman R, Tan K, Hisama FM, Tifft CJ, Toro C. Adults with lysosomal storage diseases in the undiagnosed diseases network. Mol Genet Genomic Med 2022; 10:e2013. [PMID: 35848209 PMCID: PMC9482386 DOI: 10.1002/mgg3.2013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 05/17/2022] [Accepted: 06/15/2022] [Indexed: 12/24/2022] Open
Abstract
OBJECTIVES To review the referral and clinical characteristics of adult patients diagnosed with lysosomal storage diseases (LSD) through the Undiagnosed Diseases Network (UDN). METHODS Retrospective review of both application and evaluation records for adults admitted to the UDN with a final diagnosis of a lysosomal storage disease. RESULTS Ten patients were identified. Final diagnoses included late onset Tay Sachs, attenuated MPS I, MPS IIIA, MPS IIIB, and MPS IIIC. Most patients presented with neurocognitive changes. Prior to referral, all patients had been evaluated by neurology, four patients underwent phenotype specific panel testing that did not include the causative gene, and four patients had non-diagnostic clinical exome sequencing. CONCLUSIONS LSDs figure highly in the differential diagnosis of neurometabolic disorders in pediatric onset progressive diseases. In adults, their subtle initial presentations overlap with symptoms of more common disorders and less practitioner awareness may lead to prolonged diagnostic challenges.
Affiliation(s)
- Changrui Xiao
  - National Human Genome Research Institute, Bethesda, Maryland, USA
- Mary Koziura
  - Department of Pediatrics, Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Heidi Cope
  - Department of Pediatrics, Medical Genetics, Duke University Medical Center, Durham, North Carolina, USA
- Rebecca Spillman
  - Department of Pediatrics, Medical Genetics, Duke University Medical Center, Durham, North Carolina, USA
- Khoon Tan
  - Department of Pediatrics, Medical Genetics, Duke University Medical Center, Durham, North Carolina, USA
- Fuki M. Hisama
  - Department of Medicine, Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington, USA
- Camilo Toro
  - National Human Genome Research Institute, Bethesda, Maryland, USA
5. Boycott KM, Azzariti DR, Hamosh A, Rehm HL. Seven years since the launch of the Matchmaker Exchange: The evolution of genomic matchmaking. Hum Mutat 2022; 43:659-667. [PMID: 35537081 PMCID: PMC9133175 DOI: 10.1002/humu.24373] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/15/2022] [Accepted: 03/22/2022] [Indexed: 11/09/2022]
Abstract
The Matchmaker Exchange (MME) was launched in 2015 to provide a robust mechanism to discover novel disease-gene relationships. It operates as a federated network connecting databases holding relevant data using a common application programming interface, where two or more users are looking for a match for the same gene (two-sided matchmaking). Seven years from its launch, it is clear that the MME is making outstanding contributions to understanding the morbid anatomy of the genome. The number of unique genes present across the MME has steadily increased over time; there are currently >13,520 unique genes (~68% of all protein-coding genes) connected across the MME's eight genomic matchmaking nodes, GeneMatcher, DECIPHER, PhenomeCentral, MyGene2, seqr, Initiative on Rare and Undiagnosed Disease, PatientMatcher, and the RD-Connect Genome-Phenome Analysis Platform. The collective data set accessible across the MME currently includes more than 120,000 cases from over 12,000 contributors in 98 countries. The discovery of potential new disease-gene relationships is happening daily and international collaborative teams are moving these advances forward to publication, now numbering well over 500. Expansion of data sharing into routine clinical practice by clinicians, genetic counselors, and clinical laboratories has ensured access to discovery for even more individuals with undiagnosed rare genetic diseases. Tens of thousands of patients and their family members have been directly or indirectly impacted by the discoveries facilitated by two-sided genomic matchmaking. MME supports further connections to the literature (PubCaseFinder) and to human and model organism resources (Monarch Initiative) and scientists (ModelMatcher). Efforts are now underway to explore additional approaches to matchmaking at the gene or variant level where there is only one querier (one-sided matchmaking). 
Genomic matchmaking has proven its utility over the past 7 years and will continue to facilitate discoveries in the years to come.
Affiliation(s)
- Kym M. Boycott
  - Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
- Danielle R. Azzariti
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Ada Hamosh
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Heidi L. Rehm
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
6. Chao S, Lotfi J, Lin B, Shaw J, Jhandi S, Mahoney M, Singh B, Nguyen L, Halawi H, Geng LN. Diagnostic journeys: characterization of patients and diagnostic outcomes from an academic second opinion clinic. Diagnosis (Berl) 2022; 9:340-347. [PMID: 35596123 DOI: 10.1515/dx-2022-0029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 04/19/2022] [Indexed: 11/15/2022]
Abstract
OBJECTIVES Diagnostic programs and second opinion clinics have grown and evolved in recent years to help patients with rare, puzzling, and complex conditions who often suffer prolonged diagnostic journeys, but there is a paucity of literature on the clinical characteristics of these patients and the efficacy of these diagnostic programs. This study aims to characterize the diagnostic journey, case features, and diagnostic outcomes of patients referred to a team-based second opinion clinic at Stanford. METHODS Retrospective chart review was performed for 237 patients evaluated for diagnostic second opinion in the Stanford Consultative Medicine Clinic over a 5-year period. Descriptive case features and diagnostic outcomes were assessed, and correlation between the two was analyzed. RESULTS Sixty-three percent of our patients were women. Forty-nine percent of patients had a potential precipitating event within about a month prior to the start of their illness, such as medication change, infection, or medical procedure. A single clear diagnosis was determined in 33% of cases, whereas the remaining cases were assessed to have multifactorial contributors/diagnoses (20%) or remained unclear despite extensive evaluation (47%). Shorter duration of illness, fewer prior specialties seen, and single chief symptom were associated with higher likelihood of achieving a single clear diagnosis. CONCLUSIONS A single-site academic consultative service can offer additional diagnostic insights for about half of all patients evaluated for puzzling conditions. Better understanding of the clinical patterns and patient experiences gained from this study helps inform strategies to shorten their diagnostic odysseys.
7. Fujiwara T, Shin JM, Yamaguchi A. Advances in the development of PubCaseFinder, including the new application programming interface and matching algorithm. Hum Mutat 2022; 43:734-742. [PMID: 35143083 PMCID: PMC9305291 DOI: 10.1002/humu.24341] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2021] [Revised: 01/17/2022] [Accepted: 02/07/2022] [Indexed: 11/11/2022]
Abstract
Over 10,000 rare genetic diseases have been identified, and millions of newborns are affected by severe rare genetic diseases each year. A variety of Human Phenotype Ontology (HPO)-based clinical decision support systems (CDSS) and patient repositories have been developed to support clinicians in diagnosing patients with suspected rare genetic diseases. In September 2017, we released PubCaseFinder (https://pubcasefinder.dbcls.jp), a web-based CDSS that provides ranked lists of genetic and rare diseases using HPO-based phenotypic similarities, where top-listed diseases represent the most likely differential diagnosis. We also developed a Matchmaker Exchange (MME) application programming interface (API) to query PubCaseFinder, which has been adopted by several patient repositories. In this paper, we describe notable updates regarding PubCaseFinder, the GeneYenta matching algorithm implemented in PubCaseFinder, and the PubCaseFinder API. The updated GeneYenta matching algorithm improves the performance of the CDSS automated differential diagnosis function. Moreover, the updated PubCaseFinder and new API empower patient repositories participating in MME and medical professionals to actively use HPO-based resources.
Affiliation(s)
- Toyofumi Fujiwara
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa-shi, Chiba-ken, 277-0871, Japan
- Jae-Moon Shin
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa-shi, Chiba-ken, 277-0871, Japan
- Atsuko Yamaguchi
  - Graduate School of Integrative Science and Engineering, Tokyo City University, Setagaya-ku, Tokyo, 158-8557, Japan
8. Seaby EG, Rehm HL, O’Donnell-Luria A. Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes. Front Genet 2021; 12:674295. [PMID: 34220947 PMCID: PMC8248347 DOI: 10.3389/fgene.2021.674295] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/01/2021] [Accepted: 05/12/2021] [Indexed: 01/31/2023] Open
Abstract
Rare genetic disorders, while individually rare, are collectively common. They represent some of the most severe disorders affecting patients worldwide with significant morbidity and mortality. Over the last decade, advances in genomic methods have significantly uplifted diagnostic rates for patients and facilitated novel and targeted therapies. However, many patients with rare genetic disorders still remain undiagnosed as the genetic etiology of only a proportion of Mendelian conditions has been discovered to date. This article explores existing strategies to identify novel Mendelian genes and how these discoveries impact clinical care and therapeutics. We discuss the importance of data sharing, phenotype-driven approaches, patient-led approaches, utilization of large-scale genomic sequencing projects, constraint-based methods, integration of multi-omics data, and gene-to-patient methods. We further consider the health economic advantages of novel gene discovery and speculate on potential future methods for improved clinical outcomes.
Affiliation(s)
- Eleanor G. Seaby
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Genomic Informatics Group, University Hospital Southampton, Southampton, United Kingdom
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, United States
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States
- Heidi L. Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, United States
- Anne O’Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, United States
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States
  - Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA, United States
9. Kobren SN, Baldridge D, Velinder M, Krier JB, LeBlanc K, Esteves C, Pusey BN, Züchner S, Blue E, Lee H, Huang A, Bastarache L, Bican A, Cogan J, Marwaha S, Alkelai A, Murdock DR, Liu P, Wegner DJ, Paul AJ, Sunyaev SR, Kohane IS. Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases. Genet Med 2021; 23:1075-1085. [PMID: 33580225 PMCID: PMC8187147 DOI: 10.1038/s41436-020-01084-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/11/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/31/2022] Open
Abstract
PURPOSE Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. METHODS We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. RESULTS We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. CONCLUSION The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.
Affiliation(s)
- Dustin Baldridge
  - Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
- Matt Velinder
  - Center for Genomic Discovery, University of Utah, Salt Lake City, UT, USA
- Joel B Krier
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Kimberly LeBlanc
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cecilia Esteves
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Barbara N Pusey
  - National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH), Bethesda, MD, USA
- Stephan Züchner
  - Department of Human Genetics and Hussman Institute for Human Genomics, University of Miami Health System, Miami, FL, USA
- Elizabeth Blue
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Hane Lee
  - Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at the University of California, Los Angeles, CA, USA
- Alden Huang
  - Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, CA, USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Anna Bican
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Joy Cogan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Shruti Marwaha
  - Stanford Center for Undiagnosed Diseases, Stanford, CA, USA
- Anna Alkelai
  - Institute for Genomic Medicine, Columbia University Medical Center, New York City, NY, USA
- David R Murdock
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pengfei Liu
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Baylor Genetics, Houston, TX, USA
- Daniel J Wegner
  - Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
- Alexander J Paul
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Shamil R Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA