1
Kim J, Wang K, Weng C, Liu C. Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Am J Hum Genet 2024; 111:2190-2202. [PMID: 39255797 DOI: 10.1016/j.ajhg.2024.08.010] [Received: 04/17/2024] [Revised: 08/08/2024] [Accepted: 08/13/2024] [Indexed: 09/12/2024] Open
Abstract
Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two from the generative pre-trained transformer (GPT) series and three from the Llama2 series, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease the output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
Affiliation(s)
- Junyoung Kim
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
2
Badwal AK, Singh S. A comprehensive review on the current status of CRISPR based clinical trials for rare diseases. Int J Biol Macromol 2024; 277:134097. [PMID: 39059527 DOI: 10.1016/j.ijbiomac.2024.134097] [Received: 02/11/2024] [Revised: 07/03/2024] [Accepted: 07/20/2024] [Indexed: 07/28/2024]
Abstract
A considerable fraction of the world's population suffers from rare diseases. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and its related Cas proteins offer a modern form of curative gene therapy for treating rare diseases. Hereditary transthyretin amyloidosis, hereditary angioedema, Duchenne muscular dystrophy and Rett syndrome are a few examples of such rare diseases. CRISPR/Cas9, for example, has been used in the treatment of β-thalassemia and sickle cell disease (Frangoul et al., 2021; Pavani et al., 2021) [1,2]. Neurological diseases such as Huntington's have also been the focus of some studies involving CRISPR/Cas (Yang et al., 2017; Yan et al., 2023) [3,4]. Delivery of these biologicals via vector- and non-vector-mediated methods depends on the type of target cells, the characteristics and duration of expression, the size of the foreign genetic material, etc. For instance, retroviruses are applicable to ex vivo delivery in somatic cells because of their ability to integrate into the host genome. They have been used successfully in gene therapy involving X-SCID patients, although incidences of inappropriate activation have been reported. On the other hand, ex vivo gene therapy for β-thalassemia involved use of the BB305 lentiviral vector for high-level expression of the CRISPR biological in HSCs. The efficacy and safety of these biologicals will decide their future application as efficient genome editing tools as they move into further stages of human clinical trials. This review focuses on CRISPR/Cas-based therapies that are at various stages of clinical trials for the treatment of rare diseases, and on the constraints and ethical issues associated with them.
Affiliation(s)
- Amneet Kaur Badwal
  - Department of Biotechnology, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Mohali 160062, Punjab, India
- Sushma Singh
  - Department of Biotechnology, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Mohali 160062, Punjab, India
3
van Karnebeek CDM, O'Donnell-Luria A, Baynam G, Baudot A, Groza T, Jans JJM, Lassmann T, Letinturier MCV, Montgomery SB, Robinson PN, Sansen S, Mehrian-Shai R, Steward C, Kosaki K, Durao P, Sadikovic B. Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet J Rare Dis 2024; 19:357. [PMID: 39334316 PMCID: PMC11438178 DOI: 10.1186/s13023-024-03361-0] [Received: 03/26/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024] Open
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
Affiliation(s)
- Clara D M van Karnebeek
  - Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-Enterology Endocrinology Metabolism, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, USA
- Gareth Baynam
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Anaïs Baudot
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
  - European Molecular Biology Laboratory (EMBL-EBI), European Bioinformatics Institute, Hinxton, UK
- Judith J M Jans
  - Department of Genetics, Section Metabolic Diagnostics, University Medical Center Utrecht, Utrecht, The Netherlands
- Ruty Mehrian-Shai
  - Pediatric Brain Cancer Molecular Lab, Sheba Medical Center, Ramat Gan, Israel
- Patricia Durao
  - The Cure and Action for Tay-Sachs (CATS) Foundation, Altringham, UK
- Bekim Sadikovic
  - Verspeeten Clinical Genome Centre, London Health Sciences, London, Canada
  - Department of Pathology and Laboratory Medicine, Western University, London, Canada
4
Swietlik EM, Fay M, Morrell NW. Exploring Diagnostic and Therapeutic Odyssey in Pulmonary Arterial Hypertension: Insights from In-Depth Semi-Structured Interviews. Respiration 2024:1-14. [PMID: 39250896 DOI: 10.1159/000540556] [Received: 02/18/2024] [Accepted: 07/20/2024] [Indexed: 09/11/2024] Open
Abstract
INTRODUCTION Establishing a diagnosis is paramount in medical practice as it shapes patients' experiences and guides treatment. Patients grappling with rare diseases face a triple challenge: prolonged diagnostic journeys, limited responses to existing therapies, and the absence of effective monitoring tools. Genetic diagnosis often provides crucial diagnostic and prognostic information, opening up possibilities for genotype-targeted treatments and facilitating counselling and relative testing. The NIHR BioResource - Rare Diseases (NBR) Study and the Cohort Study in Idiopathic and Hereditary Pulmonary Arterial Hypertension (PAH Cohort study) aimed to enhance diagnosis and treatment for PAH, successfully identifying the genetic cause in 25% of idiopathic cases. However, the diagnostic and therapeutic odyssey in patients with PAH remains largely unexplored. METHODS Stakeholders from the NBR and PAH Cohort studies were recruited using purposive sampling. In-depth interviews and focus groups were recorded, transcribed, anonymised, and analysed thematically using MAXQDA software. RESULTS The study involved 53 interviews and focus groups with 63 participants, revealing key themes across five stages of the diagnostic odyssey: initial health concerns and interactions with general practitioners, experiences of misdiagnosis, relief upon receiving the correct diagnosis, and mixed emotions regarding genetic results and the challenges of living with the disease. Following the diagnosis, participants embarked on a therapeutic journey, facing various challenges, including the disease's impact on professional and social lives, the learning curve associated with understanding the disease, shifts in communication dynamics with healthcare providers, therapeutic hurdles, and insurance-related issues. 
Building on these insights, we identified areas of unmet needs, such as improved collaboration with primary care providers and local hospitals, the provision of psychological support and counselling, and the necessity for ongoing patient education in the ever-evolving realms of research and therapy. CONCLUSIONS The study highlights the significant challenges encountered throughout the diagnostic and therapeutic journey in PAH. To enhance patient outcomes, it is crucial to raise awareness of the disease, establish clear diagnostic pathways, and seamlessly integrate genetic diagnostics into clinical practice. Streamlining the diagnostic process can be achieved by utilising existing clinical infrastructure to support research and fostering better communication within the NHS. Moreover, there is an urgent need for more effective therapies alongside less burdensome drug delivery methods.
Affiliation(s)
- Emilia M Swietlik
  - Department of Medicine, The Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
  - Department of Pulmonology, Collegium Medicum, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
  - Respiratory Medicine Department, Addenbrooke's Hospital, Cambridge, UK
- Nicholas W Morrell
  - Department of Medicine, The Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
5
Flaharty KA, Hu P, Hanchard SL, Ripper ME, Duong D, Waikel RL, Solomon BD. Evaluating large language models on medical, lay-language, and self-reported descriptions of genetic conditions. Am J Hum Genet 2024; 111:1819-1833. [PMID: 39146935 PMCID: PMC11393706 DOI: 10.1016/j.ajhg.2024.07.011] [Received: 04/03/2024] [Revised: 07/15/2024] [Accepted: 07/16/2024] [Indexed: 08/17/2024] Open
Abstract
Large language models (LLMs) are generating interest in medical settings. For example, LLMs can respond coherently to medical queries by providing plausible differential diagnoses based on clinical notes. However, there are many questions to explore, such as evaluating differences between open- and closed-source LLMs as well as LLM performance on queries from both medical and non-medical users. In this study, we assessed multiple LLMs, including Llama-2-chat, Vicuna, Medllama2, Bard/Gemini, Claude, ChatGPT3.5, and ChatGPT-4, as well as non-LLM approaches (Google search and Phenomizer) regarding their ability to identify genetic conditions from textbook-like clinician questions and their corresponding layperson translations related to 63 genetic conditions. For open-source LLMs, larger models were more accurate than smaller LLMs: 7b, 13b, and larger than 33b parameter models obtained accuracy ranges from 21%-49%, 41%-51%, and 54%-68%, respectively. Closed-source LLMs outperformed open-source LLMs, with ChatGPT-4 performing best (89%-90%). Three of 11 LLMs and Google search had significant performance gaps between clinician and layperson prompts. We also evaluated how in-context prompting and keyword removal affected open-source LLM performance. Models were provided with 2 types of in-context prompts: list-type prompts, which improved LLM performance, and definition-type prompts, which did not. We further analyzed removal of rare terms from descriptions, which decreased accuracy for 5 of 7 evaluated LLMs. Finally, we observed much lower performance with real individuals' descriptions; LLMs answered these questions with a maximum 21% accuracy.
Affiliation(s)
- Kendall A Flaharty
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Ping Hu
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Suzanna Ledgister Hanchard
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Molly E Ripper
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Dat Duong
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Rebekah L Waikel
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Benjamin D Solomon
  - Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
6
Wang A, Liu C, Yang J, Weng C. Fine-tuning large language models for rare disease concept normalization. J Am Med Inform Assoc 2024; 31:2076-2083. [PMID: 38829731 PMCID: PMC11339522 DOI: 10.1093/jamia/ocae133] [Received: 12/20/2023] [Revised: 05/20/2024] [Accepted: 05/22/2024] [Indexed: 06/05/2024] Open
Abstract
OBJECTIVE We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). METHODS We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. RESULTS When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ∼20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN is 10.2% and 36.1%, respectively, but increases to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. CONCLUSION Our fine-tuned models demonstrate ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
Affiliation(s)
- Andy Wang
  - Peddie School, Hightstown, NJ 08520, United States
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Jingye Yang
  - Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
7
Tammen I, Mather M, Leeb T, Nicholas FW. Online Mendelian Inheritance in Animals (OMIA): a genetic resource for vertebrate animals. Mamm Genome 2024:10.1007/s00335-024-10059-y. [PMID: 39143381 DOI: 10.1007/s00335-024-10059-y] [Received: 06/23/2024] [Accepted: 08/01/2024] [Indexed: 08/16/2024]
Abstract
Online Mendelian Inheritance in Animals (OMIA) is a freely available curated knowledgebase that contains information and facilitates research on inherited traits and diseases in animals. For the past 29 years, OMIA has been used by animal geneticists, breeders, and veterinarians worldwide as a definitive source of information. Recent increases in curation capacity and funding for software engineering support have resulted in software upgrades and commencement of several initiatives, which include the enhancement of variant information and links to human data resources, and the introduction of ontology-based breed information and categories. We provide an overview of current information and recent enhancements to OMIA and discuss how we are expanding the integration of OMIA into other resources and databases via the use of ontologies and the adaptation of tools used in human genetics.
Affiliation(s)
- Imke Tammen
  - Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2006, Australia
- Marius Mather
  - Sydney Informatics Hub, The University of Sydney, Sydney, NSW, 2006, Australia
- Tosso Leeb
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, 3001, Switzerland
- Frank W Nicholas
  - Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2006, Australia
8
Greene D, Thys C, Berry IR, Jarvis J, Ortibus E, Mumford AD, Freson K, Turro E. Mutations in the U4 snRNA gene RNU4-2 cause one of the most prevalent monogenic neurodevelopmental disorders. Nat Med 2024; 30:2165-2169. [PMID: 38821540 PMCID: PMC11333284 DOI: 10.1038/s41591-024-03085-5] [Received: 04/11/2024] [Accepted: 05/23/2024] [Indexed: 06/02/2024]
Abstract
Most people with intellectual disability (ID) do not receive a molecular diagnosis following genetic testing. To identify new etiologies of ID, we performed a genetic association analysis comparing the burden of rare variants in 41,132 noncoding genes between 5,529 unrelated cases and 46,401 unrelated controls. RNU4-2, which encodes U4 small nuclear RNA, a critical component of the spliceosome, was the most strongly associated gene. We implicated de novo variants among 47 cases in two regions of RNU4-2 in the etiology of a syndrome characterized by ID, microcephaly, short stature, hypotonia, seizures and motor delay. We replicated this finding in three collections, bringing the number of unrelated cases to 73. Analysis of national genomic diagnostic data showed RNU4-2 to be a more common etiological gene for neurodevelopmental abnormality than any previously reported autosomal gene. Our findings add to the growing evidence of spliceosome dysfunction in the etiologies of neurological disorders.
Affiliation(s)
- Daniel Greene
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chantal Thys
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Ian R Berry
  - NHS South West Genomic Laboratory Hub, Southmead Hospital, Bristol, UK
  - NHS South West Genomic Medicine Service Alliance, Bristol, UK
- Joanna Jarvis
  - Clinical Genetics Unit, Birmingham Women's Hospital, Birmingham, UK
- Els Ortibus
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Paediatric Neurology Department, University Hospitals of KU Leuven, Leuven, Belgium
- Andrew D Mumford
  - NHS South West Genomic Medicine Service Alliance, Bristol, UK
  - Bristol Medical School, University of Bristol, Bristol, UK
- Kathleen Freson
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Ernest Turro
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
9
Gresky J, Frotscher M, Dorn J, Scheelen-Nováček K, Ahlbrecht Y, Jakob T, Schönbuchner T, Canalejo J, Ducke B, Petiti E. The Digital Atlas of Ancient Rare Diseases (DAARD) and its relevance for current research. Orphanet J Rare Dis 2024; 19:277. [PMID: 39044201 PMCID: PMC11267669 DOI: 10.1186/s13023-024-03280-0] [Received: 02/11/2024] [Accepted: 07/03/2024] [Indexed: 07/25/2024] Open
Abstract
BACKGROUND The history of rare diseases is largely unknown. Research on this topic has focused on individual cases of prominent (historical) individuals and artistic (e.g., iconographic) representations. Medical collections include large numbers of specimens that exhibit signs of rare diseases, but most of them date to relatively recent periods. However, cases of rare diseases detected in mummies and skeletal remains derived from archaeological excavations have also been recorded. Nevertheless, this direct evidence from historical and archaeological contexts is mainly absent from academic discourse and generally not consulted in medical research on rare diseases. RESULTS This desideratum is addressed by the Digital Atlas of Ancient Rare Diseases (DAARD: https://daard.dainst.org ), which is an open access/open data database and web-based mapping tool that collects evidence of different rare diseases found in skeletons and mummies globally and throughout all historic and prehistoric time periods. This easily searchable database allows queries by diagnosis, the preservation level of human remains, research methodology, place of curation and publications. In this manuscript, the design and functionality of the DAARD are illustrated using examples of achondroplasia and other types of stunted growth. CONCLUSIONS As an open, collaborative repository for collecting, mapping and querying well-structured medical data on individuals from ancient times, the DAARD opens new avenues of research. Over time, the number of rare diseases will increase through the addition of new cases from varied backgrounds such as museum collections and archaeological excavations. Depending on the research question, phenotypic or genetic information can be retrieved, as well as information on the general occurrence of a rare disease in selected space-time intervals. 
Furthermore, for individuals diagnosed with a rare disease, this approach can help them to build identity and reveal an aspect of their condition they might not have been aware of. Thus, the DAARD contributes to the understanding of rare diseases from a long-term perspective and adds to the latest medical research.
Affiliation(s)
- Julia Gresky
  - Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Melina Frotscher
  - Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Juliane Dorn
  - Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Yannick Ahlbrecht
  - Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Tina Jakob
  - Department of Archaeology, Durham University, Durham, UK
- Benjamin Ducke
  - Central Research Services/IT, German Archaeological Institute, Berlin, Germany
- Emmanuele Petiti
  - Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
10
Hussain SI, Muhammad N, Shah SA, Rehman AU, Khan SA, Saleha S, Khan YM, Muhammad N, Khan S, Wasif N. Variants in HCFC1 and MN1 genes causing intellectual disability in two Pakistani families. BMC Med Genomics 2024; 17:176. [PMID: 38956580 PMCID: PMC11221130 DOI: 10.1186/s12920-024-01943-2] [Received: 03/25/2024] [Accepted: 06/21/2024] [Indexed: 07/04/2024] Open
Abstract
BACKGROUND Intellectual disability (ID) is a neurodevelopmental condition affecting around 2% of children and young adults worldwide, characterized by deficits in intellectual functioning and adaptive behavior. Genetic factors contribute to the development of ID phenotypes, including mutations and structural changes in chromosomes. Pathogenic variants in the HCFC1 gene cause X-linked mental retardation syndrome, also known as Siderius type X-linked mental retardation. The MN1 gene is necessary for palate development, and mutations in this gene result in a genetic condition called CEBALID syndrome. METHODS Exome sequencing was used to identify the disease-causing variants in two affected families, A and B, from various regions of Pakistan. Affected individuals in these two families presented ID, developmental delay, and behavioral abnormalities. The validation and co-segregation analysis of the filtered variant was carried out using Sanger sequencing. RESULTS In an X-linked family A, a novel hemizygous missense variant (c.5705G > A; p.Ser1902Asn) in the HCFC1 gene (NM_005334.3) was identified, while in family B exome sequencing revealed a heterozygous nonsense variant (c.3680 G > A; p. Trp1227Ter) in exon-1 of the MN1 gene (NM_032581.4). Sanger sequencing confirmed the segregation of these variants with ID in each family. CONCLUSIONS The investigation of two Pakistani families revealed pathogenic genetic variants in the HCFC1 and MN1 genes, which cause ID and expand the mutational spectrum of these genes.
Affiliation(s)
- Syeda Iqra Hussain
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Nazif Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Shahbaz Ali Shah
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Adil U Rehman
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Department of Computer Science and Bioinformatics, Khushal Khan Khatak University, Karak, Pakistan
| | - Shamim Saleha
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Yar Muhammad Khan
- Department of Biotechnology, University of Science and Technology, Bannu, Pakistan
| | - Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan.
| | - Naveed Wasif
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany.
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.
|
11
|
Brankovic M, Ivanovic V, Basta I, Khang R, Lee E, Stevic Z, Ralic B, Tubic R, Seo G, Markovic V, Bozovic I, Svetel M, Marjanovic A, Veselinovic N, Mesaros S, Jankovic M, Savic-Pavicevic D, Jovin Z, Novakovic I, Lee H, Peric S. Whole exome sequencing in Serbian patients with hereditary spastic paraplegia. Neurogenetics 2024; 25:165-177. [PMID: 38499745 DOI: 10.1007/s10048-024-00755-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/08/2024] [Indexed: 03/20/2024]
Abstract
Hereditary spastic paraplegia (HSP) is a group of neurodegenerative diseases with high genetic and clinical heterogeneity. Numerous HSP patients remain genetically undiagnosed despite screening for known genetic causes of HSP. Therefore, identification of novel variants and genes is needed. Our previous study analyzed 74 adult Serbian HSP patients from 65 families using a panel of the 13 most common HSP genes in combination with a copy number variation analysis. Conclusive genetic findings were established in 23 patients from 19 families (29%). In the present study, nine patients from nine families previously negative on the HSP gene panel were selected for whole exome sequencing (WES). Further, 44 newly diagnosed adult HSP patients from 44 families were sent to WES directly, since many studies showed WES may be used as the first step in HSP diagnosis. WES analysis of cohort 1 revealed a likely genetic cause in five (56%) of nine HSP families, including variants in the ETHE1, ZFYVE26, RNF170, CAPN1, and WASHC5 genes. In cohort 2, possible causative variants were found in seven (16%) of 44 patients (later updated to 27% when other diagnoses were excluded), comprising six different genes: SPAST, SPG11, WASHC5, KIF1A, KIF5A, and ABCD1. These results expand the genetic spectrum of HSP patients in Serbia and the region, with implications for molecular genetic diagnosis and future causative therapies. A wide HSP panel, together with copy number variation (CNV) analysis, can be the first step in diagnosis, with WES performed afterwards.
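The yields quoted in this abstract can be checked with simple arithmetic. The sketch below is only an illustration: the helper name `diagnostic_yield` is hypothetical, and the assumption that the 29% figure is counted per family (19 of 65) rather than per patient is inferred from the numbers, not stated by the authors.

```python
def diagnostic_yield(solved: int, total: int) -> int:
    """Percentage of solved cases, rounded to the nearest whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * solved / total)


# Counts taken from the abstract; the family-level reading of 29% is an assumption.
panel_families = diagnostic_yield(19, 65)   # earlier gene-panel study
cohort_1 = diagnostic_yield(5, 9)           # panel-negative families sent to WES
cohort_2 = diagnostic_yield(7, 44)          # WES-first patients

print(panel_families, cohort_1, cohort_2)
```

The later 27% figure for cohort 2 cannot be reproduced this way, because the abstract does not give the number of patients remaining after other diagnoses were excluded.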
Affiliation(s)
- Marija Brankovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia.
| | - Vukan Ivanovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Ivana Basta
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | | | | | - Zorica Stevic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | | | - Radoje Tubic
- Institute for Oncology and Radiology of Serbia, Belgrade, Serbia
| | | | - Vladana Markovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Ivo Bozovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Marina Svetel
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Ana Marjanovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Nikola Veselinovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Sarlota Mesaros
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Milena Jankovic
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Dusanka Savic-Pavicevic
- Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Belgrade, Serbia
| | - Zita Jovin
- Neurology Clinic, University Clinical Center of Vojvodina, Novi Sad, Serbia
| | - Ivana Novakovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
| | - Hane Lee
- 3Billion, Inc., Seoul, South Korea
| | - Stojan Peric
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
|
12
|
Wang A, Liu C, Yang J, Weng C. Fine-tuning Large Language Models for Rare Disease Concept Normalization. bioRxiv: the preprint server for biology 2024:2023.12.28.573586. [PMID: 38234802 PMCID: PMC10793431 DOI: 10.1101/2023.12.28.573586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Objective We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of each concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) on each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. Results When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN was 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for using LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
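The template-based corpus generation described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the templates, the function name `build_corpus`, the record layout, and the example synonyms are all hypothetical, since the abstract does not give them. Only the NAME versus NAME+SYN distinction (names only, or names plus half of each concept's synonyms) comes from the text.

```python
import random

# Hypothetical sentence templates; the in-house templates are not given in the abstract.
TEMPLATES = [
    "The Human Phenotype Ontology identifier of {term} is {hpo_id}.",
    "{term} is normalized to {hpo_id}.",
]


def build_corpus(concepts, include_synonyms=False, seed=0):
    """Generate fine-tuning sentences that pair each term with its HPO identifier.

    include_synonyms=False mimics the NAME corpus (standard names only);
    include_synonyms=True mimics NAME+SYN (names plus half of each concept's synonyms).
    """
    rng = random.Random(seed)
    sentences = []
    for concept in concepts:
        terms = [concept["name"]]
        if include_synonyms:
            synonyms = list(concept.get("synonyms", []))
            rng.shuffle(synonyms)
            # Keep half of the synonyms; the other half stays unseen for evaluation.
            terms += synonyms[: len(synonyms) // 2]
        for term in terms:
            for template in TEMPLATES:
                sentences.append(template.format(term=term, hpo_id=concept["id"]))
    return sentences


# Illustrative record; the synonym list here is an assumption for demonstration.
concepts = [
    {"id": "HP:0001250", "name": "Seizure", "synonyms": ["Epileptic seizure", "Seizures"]},
]
name_corpus = build_corpus(concepts)
name_syn_corpus = build_corpus(concepts, include_synonyms=True)
```

Holding out half of the synonyms is what makes the "unseen synonym" evaluation reported in the Results possible: the held-out terms never appear in either corpus.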
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
13
|
Hanukoglu A, Banne E, Lev D, Wainstein J. Autosomal Dominant, Long-Standing Dysglycemia in 2 Families with Unique Phenotypic Features. Clin Med Insights Endocrinol Diabetes 2024; 17:11795514241259740. [PMID: 38854748 PMCID: PMC11159530 DOI: 10.1177/11795514241259740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 05/10/2024] [Indexed: 06/11/2024] Open
Abstract
We describe 2 families with 5 members from 2 generations whose clinical and laboratory characteristics over up to 15 years were consistent with dysglycemia/impaired glucose tolerance. In both families (2 probands and 3 family members), long-term follow-up excluded diabetes type 1 and type 2. Diabetes type 1 antibodies were persistently negative and C-peptide levels were normal. In Family 1, the proband, during a follow-up of 7 years (10.3-17.5 years of age), exhibited persistently high HbA1c (>5.7%) with fasting blood glucose levels mostly higher than 100 mg/dl and postprandial glucose levels up to 180 mg/dl. She eventually required oral anti-diabetics with an improvement in glycemic balance. The father and sister also had persistent mild hyperglycemia with borderline high HbA1c (mostly > 5.7%) levels over 15 and 6.2 years respectively. In Family 2, the proband exhibited borderline high fasting hyperglycemia (>100 mg/dl) at age 16.2 years with increasing HbA1c levels (from 5.6%-5.9%) and impaired glucose tolerance at age 18.3 years (2 h blood glucose 156 mg/dl after 75 g glucose). His sister also exhibited borderline hyperglycemia with borderline high HbA1c over 2 years (13.6-15.4 years). These subjects shared a unique phenotype. They are tall and slim with decreased BMI. Three subjects from Generation II failed to thrive during infancy. In view of the data from 2 generations suggesting maturity-onset diabetes of the young (MODY) with autosomal dominant inheritance, we sought to analyze the MODY genes. In Family 1, the molecular analysis by the MODY panel including 11 genes and whole exome sequencing did not detect any mutation in the proband. In Family 2, the MODY panel was also negative in the proband's sister. These families may represent a hitherto unidentified syndrome. Unique features described in this report may help to reveal additional families with similar characteristics and to decipher the molecular basis of this syndrome. 
In selected cases, oral antidiabetics in adolescents may improve the glycemic balance.
Affiliation(s)
- Aaron Hanukoglu
- Division of Pediatric Endocrinology, Holon, Israel
- E. Wolfson Medical Center, Holon, Israel
- Maccabi Healthcare Services, Holon, Israel
- Tel-Aviv University, Sackler School of Medicine, Tel Aviv, Israel
| | - Ehud Banne
- E. Wolfson Medical Center, Holon, Israel
- Rina Mor Institute of Medical Genetics, Holon, Israel
| | - Dorit Lev
- E. Wolfson Medical Center, Holon, Israel
- Maccabi Healthcare Services, Holon, Israel
- Tel-Aviv University, Sackler School of Medicine, Tel Aviv, Israel
- Rina Mor Institute of Medical Genetics, Holon, Israel
| | - Julio Wainstein
- E. Wolfson Medical Center, Holon, Israel
- Tel-Aviv University, Sackler School of Medicine, Tel Aviv, Israel
- Diabetes Unit, Holon, Israel
|
14
|
Prawitt D, Eggermann T. Molecular mechanisms of human overgrowth and use of omics in its diagnostics: chances and challenges. Front Genet 2024; 15:1382371. [PMID: 38894719 PMCID: PMC11183334 DOI: 10.3389/fgene.2024.1382371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 05/14/2024] [Indexed: 06/21/2024] Open
Abstract
Overgrowth disorders comprise a group of entities with a variable phenotypic spectrum ranging from tall stature to isolated or lateralized overgrowth of body parts and/or organs. Depending on the underlying physiological pathway affected by pathogenic genetic alterations, overgrowth syndromes are associated with a broad spectrum of neoplasia predisposition, (cardio)vascular and neurodevelopmental anomalies, and dysmorphisms. Pathologic overgrowth may be of prenatal or postnatal onset. It results from an increased number of cells (intrinsic cellular hyperplasia), hypertrophy of the normal number of cells, an increase in interstitial spaces, or a combination of these. The underlying molecular causes comprise a growing number of genetic alterations affecting skeletal growth and growth-relevant signaling cascades as major effectors, and they can affect the whole body or parts of it (mosaicism). Furthermore, epigenetic modifications play a critical role in the manifestation of some overgrowth diseases. The diagnosis of overgrowth syndromes as the prerequisite of a personalized clinical management can be challenging, due to their clinical and molecular heterogeneity. Physicians should consider molecular genetic testing as a first diagnostic step in overgrowth syndromes. In particular, the urgent need for a precise diagnosis in tumor predisposition syndromes has to be taken into account as the basis for early monitoring and therapy. With the (future) implementation of next-generation sequencing approaches and further omic technologies, clinical diagnoses can not only be verified, but they also confirm the clinical and molecular spectrum of overgrowth disorders, including unexpected findings and identification of atypical cases.
However, the limitations of the applied assays have to be considered: for each of the disorders of interest, the spectrum of possible types of genomic variants has to be taken into account, as they might require different methodological strategies. Additionally, the integration of artificial intelligence (AI) in diagnostic workflows contributes significantly to the phenotype-driven selection and interpretation of molecular and physiological data.
Affiliation(s)
- Dirk Prawitt
- Center for Pediatrics and Adolescent Medicine, University Medical Center, Mainz, Germany
| | - Thomas Eggermann
- Institute for Human Genetics and Genome Medicine, Medical Faculty, RWTH Aachen, Aachen, Germany
|
15
|
Faviez C, Chen X, Garcelon N, Zaidan M, Billot K, Petzold F, Faour H, Douillet M, Rozet JM, Cormier-Daire V, Attié-Bitach T, Lyonnet S, Saunier S, Burgun A. Objectivizing issues in the diagnosis of complex rare diseases: lessons learned from testing existing diagnosis support systems on ciliopathies. BMC Med Inform Decis Mak 2024; 24:134. [PMID: 38789985 PMCID: PMC11127295 DOI: 10.1186/s12911-024-02538-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 05/17/2024] [Indexed: 05/26/2024] Open
Abstract
BACKGROUND There are approximately 8,000 different rare diseases that affect roughly 400 million people worldwide. Many of them suffer from delayed diagnosis. Ciliopathies are rare monogenic disorders characterized by a significant phenotypic and genetic heterogeneity that raises an important challenge for clinical diagnosis. Diagnosis support systems (DSS) applied to electronic health record (EHR) data may help identify undiagnosed patients, which is of paramount importance to improve patients' care. Our objective was to evaluate three online-accessible rare disease DSSs using phenotypes derived from EHRs for the diagnosis of ciliopathies. METHODS Two datasets of ciliopathy cases, either proven or suspected, and two datasets of controls were used to evaluate the DSSs. Patient phenotypes were automatically extracted from their EHRs and converted to Human Phenotype Ontology terms. We tested the ability of the DSSs to diagnose cases in contrast to controls based on Orphanet ontology. RESULTS A total of 79 cases and 38 controls were selected. Performances of the DSSs on ciliopathy real world data (best DSS with area under the ROC curve = 0.72) were not as good as published performances on the test set used in the DSS development phase. None of these systems obtained results which could be described as "expert-level". Patients with multisystemic symptoms were generally easier to diagnose than patients with isolated symptoms. Diseases easily confused with ciliopathy generally affected multiple organs and had overlapping phenotypes. Four challenges need to be considered to improve the performances: to make the DSSs interoperable with EHR systems, to validate the performances in real-life settings, to deal with data quality, and to leverage methods and resources for rare and complex diseases. 
CONCLUSION Our study provides insights into the complexities of diagnosing highly heterogeneous rare diseases and offers lessons derived from evaluating existing DSSs in real-world settings. These insights are not only beneficial for ciliopathy diagnosis but also relevant to the enhancement of DSSs for various complex rare disorders, by guiding the development of more clinically relevant rare disease DSSs that could support early diagnosis and ultimately make more patients eligible for treatment.
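The area under the ROC curve reported above (0.72 for the best DSS) measures how often a system scores a true case higher than a control. A minimal sketch of that computation, using the pairwise (Mann-Whitney) formulation, is given here. The function name `roc_auc` and the scores are invented for illustration and are not data from the study, and the abstract does not say which software the authors used.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win, which matches the trapezoidal ROC area.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up DSS scores for three cases and two controls (illustration only).
example_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to chance-level separation of cases and controls, and 1.0 to perfect separation, which gives some context for the 0.72 reported on real-world ciliopathy data.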
Affiliation(s)
- Carole Faviez
- Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France.
- HeKA, Inria Paris, Paris, F-75012, France.
- Universite Paris Cite, Paris, France.
| | - Xiaoyi Chen
- Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
- HeKA, Inria Paris, Paris, F-75012, France
- Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
| | - Nicolas Garcelon
- Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
- HeKA, Inria Paris, Paris, F-75012, France
- Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
| | - Mohamad Zaidan
- Service de Néphrologie, Dialyse et Transplantation, Hôpital Universitaire Bicêtre, Assistance Publique-Hôpitaux de Paris (AP-HP), Kremlin Bicêtre, F-94270, France
| | - Katy Billot
- Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
| | - Friederike Petzold
- Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Division of Nephrology, University of Leipzig Medical Center, Leipzig, Germany
| | - Hassan Faour
- Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
| | - Maxime Douillet
- Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
| | - Jean-Michel Rozet
- Laboratory of Genetics in Ophthalmology, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
| | - Valérie Cormier-Daire
- Reference Centre for Constitutional Bone Diseases, laboratory of Osteochondrodysplasia, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
| | - Tania Attié-Bitach
- Service d'Histologie-Embryologie-Cytogénétique, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
| | - Stanislas Lyonnet
- Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Laboratory of Embryology and Genetics of Congenital Malformations, INSERM UMR 1163, Imagine Institute, Paris Cité, Paris, F-75015, France
| | - Sophie Saunier
- Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
| | - Anita Burgun
- Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
- HeKA, Inria Paris, Paris, F-75012, France
- Department of Medical Informatics, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
|
16
|
Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024; 40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open
Abstract
MOTIVATION Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, this approach leaves many patients undiagnosed. A common argument is that more disease variants still await discovery, or the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods are based on known gene-disease or gene-phenotype associations as training data and are applicable to genes that have phenotypes associated, thereby limiting their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability. RESULTS We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information. 
AVAILABILITY AND IMPLEMENTATION EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
Affiliation(s)
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
| | - Fernando Zhapa-Camacho
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
|
17
|
Sippelli F, Briuglia S, Ferraloro C, Capra AP, Agolini E, Abbate T, Pepe G, Aversa T, Wasniewska M, Corica D. Identification of a novel GNAS mutation in a family with pseudohypoparathyroidism type 1A. BMC Pediatr 2024; 24:271. [PMID: 38664677 PMCID: PMC11044326 DOI: 10.1186/s12887-024-04761-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 04/12/2024] [Indexed: 04/29/2024] Open
Abstract
BACKGROUND Pseudohypoparathyroidism (PHP) is caused by loss-of-function mutations in the GNAS gene (as in PHP type 1A; PHP1A), de novo or inherited in the heterozygous state, or by epigenetic alterations at the GNAS locus (as in PHP1B). PHP refers to a heterogeneous group of disorders that share common clinical and biological features of PTH resistance. Manifestations related to resistance to other hormones are also reported in many patients with PHP, in association with the phenotypic picture of Albright hereditary osteodystrophy characterized by short stature, round facies, subcutaneous ossifications, brachydactyly, mental retardation and, in some subtypes, obesity. The purpose of our study is to report a new mutation in the GNAS gene and to describe the significant phenotypic variability of three sisters with PHP1A bearing the same mutation. CASE PRESENTATION We describe the cases of three sisters with PHP1A bearing the same mutation but characterized by a significantly different phenotypic picture at onset and during follow-up in terms of clinical features, auxological pattern and biochemical changes. Clinical exome sequencing revealed a previously undescribed heterozygous mutation in the GNAS gene (NM_000516.5 c.118_139+51del) of autosomal dominant maternal transmission in the three siblings, confirming the diagnosis of PHP1A. CONCLUSIONS This study reported a novel mutation of the GNAS gene and highlighted the clinical heterogeneity of PHP1A, characterized by wide genotype-phenotype variability. The appropriate diagnosis has crucial implications for patient care and long-term multidisciplinary follow-up.
Affiliation(s)
- Fabio Sippelli
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
| | - Silvana Briuglia
- Department of Biomedical and Dental Sciences and Morphofunctional Imaging, University of Messina, Messina, Italy
| | - Chiara Ferraloro
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
| | - Anna Paola Capra
- Department of Chemical, Biological, Pharmaceutical, and Environmental Sciences, University of Messina, Messina, Italy
| | - Emanuele Agolini
- Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Tiziana Abbate
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
| | - Giorgia Pepe
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
| | - Tommaso Aversa
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
| | - Malgorzata Wasniewska
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
| | - Domenico Corica
- Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy.
|
18
|
Margiotti K, Fabiani M, Cima A, Libotte F, Mesoraca A, Giorlandino C. Prenatal Diagnosis by Trio Clinical Exome Sequencing: Single Center Experience. Curr Issues Mol Biol 2024; 46:3209-3217. [PMID: 38666931 PMCID: PMC11048976 DOI: 10.3390/cimb46040201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 03/28/2024] [Accepted: 04/03/2024] [Indexed: 04/28/2024] Open
Abstract
Fetal anomalies, characterized by structural or functional abnormalities occurring during intrauterine life, pose a significant medical challenge, affecting approximately 2-3% of live births and 20% of spontaneous miscarriages. This study aims to identify the genetic cause of ultrasound anomalies through clinical exome sequencing (CES) analysis. The focus is on utilizing CES analysis in a trio setting, involving the fetus and both parents. To achieve this objective, prenatal trio clinical exome sequencing was conducted in 51 fetuses exhibiting ultrasound anomalies with previously negative results from chromosomal microarray (CMA) analysis. The study revealed pathogenic variants in 24% of the analyzed cases (12 out of 51). The findings include de novo variants in 50% of cases and the transmission of causative variants from asymptomatic parents in 50% of cases. Trio clinical exome sequencing stands out as a crucial tool in advancing prenatal diagnostics, surpassing the effectiveness of relying solely on chromosomal microarray analysis. This underscores its potential to become a routine diagnostic standard in prenatal care, particularly for cases involving ultrasound anomalies.
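The distinction between de novo and inherited variants in a trio can be illustrated with a deliberately simplified sketch. The function name `classify_trio` and the encoding of genotypes as alternate-allele counts (0, 1, or 2) are assumptions made for illustration. The abstract does not describe the authors' pipeline, and real trio analysis also has to account for read depth, genotype quality, mosaicism, and sex chromosomes.

```python
def classify_trio(child: int, mother: int, father: int) -> str:
    """Classify a variant from alternate-allele counts (0, 1, or 2) in a trio.

    Simplified rule: a variant carried by the child and absent from both
    parents is called de novo; otherwise it is treated as inherited.
    """
    for count in (child, mother, father):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1, or 2")
    if child == 0:
        return "absent"
    if mother == 0 and father == 0:
        return "de novo"
    return "inherited"


print(classify_trio(1, 0, 0))  # child heterozygous, parents reference
print(classify_trio(1, 1, 0))  # child and mother heterozygous
```

Sequencing both parents alongside the fetus is what allows this classification at all; with fetus-only data the two categories reported in the abstract could not be separated.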
Affiliation(s)
- Katia Margiotti
- Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy; (M.F.); (A.C.); (F.L.); (A.M.); (C.G.)
| | - Marco Fabiani
- Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy; (M.F.); (A.C.); (F.L.); (A.M.); (C.G.)
| | - Antonella Cima
- Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy; (M.F.); (A.C.); (F.L.); (A.M.); (C.G.)
| | - Francesco Libotte
- Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy; (M.F.); (A.C.); (F.L.); (A.M.); (C.G.)
| | - Alvaro Mesoraca
- Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy; (M.F.); (A.C.); (F.L.); (A.M.); (C.G.)
| | - Claudio Giorlandino
- Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy; (M.F.); (A.C.); (F.L.); (A.M.); (C.G.)
- Fetal-Maternal Medical Centre, Altamedica Viale Liegi 45, 00198 Rome, Italy
|
19
|
Kim HH, Kim DW, Woo J, Lee K. Explicable prioritization of genetic variants by integration of rule-based and machine learning algorithms for diagnosis of rare Mendelian disorders. Hum Genomics 2024; 18:28. [PMID: 38509596 PMCID: PMC10956189 DOI: 10.1186/s40246-024-00595-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 03/03/2024] [Indexed: 03/22/2024] Open
Abstract
BACKGROUND In the process of finding the causative variant of rare diseases, accurate assessment and prioritization of genetic variants is essential. Previous variant prioritization tools mainly depend on the in-silico prediction of the pathogenicity of variants, which results in low sensitivity and difficulty in interpreting the prioritization result. In this study, we propose an explainable algorithm for variant prioritization, named 3ASC, with higher sensitivity and ability to annotate evidence used for prioritization. 3ASC annotates each variant with the 28 criteria defined by the ACMG/AMP genome interpretation guidelines and features related to the clinical interpretation of the variants. The system can explain the result based on annotated evidence and feature contributions. RESULTS We trained various machine learning algorithms using in-house patient data. The performance of variant ranking was assessed using the recall rate of identifying causative variants in the top-ranked variants. The best practice model was a random forest classifier that showed top 1 recall of 85.6% and top 3 recall of 94.4%. The 3ASC annotates the ACMG/AMP criteria for each genetic variant of a patient so that clinical geneticists can interpret the result as in the CAGI6 SickKids challenge. In the challenge, 3ASC identified causal genes for 10 out of 14 patient cases, with evidence of decreased gene expression for 6 cases. Among them, two genes (HDAC8 and CASK) had decreased gene expression profiles confirmed by transcriptome data. CONCLUSIONS 3ASC can prioritize genetic variants with higher sensitivity compared to previous methods by integrating various features related to clinical interpretation, including features related to false positive risk such as quality control and disease inheritance pattern. The system allows interpretation of each variant based on the ACMG/AMP criteria and feature contribution assessed using explainable AI techniques.
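The top-k recall metric used to assess variant ranking can be sketched as follows. This is a generic illustration under stated assumptions: the function name `top_k_recall` and the example ranks are hypothetical and do not come from 3ASC or its evaluation data. Each entry is the 1-based rank at which a patient's causative variant appeared, or `None` if it was not ranked at all.

```python
def top_k_recall(causal_ranks, k):
    """Fraction of patients whose causative variant is ranked within the top k."""
    if not causal_ranks:
        raise ValueError("at least one patient is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for rank in causal_ranks if rank is not None and rank <= k)
    return hits / len(causal_ranks)


# Made-up ranks for five patients; None means the causative variant was missed.
ranks = [1, 2, 1, 5, None]
print(top_k_recall(ranks, 1))
print(top_k_recall(ranks, 3))
```

Read this way, the reported top 1 recall of 85.6% means the causative variant was the single highest-ranked variant for 85.6% of patients, and top 3 recall of 94.4% means it appeared among the first three.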
Affiliation(s)
- Ho Heon Kim
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
- Dong-Wook Kim
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
- Junwoo Woo
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
- Kyoungyeul Lee
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
|
20
|
Bhasin MA, Knaus A, Incardona P, Schmid A, Holtgrewe M, Elbracht M, Krawitz PM, Hsieh TC. Enhancing Variant Prioritization in VarFish through On-Premise Computational Facial Analysis. Genes (Basel) 2024; 15:370. [PMID: 38540429 PMCID: PMC10969976 DOI: 10.3390/genes15030370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/03/2024] [Accepted: 03/13/2024] [Indexed: 06/14/2024] Open
Abstract
Genomic variant prioritization is crucial for identifying disease-associated genetic variations. Integrating facial and clinical feature analyses into this process enhances performance. This study demonstrates the integration of facial analysis (GestaltMatcher) and Human Phenotype Ontology analysis (CADA) within VarFish, an open-source variant analysis framework. Challenges related to non-open-source components were addressed by providing an open-source version of GestaltMatcher, facilitating on-premise facial analysis to address data privacy concerns. Performance evaluation on 163 patients recruited from a German multi-center study of rare diseases showed PEDIA's superior accuracy in variant prioritization compared to individual scores. This study highlights the importance of further benchmarking and future integration of advanced facial analysis approaches aligned with ACMG guidelines to enhance variant classification.
Affiliation(s)
- Meghna Ahuja Bhasin
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Alexej Knaus
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Pietro Incardona
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
  - Core Unit for Bioinformatics Data Analysis, Medical Faculty, University of Bonn, 53127 Bonn, Germany
- Alexander Schmid
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Manuel Holtgrewe
  - CUBI - Core Unit Bioinformatics, Berlin Institute of Health, 10117 Berlin, Germany
- Miriam Elbracht
  - Institute for Human Genetics and Genomic Medicine, Medical Faculty, RWTH Aachen University, 52062 Aachen, Germany
- Peter M. Krawitz
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Tzung-Chien Hsieh
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
|
21
|
Carrer A, Romaniello MG, Calderara ML, Mariani M, Biondi A, Selicorni A. Application of the Face2Gene tool in an Italian dysmorphological pediatric clinic: Retrospective validation and future perspectives. Am J Med Genet A 2024; 194:e63459. [PMID: 37927205 DOI: 10.1002/ajmg.a.63459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/15/2023] [Accepted: 10/16/2023] [Indexed: 11/07/2023]
Abstract
Neurodevelopmental disorders exhibit recurrent facial features that can suggest the genetic diagnosis at a glance, but recognizing subtle dysmorphisms is a specialized skill that requires very long training. Face2Gene (FDNA Inc) is an innovative computer-aided phenotyping tool that analyses patients' portraits and suggests 30 candidate syndromes with similar morphology in a prioritized list. We hypothesized that the software could support even expert physicians in the diagnostic workup of genetic conditions. In this study, we assessed the performance of Face2Gene in an Italian dysmorphological pediatrics clinic. We uploaded two-dimensional face pictures of 145 children affected by genetic conditions with typical phenotypic traits. All diagnoses were previously confirmed by cytogenetic or molecular tests. Overall, the software's differential included the correct syndrome in most cases (98%). We also evaluated the efficiency of the algorithm in relation to the rarity of the genetic conditions. All "common" diagnoses were correctly identified, most of them with high diagnostic accuracy (93% in top-3 matches). Finally, the performance for the most common pediatric syndromes was calculated. Face2Gene performed well even for ultra-rare genetic conditions (75% within top-3 matches and 83% within top-10 matches). Expert geneticists may not need computer support to recognize common syndromes, but our results prove that the tool can be useful not only for general pediatricians but also in dysmorphological clinics for ultra-rare genetic conditions.
Affiliation(s)
- Alessia Carrer
  - Department of Health Sciences, University of Milan, Milan, Italy
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
- Maria Giovanna Romaniello
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
  - School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Maria Letizia Calderara
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
  - Department of Medicine and Surgery, University of Insubria, Varese, Italy
- Milena Mariani
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
- Andrea Biondi
  - Department of Medicine and Surgery, University of Insubria, Varese, Italy
  - Paediatrics, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
- Angelo Selicorni
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
|
22
|
Lee JY, Oh SH, Keum C, Lee BL, Chung WY. Clinical application of prospective whole-exome sequencing in the diagnosis of genetic disease: Experience of a regional disease center in South Korea. Ann Hum Genet 2024; 88:101-112. [PMID: 37795942 DOI: 10.1111/ahg.12530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 08/29/2023] [Accepted: 09/11/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION Next-generation sequencing helps clinicians diagnose patients with suspected genetic disorders. The current study aimed to investigate the diagnostic yield and clinical utility of prospective whole-exome sequencing (WES) in rare diseases. METHODS WES was performed in 92 patients who presented with clinical symptoms suggestive of genetic disorders. The WES data were analyzed using in-house developed software. The patients' phenotypic characteristics were classified according to the Human Phenotype Ontology. RESULTS WES detected 64 variants: 13 were classified as pathogenic, 26 as likely pathogenic, and 25 as variants of uncertain significance. In 57 patients with these variants, 30 were identified as causal variants. The diagnostic yield was higher in patients with abnormalities in joint mobility and skin morphology than in those with cerebellar hypoplasia/atrophy, epilepsy, global developmental delay, dysmorphic features/facial dysmorphisms, and chronic kidney disease/abnormal renal morphology. CONCLUSION In this study, a WES-based variant interpretation system was employed to provide a definitive diagnosis for 28.3% of the patients suspected of having genetic disorders. WES is particularly useful for diagnosing rare diseases with symptoms that affect more than one system, when targeted genetic panels are difficult to employ.
Affiliation(s)
- Ja Young Lee
  - Department of Laboratory Medicine, Inje University College of Medicine, Busan, South Korea
- Seung-Hwan Oh
  - Department of Laboratory Medicine, Pusan National University School of Medicine, Yangsan, South Korea
- Bo Lyun Lee
  - Department of Pediatrics, Inje University College of Medicine, Busan, South Korea
- Woo Yeong Chung
  - Department of Pediatrics, Inje University College of Medicine, Busan, South Korea
|
23
|
Latif M, Hashmi JA, Alayoubi AM, Ayub A, Basit S. Identification of Novel and Recurrent Variants in BTD, GBE1, AGL and ASL Genes in Families with Metabolic Disorders in Saudi Arabia. J Clin Med 2024; 13:1193. [PMID: 38592052 PMCID: PMC10932034 DOI: 10.3390/jcm13051193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 11/07/2023] [Accepted: 11/14/2023] [Indexed: 04/10/2024] Open
Abstract
Background and Objectives: Inherited metabolic disorders (IMDs) are a group of genetic disorders characterized by defects in enzymes or transport proteins involved in metabolic processes. These defects result in an abnormal accumulation of metabolites and thus interfere with the body's metabolism. A variety of IMDs exist and differential diagnosis is often challenging. Our objective was to gain insight into the genetic basis of IMDs and the correlations between specific genetic mutations and clinical presentations in patients admitted to various hospitals in the Madinah region of the Kingdom of Saudi Arabia. Materials and Methods: Whole exome sequencing (WES) has emerged as a powerful tool for diagnosing IMDs and allows for the identification of disease-causing genetic mutations in individuals suspected of having IMDs. This ensures accurate diagnosis and appropriate management. WES was performed in four families with multiple individuals showing clinical presentation of IMDs. Validation of the variants identified through WES was conducted using Sanger sequencing. Furthermore, various computational analyses were employed to uncover the disease gene co-expression and metabolic pathways. Results: Exome variant data analysis revealed missense variants in the BTD (c.1270G>C), ASL (c.1300G>T), GBE1 (c.985T>G) and AGL (c.113C>G) genes. Mutations in these genes are known to cause IMDs. Conclusions: Our data showed that exome sequencing, in conjunction with clinical and biochemical characteristics and pathological hallmarks, could deliver an accurate and high-throughput outcome for the diagnosis and sub-typing of IMDs. Overall, our findings emphasize that the integration of WES with clinical and pathological information has the potential to improve the diagnosis and understanding of IMDs and related disorders, ultimately benefiting patients and the medical community.
Affiliation(s)
- Muhammad Latif
  - Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
  - Center for Genetics and Inherited Diseases, Taibah University, Madinah 42353, Saudi Arabia
- Jamil Amjad Hashmi
  - Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
  - Center for Genetics and Inherited Diseases, Taibah University, Madinah 42353, Saudi Arabia
- Abdulfatah M. Alayoubi
  - Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
- Arusha Ayub
  - Department of Medicine, School of Health Sciences, University of Georgia, Tbilisi, P. O. Box-0171, Georgia
- Sulman Basit
  - Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
  - Center for Genetics and Inherited Diseases, Taibah University, Madinah 42353, Saudi Arabia
|
24
|
Yang J, Shu L, Han M, Pan J, Chen L, Yuan T, Tan L, Shu Q, Duan H, Li H. RDmaster: A novel phenotype-oriented dialogue system supporting differential diagnosis of rare disease. Comput Biol Med 2024; 169:107924. [PMID: 38181610 DOI: 10.1016/j.compbiomed.2024.107924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/18/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
BACKGROUND Clinicians often lack the necessary expertise to differentially diagnose multiple underlying rare diseases (RDs) due to their complex and overlapping clinical features, leading to misdiagnoses and delayed treatments. The aim of this study is to develop a novel electronic differential diagnostic support system for RDs. METHOD By integrating two Bayesian diagnostic methods, a candidate list was generated with enhanced clinical interpretability for the subsequent Q&A-based differential diagnosis (DDX). To achieve an efficient Q&A dialogue strategy, we introduce a novel metric named the adaptive information gain and Gini index (AIGGI) to evaluate the expected gain of interrogated phenotypes within real-time diagnostic states. RESULTS This DDX tool, called RDmaster, has been implemented as a web-based platform (http://rdmaster.nbscn.org/). A diagnostic trial involving 238 published RD patients revealed that RDmaster outperformed existing RD diagnostic tools, as well as ChatGPT, and was shown to enhance the diagnostic accuracy through its Q&A system. CONCLUSIONS RDmaster offers an effective multi-omics differential diagnostic technique and outperforms existing tools and popular large language models, particularly enhancing differential diagnosis by collecting diagnostically beneficial phenotypes.
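The abstract does not define AIGGI, so the following is only a sketch of the two textbook quantities its name refers to: the expected reduction in Shannon entropy (information gain) and in Gini impurity when a yes/no phenotype question is asked of a Bayesian disease posterior. All names, priors, and likelihoods are hypothetical and are not taken from RDmaster.

```python
import math
from typing import Callable, Dict

Dist = Dict[str, float]


def entropy(p: Dist) -> float:
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def gini(p: Dist) -> float:
    """Gini impurity of a probability distribution."""
    return 1.0 - sum(v * v for v in p.values())


def expected_reduction(prior: Dist, likelihood: Dist, impurity: Callable[[Dist], float]) -> float:
    """Expected drop in impurity after asking whether a phenotype is present.

    likelihood[d] is the assumed probability of the phenotype given disease d.
    """
    after = 0.0
    for present in (True, False):
        weights = {
            d: prior[d] * (likelihood[d] if present else 1.0 - likelihood[d])
            for d in prior
        }
        p_answer = sum(weights.values())
        if p_answer == 0:
            continue  # this answer cannot occur under the current belief
        posterior = {d: w / p_answer for d, w in weights.items()}
        after += p_answer * impurity(posterior)
    return impurity(prior) - after


# Hypothetical diagnostic state: two equally likely candidate diseases.
prior = {"disease_a": 0.5, "disease_b": 0.5}
questions = {
    "phenotype_x": {"disease_a": 1.0, "disease_b": 0.0},  # fully discriminating
    "phenotype_y": {"disease_a": 0.5, "disease_b": 0.5},  # uninformative
}

gains = {q: expected_reduction(prior, lik, entropy) for q, lik in questions.items()}
gini_drops = {q: expected_reduction(prior, lik, gini) for q, lik in questions.items()}
best_question = max(gains, key=gains.get)
print(best_question, gains, gini_drops)
```

A dialogue system built on this idea would ask the highest-scoring question next, update the posterior with the answer, and repeat; how RDmaster adapts or combines the two measures is not stated in the abstract.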
Affiliation(s)
- Jian Yang
  - Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China; The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Liqi Shu
  - Rhode Island Hospital, Warren Alpert Medical School of Brown University, Rhode Island, USA
- Mingyu Han
  - Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Jiarong Pan
  - Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Lihua Chen
  - Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Tianming Yuan
  - Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Linhua Tan
  - Surgical Intensive Care Unit, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Qiang Shu
  - Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Huilong Duan
  - The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Haomin Li
  - Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
|
25
|
Lagorce D, Lebreton E, Matalonga L, Hongnat O, Chahdil M, Piscia D, Paramonov I, Ellwanger K, Köhler S, Robinson P, Graessner H, Beltran S, Lucano C, Hanauer M, Rath A. Phenotypic similarity-based approach for variant prioritization for unsolved rare disease: a preliminary methodological report. Eur J Hum Genet 2024; 32:182-189. [PMID: 37926714 PMCID: PMC10853199 DOI: 10.1038/s41431-023-01486-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2023] [Revised: 09/13/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023] Open
Abstract
Rare diseases (RD) have a prevalence of not more than 1/2000 persons in the European population, and are characterised by the difficulty experienced in obtaining a correct and timely diagnosis. According to Orphanet, 72.5% of RD have a genetic origin although 35% of them do not yet have an identified causative gene. A significant proportion of patients suspected to have a genetic RD receive inconclusive exome/genome sequencing results. Working towards the International Rare Diseases Research Consortium (IRDiRC)'s goal for 2027 to ensure that all people living with a RD receive a diagnosis within one year of coming to medical attention, the Solve-RD project aims to identify the molecular causes underlying undiagnosed RD. As part of this strategy, we developed a phenotypic similarity-based variant prioritization methodology comparing submitted cases with other submitted cases and with known RD in Orphanet. Three complementary approaches based on phenotypic similarity calculations using the Human Phenotype Ontology (HPO), the Orphanet Rare Diseases Ontology (ORDO) and the HPO-ORDO Ontological Module (HOOM) were developed; genomic data reanalysis was performed by the RD-Connect Genome-Phenome Analysis Platform (GPAP). The methodology was tested in 4 exemplary cases discussed with experts from European Reference Networks. Variants of interest (pathogenic or likely pathogenic) were detected in 8.8% of the 725 cases clustered by similarity calculations. Diagnostic hypotheses were validated in 42.1% of them and needed further exploration in another 10.9%. Based on the promising results, we are devising an automated standardized phenotypic-based re-analysis pipeline to be applied to the entire cohort of unsolved cases.
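The idea of comparing cases by phenotypic similarity can be illustrated with a deliberately simple stand-in: Jaccard overlap between sets of HPO term IDs. This is not the ontology-aware similarity calculation used in the study, which exploits the HPO, ORDO and HOOM structure; the case names and term assignments below are hypothetical.

```python
from typing import Dict, FrozenSet


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of terms common to both cases among all terms seen in either case."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical cases annotated with HPO term IDs (illustrative assignments only).
cases: Dict[str, FrozenSet[str]] = {
    "case_a": frozenset({"HP:0001250", "HP:0001263", "HP:0000252"}),
    "case_b": frozenset({"HP:0001250", "HP:0001263", "HP:0004322"}),
    "case_c": frozenset({"HP:0004322"}),
}

query = "case_a"
scores = {
    other: jaccard(cases[query], terms)
    for other, terms in cases.items()
    if other != query
}
most_similar = max(scores, key=scores.get)
print(most_similar, scores)
```

Plain set overlap ignores the fact that two different HPO terms can be closely related through a shared ancestor, which is the gap that ontology-based similarity measures are designed to close.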
Affiliation(s)
- David Lagorce
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Emeline Lebreton
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Leslie Matalonga
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Oscar Hongnat
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Maroua Chahdil
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Davide Piscia
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Ida Paramonov
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Kornelia Ellwanger
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
  - Centre for Rare Diseases, University of Tübingen, Tübingen, Germany
- Peter Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Holm Graessner
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
  - Centre for Rare Diseases, University of Tübingen, Tübingen, Germany
- Sergi Beltran
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Caterina Lucano
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Marc Hanauer
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Ana Rath
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
|
26
|
Nawaz H, Parveen A, Khan SA, Zalan AK, Khan MA, Muhammad N, Hassib NF, Mostafa MI, Elhossini RM, Roshdy NN, Ullah A, Arif A, Khan S, Ammerpohl O, Wasif N. Brachyolmia, dental anomalies and short stature (DASS): Phenotype and genotype analyses of Egyptian and Pakistani patients. Heliyon 2024; 10:e23688. [PMID: 38192829 PMCID: PMC10772639 DOI: 10.1016/j.heliyon.2023.e23688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 11/29/2023] [Accepted: 12/09/2023] [Indexed: 01/10/2024] Open
Abstract
Brachyolmia is a heterogeneous group of developmental disorders characterized by a short trunk, short stature, scoliosis, and generalized platyspondyly without significant deformities in the long bones. DASS (Dental Abnormalities and Short Stature), caused by alterations in the LTBP3 gene, was previously considered a subtype of brachyolmia. The present study investigated three unrelated consanguineous families (A, B, C) with brachyolmia and DASS from Egypt and Pakistan. In our Egyptian patients, we also observed hearing impairment. Exome sequencing was performed to determine the genetic causes of the diverse clinical conditions in the patients. Exome sequencing identified a novel homozygous splice acceptor site variant (LTBP3:c.3629-1G>T; p.?) responsible for DASS phenotypes and a known homozygous missense variant (CABP2:c.590T>C; p.Ile197Thr) causing hearing impairment in the Egyptian patients. In addition, two previously reported homozygous frameshift variants (LTBP3:c.132delG; p.Pro45Argfs*25) and (LTBP3:c.2216delG; p.Gly739Alafs*7) were identified in Pakistani patients. This study emphasizes the vital role of LTBP3 in the axial skeleton and tooth morphogenesis and expands the mutational spectrum of LTBP3. We report LTBP3 variants in seven patients from three families, mainly causing brachyolmia with dental and cardiac anomalies. Skeletal assessment documented short webbed neck, broad chest, evidence of mild long-bone involvement, short distal phalanges, pes planus and osteopenic bone texture as additional associated findings expanding the clinical phenotype of DASS. The current study reveals that the hearing impairment phenotype in Egyptian patients of family A has a separate transmission mechanism independent of LTBP3.
Affiliation(s)
- Hamed Nawaz
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Asia Parveen
  - Department of Biochemistry, Faculty of Life Sciences, Gulab Devi Educational Complex, Gulab Devi Hospital, 54000, Lahore, Pakistan
  - Faculty of Science and Technology, University of Central Punjab (UCP), Lahore, Pakistan
- Sher Alam Khan
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
  - Department of Computer Science and Bioinformatics, Khushal Khan Khatak University, Karak, Pakistan
- Abul Khair Zalan
  - BDS, MDS Registrar Pediatric Dentistry, Department of Pediatric Dentistry, School of Dentistry, PIMS, Islamabad, Pakistan
- Muhammad Adnan Khan
  - Dental Material, Institute of Basic Medical Sciences, Khyber Medical University Peshawar, Peshawar, Pakistan
- Noor Muhammad
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Nehal F. Hassib
  - Orodental Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, 12622, Egypt
  - School of Dentistry, New Giza University, Giza, Egypt
- Mostafa I. Mostafa
  - Orodental Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, 12622, Egypt
- Rasha M. Elhossini
  - Clinical Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, 12622, Egypt
- Nehal Nabil Roshdy
  - Endodontics, Faculty of Dentistry, Cairo University, Cairo, 11553, Egypt
- Asmat Ullah
  - Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Amina Arif
  - Faculty of Science and Technology, University of Central Punjab (UCP), Lahore, Pakistan
- Saadullah Khan
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Ole Ammerpohl
  - Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
- Naveed Wasif
  - Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel, D-24105, Kiel, Germany
|
27
|
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (N Y) 2024; 5:100887. [PMID: 38264716 PMCID: PMC10801236 DOI: 10.1016/j.patter.2023.100887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 01/25/2024]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
- Jingye Yang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Wendy Deng
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Da Wu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Yunyun Zhou
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
28
|
Hussain SI, Muhammad N, Khan N, Khan M, Fardous F, Tahir R, Yasin M, Khan SA, Saleha S, Muhammad N, Wasif N, Khan S. Molecular insight into CREBBP and TANGO2 variants causing intellectual disability. J Gene Med 2024; 26:e3591. [PMID: 37721116 DOI: 10.1002/jgm.3591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 08/07/2023] [Accepted: 08/24/2023] [Indexed: 09/19/2023] Open
Abstract
BACKGROUND Intellectual disability (ID) can be associated with different syndromes such as Rubinstein-Taybi syndrome (RSTS) and can also be related to conditions such as metabolic encephalomyopathic crises, recurrent, with rhabdomyolysis, cardiac arrhythmias and neurodegeneration. Rare congenital RSTS1 (OMIM 180849) is characterized by mental and growth retardation, significant and duplicated distal phalanges of thumbs and halluces, facial dysmorphisms, and an elevated risk of malignancies. Microdeletions and point mutations in the CREB-binding protein (CREBBP) gene, located at 16p13.3, have been reported to cause RSTS. By contrast, TANGO2-related metabolic encephalopathy and arrhythmia (TRMEA) is a rare metabolic condition that causes repeated metabolic crises, hypoglycemia, lactic acidosis, rhabdomyolysis, arrhythmias and encephalopathy with cognitive decline. Clinicians need more clinical and genetic evidence to detect and comprehend the phenotypic spectrum of this disorder. METHODS Exome sequencing was used to identify the disease-causing variants in two affected families, A and B, from District Kohat and District Karak, Khyber Pakhtunkhwa. Affected individuals from both families presented symptoms of ID, developmental delay and behavioral abnormalities. The validation and co-segregation analysis of the filtered variants was carried out using Sanger sequencing. RESULTS In the present study, two families (A and B) exhibiting various forms of ID were enrolled. In Family A, exome sequencing revealed a novel missense variant (NM_004380.3: c.4571A>G; NP_004371.2: p.Lys1524Arg) in the CREBBP gene, whereas, in Family B, a splice site variant (NM_152906.7: c.605+1G>A) in the TANGO2 gene was identified. Sanger sequencing of both variants confirmed their segregation with ID in both families. The in silico tools verified the aberrant changes in the CREBBP protein structure. Wild-type and mutant CREBBP protein structures were superimposed, and the conformational changes observed are likely to alter the protein function. CONCLUSIONS RSTS and TRMEA are exceedingly rare disorders for which specific clinical characteristics have been clearly established, but further investigations are underway and still required. Multicenter studies are needed to increase our understanding of the clinical phenotypes, mainly showing the genotype-phenotype associations.
Affiliation(s)
- Syeda Iqra Hussain
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Nazif Muhammad
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Niamatullah Khan
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Mobeen Khan
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Fardous Fardous
  - Department of Medical Lab Technology, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Raheel Tahir
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Muhammad Yasin
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Sher Alam Khan
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Shamim Saleha
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Noor Muhammad
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Naveed Wasif
  - Institute of Human Genetics, Ulm University and Ulm University Medical Center, Ulm, Germany
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, Kiel, Germany
- Saadullah Khan
  - Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan

29
Meyer C, Romero NB, Evangelista T, Cadot B, Laporte J, Jeannin-Girardon A, Collet P, Ayadi A, Chennen K, Poch O. IMPatienT: An Integrated Web Application to Digitize, Process and Explore Multimodal PATIENt daTa. J Neuromuscul Dis 2024; 11:855-870. [PMID: 38701156 PMCID: PMC11307071 DOI: 10.3233/jnd-230085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2024] [Indexed: 05/05/2024]
Abstract
Medical acts, such as imaging, lead to the production of various medical text reports that describe the relevant findings. This induces multimodality in patient data by combining image data with free-text and consequently, multimodal data have become central to drive research and improve diagnoses. However, the exploitation of patient data is problematic as the ecosystem of analysis tools is fragmented according to the type of data (images, text, genetics), the task (processing, exploration) and domain of interest (clinical phenotype, histology). To address the challenges, we developed IMPatienT (Integrated digital Multimodal PATIENt daTa), a simple, flexible and open-source web application to digitize, process and explore multimodal patient data. IMPatienT has a modular architecture allowing to: (i) create a standard vocabulary for a domain, (ii) digitize and process free-text data, (iii) annotate images and perform image segmentation, (iv) generate a visualization dashboard and provide diagnosis decision support. To demonstrate the advantages of IMPatienT, we present a use case on a corpus of 40 simulated muscle biopsy reports of congenital myopathy patients. As IMPatienT provides users with the ability to design their own vocabulary, it can be adapted to any research domain and can be used as a patient registry for exploratory data analysis. A demo instance of the application is available at https://impatient.lbgi.fr/.
Affiliation(s)
- Corentin Meyer
  - Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Norma Beatriz Romero
  - Neuromuscular Morphology Unit, Myology Institute, Reference Center of Neuromuscular Diseases Nord-Est-IDF, GHU Pitié-Salpêtrière, Paris, France
- Teresinha Evangelista
  - Neuromuscular Morphology Unit, Myology Institute, Reference Center of Neuromuscular Diseases Nord-Est-IDF, GHU Pitié-Salpêtrière, Paris, France
- Brunot Cadot
  - Sorbonne Université, INSERM, Center for Research in Myology, Myology Institute, GHU Pitié-Salpêtrière, Paris, France
- Jocelyn Laporte
  - Department Translational Medicine, IGBMC, CNRS UMR 7104, Illkirch, France
- Anne Jeannin-Girardon
  - Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Pierre Collet
  - Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Ali Ayadi
  - Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Kirsley Chennen
  - Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Olivier Poch
  - Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France

30
Kouri C, Sommer G, Martinez de Lapiscina I, Elzenaty RN, Tack LJW, Cools M, Ahmed SF, Flück CE. Clinical and genetic characteristics of a large international cohort of individuals with rare NR5A1/SF-1 variants of sex development. EBioMedicine 2024; 99:104941. [PMID: 38168586 PMCID: PMC10797150 DOI: 10.1016/j.ebiom.2023.104941] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/08/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND Steroidogenic factor 1 (SF-1/NR5A1) is essential for human sex development. Heterozygous NR5A1/SF-1 variants manifest with a broad range of phenotypes of differences of sex development (DSD), which remain unexplained. METHODS We conducted a retrospective analysis of the largest international cohort to date of individuals with NR5A1/SF-1 variants, identified through the I-DSD registry and a research network. FINDINGS Among 197 individuals with NR5A1/SF-1 variants, we confirmed diverse phenotypes. Over 70% of 46,XY individuals had a severe DSD phenotype, while 90% of 46,XX individuals had female-typical sex development. Close to 100 different novel and known NR5A1/SF-1 variants were identified, without specific hot spots. Additionally, likely disease-associated variants in other genes were reported in 32 individuals out of 128 tested (25%), particularly in those with severe or opposite sex DSD phenotypes. Interestingly, 48% of these variants were found in known DSD or SF-1 interacting genes, but no frequent gene-clusters were identified. Sex registration at birth varied, with <10% undergoing reassignment. Gonadectomy was performed in 30% and genital surgery in 58%. Associated organ anomalies were observed in 27% of individuals with a DSD, mainly concerning the spleen. Intrafamilial phenotypes also varied considerably. INTERPRETATION The observed phenotypic variability in individuals and families with NR5A1/SF-1 variants is large and remains unpredictable. It may often not be solely explained by the monogenic pathogenicity of the NR5A1/SF-1 variants but is likely influenced by additional genetic variants and as-yet-unknown factors. FUNDING Swiss National Science Foundation (320030-197725) and Boveri Foundation Zürich, Switzerland.
Affiliation(s)
- Chrysanthi Kouri
  - Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern 3012, Switzerland
- Grit Sommer
  - Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Institute of Social and Preventive Medicine, University of Bern, Switzerland, University of Bern, Bern 3012, Switzerland
- Idoia Martinez de Lapiscina
  - Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Research into the Genetics and Control of Diabetes and Other Endocrine Disorders, Biobizkaia Health Research Institute, Cruces University Hospital, Barakaldo 48903, Spain; CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Instituto de Salud Carlos III, Madrid 28029, Spain; CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid 28029, Spain; Endo-ERN, Amsterdam 1081 HV, the Netherlands
- Rawda Naamneh Elzenaty
  - Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern 3012, Switzerland
- Lloyd J W Tack
  - Department of Paediatric Endocrinology, Department of Paediatrics and Internal Medicine, Ghent University Hospital, Ghent University, Ghent 9000, Belgium
- Martine Cools
  - Department of Paediatric Endocrinology, Department of Paediatrics and Internal Medicine, Ghent University Hospital, Ghent University, Ghent 9000, Belgium
- S Faisal Ahmed
  - Developmental Endocrinology Research Group, University of Glasgow, Royal Hospital for Sick Children, Glasgow G51 4TF, UK
- Christa E Flück
  - Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland.

31
Karafyllis I, Nuoffer JM, Michelis JP, Chilver-Stainer L. Untreated Classic Galactosemia: A Rare Cause of Adult-Onset Progressive Cerebellar Ataxia - A Case Report. Case Rep Neurol 2024; 16:55-62. [PMID: 38444718 PMCID: PMC10914380 DOI: 10.1159/000536679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 01/24/2024] [Indexed: 03/07/2024] Open
Abstract
Introduction Identifying the underlying etiology of nonfamilial adult-onset progressive cerebellar ataxia is often challenging because neurologists must consider almost all nongenetic and genetic causes of ataxia. Case Presentation A 39-year-old woman was hospitalized for progressive ataxia with pyramidal and cognitive dysfunction after right arm shaking and coordination problems had deteriorated progressively over 1.5 years. The patient's medical history included amenorrhea, cataracts, developmental delays, consanguinity of the parents, motor coordination issues, and diarrhea and vomiting in infancy. An important finding that enabled us to solve the diagnostic conundrum was the elevated carbohydrate-deficient transferrin level in the absence of alcohol-related symptoms; such elevation also occurs in untreated carbohydrate metabolism disorders, sometimes with ataxia as a leading symptom. The decreased erythrocyte galactose-1-phosphate uridyltransferase (GALT) enzyme activity and the elevated erythrocyte galactose-1-phosphate (Gal-1P) concentration led to the final diagnosis of galactosemia, a rare metabolic disorder. The patient's condition stayed stable with strict adherence to lactose-free and galactose-restricted diets, regular physiotherapy, and speech therapy, despite attempts to control the crippling tremor. Conclusion This case highlights the importance of considering rare diseases based on unexplained clinical and laboratory findings. Newborn screening does not change the long-term complications of early-treated classical galactosemia. A small percentage of these patients develop ataxia tremor syndrome.
Affiliation(s)
- Ioannis Karafyllis
  - Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department of Neurology, Cantonal Hospital Olten, Olten, Switzerland
- Jean-Marc Nuoffer
  - Department of Pediatric Endocrinology, Diabetology and Metabolism, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - University Institute of Clinical Chemistry, University of Bern, Bern, Switzerland
- Joan-Philipp Michelis
  - Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Lara Chilver-Stainer
  - Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland

32
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A tool for annotating and analyzing treatments and clinical management of human disease. Med 2023; 4:913-927.e3. [PMID: 37963467 PMCID: PMC10842845 DOI: 10.1016/j.medj.2023.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 08/31/2023] [Accepted: 10/14/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology specifically designed to organize medical procedures, therapies, and interventions. METHODS MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry. Term development involves a blend of manual and semi-automated processes. Additionally, we have generated annotations detailing diagnostic modalities for specific phenotypic abnormalities defined by the Human Phenotype Ontology (HPO). We introduce a web application, POET, that facilitates MAxO annotations for specific medical actions for diseases using the Mondo Disease Ontology. FINDINGS MAxO encompasses 1,757 terms spanning a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. These terms annotate phenotypic features associated with specific diseases (using HPO and Mondo). Presently, there are over 16,000 MAxO diagnostic annotations that target HPO terms. Through POET, we have created 413 MAxO annotations specifying treatments for 189 rare diseases. CONCLUSIONS MAxO offers a computational representation of treatments and other actions taken for the clinical management of patients. Its development is closely coupled to Mondo and HPO, broadening the scope of our computational modeling of diseases and phenotypic features. We invite the community to contribute disease annotations using POET (https://poet.jax.org/). MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO). FUNDING NHGRI 1U24HG011449-01A1 and NHGRI 5RM1HG010860-04.
Affiliation(s)
- Leigh C Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Sabrina Toro
  - University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Margaret P Adam
  - University of Washington School of Medicine, Seattle, WA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David Gomez-Andres
  - Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
- Rita Horvath
  - Department of Clinical Neurosciences, University of Cambridge, Robinson Way, Cambridge CB2 0PY, UK
- Megan L Kraus
  - University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Markus S Ladewig
  - Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
- David Lewis-Smith
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Hanns Lochmüller
  - Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada; Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Brain and Mind Research Institute, University of Ottawa, Ottawa, Canada; Department of Neuropediatrics and Muscle Disorders, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; Centro Nacional de Análisis Genómico, Barcelona, Spain
- Catharina Schuetz
  - Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Berthold Seitz
  - Department of Ophthalmology, Saarland University Medical Center UKS, Homburg, Saar, Germany
- Morgan N Similuk
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Teresa N Sparks
  - Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Timmy Strauss
  - Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Emilia M Swietlik
  - Department of Medicine, University of Cambridge, Heart and Lung Research Institute, Cambridge CB2 0BB, UK
- Rachel Thompson
  - Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.

33
Groza T, Wu H, Dinger ME, Danis D, Hilton C, Bagley A, Davids JR, Luo L, Lu Z, Robinson PN. Term-BLAST-like alignment tool for concept recognition in noisy clinical texts. Bioinformatics 2023; 39:btad716. [PMID: 38001031 PMCID: PMC10710372 DOI: 10.1093/bioinformatics/btad716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/20/2023] [Accepted: 11/23/2023] [Indexed: 11/26/2023] Open
Abstract
MOTIVATION Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
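The k-mer screening idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the TBLAT or Fenominal implementation: the function names, the choice of k = 3, the 0.6 threshold, and the toy vocabulary are all assumptions made for illustration, and the error-pattern scoring derived from clinical notes is not modeled here.

```python
from collections import Counter


def kmers(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers in a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmer_fraction(query: str, term: str, k: int = 3) -> float:
    """Fraction of the term's k-mers that also occur in the query."""
    q, t = kmers(query, k), kmers(term, k)
    total = sum(t.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared k-mer.
    return sum((q & t).values()) / total


def screen_candidates(query, vocabulary, k=3, threshold=0.6):
    """Return (term, score) pairs at or above the threshold (hypothetical cutoff), best first."""
    scored = [(term, shared_kmer_fraction(query, term, k)) for term in vocabulary]
    hits = [(term, score) for term, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy vocabulary and a note containing a misspelling ("scoliosys").
vocabulary = ["scoliosis", "seizure", "hypotonia"]
candidates = screen_candidates("pt has scoliosys and hypotonia", vocabulary)
```

In this toy run the misspelled term still passes the screen because most of its k-mers survive the typo, which is the property a screening step of this kind relies on before a more careful scoring pass.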
Affiliation(s)
- Tudor Groza
  - Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
  - Genetics and Rare Diseases Program, Telethon Kids Institute, Nedlands, WA 6009, Australia
- Honghan Wu
  - Institute of Health Informatics, University College London, London WC1E 6BT, United Kingdom
- Marcel E Dinger
  - Pryzm Health, Sydney, NSW 2089, Australia
  - School of Life and Environmental Sciences, Faculty of Science, University of Sydney, NSW 2006, Australia
- Daniel Danis
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Coleman Hilton
  - Shriners Children’s Corporate Headquarters, Tampa, FL 33607, United States
- Anita Bagley
  - Shriners Children's Northern California, Sacramento, CA 95817, United States
- Jon R Davids
  - Shriners Children's Northern California, Sacramento, CA 95817, United States
- Ling Luo
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Zhiyong Lu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States

34
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT. arXiv 2023; arXiv:2308.06294v2. [PMID: 37986722 PMCID: PMC10659449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
- Jingye Yang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Wendy Deng
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Da Wu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Yunyun Zhou
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Biostatistics and Bioinformatics facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

35
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories. RESULTS To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification. AVAILABILITY AND IMPLEMENTATION https://github.com/bixuehua/SSLpheno.
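The step of selecting positive and negative sample nodes by cosine similarity can be sketched as follows. This is an illustrative assumption of how such a selection might look, not the SSLpheno code: the threshold values, the function names, and the toy feature vectors are hypothetical, and the Laplacian filtering and network training described in the abstract are omitted.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_pairs(features, pos_threshold=0.9, neg_threshold=0.1):
    """Split node pairs into positives (very similar) and negatives (very dissimilar).

    Both thresholds are hypothetical; pairs falling between them are left unlabeled.
    """
    positives, negatives = [], []
    for i, j in combinations(range(len(features)), 2):
        sim = cosine(features[i], features[j])
        if sim >= pos_threshold:
            positives.append((i, j))
        elif sim <= neg_threshold:
            negatives.append((i, j))
    return positives, negatives


# Three toy node feature vectors: nodes 0 and 1 are near-parallel, node 2 is orthogonal to node 0.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
positives, negatives = select_pairs(features)
```

Leaving the middle band unlabeled is one possible design choice: it keeps ambiguous pairs out of the reconstruction labels at the cost of using fewer training pairs.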
Affiliation(s)
- Xuehua Bi
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Weiyang Liang
  - College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Qichang Zhao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China

36
Alsentzer E, Finlayson SG, Li MM, Kobren SN, Kohane IS. Simulation of undiagnosed patients with novel genetic conditions. Nat Commun 2023; 14:6403. [PMID: 37828001 PMCID: PMC10570269 DOI: 10.1038/s41467-023-41980-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/02/2022] [Accepted: 09/26/2023] [Indexed: 10/14/2023] Open
Abstract
Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300-400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited due to the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly-available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.
Grants
- U01 HG007690 NHGRI NIH HHS
- U54 NS108251 NINDS NIH HHS
- U01 HG010219 NHGRI NIH HHS
- U01 HG007672 NHGRI NIH HHS
- U01 HG010233 NHGRI NIH HHS
- U01 HG010230 NHGRI NIH HHS
- U01 HG007943 NHGRI NIH HHS
- U01 HG010217 NHGRI NIH HHS
- U01 HG007942 NHGRI NIH HHS
- U01 HG010215 NHGRI NIH HHS
- U01 HG007708 NHGRI NIH HHS
- T32 HG002295 NHGRI NIH HHS
- T32 GM007753 NIGMS NIH HHS
- U01 HG007674 NHGRI NIH HHS
- U01 TR001395 NCATS NIH HHS
- U01 HG007709 NHGRI NIH HHS
- U54 NS093793 NINDS NIH HHS
- U01 HG007530 NHGRI NIH HHS
- U01 TR002471 NCATS NIH HHS
- U01 HG007703 NHGRI NIH HHS
- UDN research reported in this manuscript was supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Number(s) U01HG007709, U01HG010219, U01HG010230, U01HG010217, U01HG010233, U01HG010215, U01HG007672, U01HG007690, U01HG007708, U01HG007703, U01HG007674, U01HG007530, U01HG007942, U01HG007943, U01TR001395, U01TR002471, U54NS108251, and U54NS093793.
- E.A. is supported by a Microsoft Research PhD Fellowship.
- S.F. is supported by award Number T32GM007753 from the National Institute of General Medical Sciences.
- M.L. is supported by T32HG002295 from the National Human Genome Research Institute and a National Science Foundation Graduate Research Fellowship.
Affiliation(s)
- Emily Alsentzer
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
- Samuel G Finlayson
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA, 98105, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105, USA
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, 02115, USA
- Shilpa N Kobren
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.

37
Hussain SI, Muhammad N, Shah SUD, Fardous F, Khan SA, Khan N, Rehman AU, Siddique M, Wasan SA, Niaz R, Ullah H, Khan N, Muhammad N, Mirza MU, Wasif N, Khan S. Structural and functional implications of SLC13A3 and SLC9A6 mutations: an in silico approach to understanding intellectual disability. BMC Neurol 2023; 23:353. [PMID: 37794328 PMCID: PMC10548666 DOI: 10.1186/s12883-023-03397-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 09/20/2023] [Indexed: 10/06/2023] Open
Abstract
BACKGROUND Intellectual disability (ID) is a condition that varies widely in both its clinical presentation and its genetic underpinnings. It significantly impacts patients' learning capacities and lowers their IQ below 70. The solute carrier (SLC) family is the most abundant class of transmembrane transporters and is responsible for the translocation of various substances across cell membranes, including nutrients, ions, metabolites, and medicines. The SLC13A3 gene encodes a plasma membrane-localized Na+/dicarboxylate cotransporter 3 (NaDC3) primarily expressed in the kidney, astrocytes, and the choroid plexus. In addition to three Na+ ions, it brings four- to six-carbon dicarboxylates into the cytosol. Recently, it was discovered that patients with acute reversible leukoencephalopathy and α-ketoglutarate accumulation (ARLIAK) carry pathogenic mutations in the SLC13A3 gene, and the X-linked neurodevelopmental condition Christianson syndrome is caused by mutations in the SLC9A6 gene, which encodes the recycling endosomal alkali cation/proton exchanger NHE6, also called sodium-hydrogen exchanger-6. As a result, there are severe impairments in the patient's mental capacity, physical skills, and adaptive behavior. METHODS AND RESULTS Two Pakistani families (A and B) with autosomal recessive and X-linked intellectual disorders were clinically evaluated, and two novel disease-causing variants in the SLC13A3 gene (NM_022829.5) and the SLC9A6 gene (NM_001042537.2) were identified using whole exome sequencing. Family A segregated a novel homozygous missense variant (c.1478C>T; p.Pro493Leu) in exon 11 of the SLC13A3 gene, while Family B segregated a novel missense variant (c.1342G>A; p.Gly448Arg) in exon 10 of the SLC9A6 gene. By integrating computational approaches, our findings provided insights into the molecular mechanisms underlying the development of ID in individuals with SLC13A3 and SLC9A6 mutations. CONCLUSION We have utilized in silico tools in the current study to examine the deleterious effects of the identified variants, which carry the potential to improve understanding of the genotype-phenotype relationships in neurodevelopmental disorders.
Affiliation(s)
- Syeda Iqra Hussain
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Nazif Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Salah Ud Din Shah
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Fardous Fardous
- Department of Medical Lab Technology, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Niamatullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Adil U Rehman
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Mehwish Siddique
- Department of Zoology, Government Post Graduate College for Women, Satellite Town, Gujranwala, Pakistan
| | - Shoukat Ali Wasan
- Department of Botany, Faculty of Natural Sciences, Shah Abdul Latif University, Khairpur, Sindh, Pakistan
| | - Rooh Niaz
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Hafiz Ullah
- Gomal Center of Biochemistry and Biotechnology (GCBB), Gomal University D. I. Khan, D. I. Khan, Pakistan
| | - Niamat Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Muhammad Usman Mirza
- Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON, N9B 1C4, Canada
| | - Naveed Wasif
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
|
38
|
Zaman Q, Iftikhar A, Rehman G, Khan Q, Najumuddin, Jan A, Khan J, Anas M, Laiba, Umair M, Muthaffar OY, Abdulkareem AA, Bibi F, Naseer MI, Jelani M. Two novel homozygous variants of ATP6V0A2 and ALDH18A1 lead to autosomal recessive cutis laxa type 2 and 3 in two Pakistani families. J Gene Med 2023; 25:e3522. [PMID: 37119015 DOI: 10.1002/jgm.3522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 04/10/2023] [Accepted: 04/12/2023] [Indexed: 04/30/2023] Open
Abstract
BACKGROUND Autosomal recessive cutis laxa type 2A (ARCL2A; OMIM: 219200) is characterized by neurovegetative, developmental and progeroid elastic skin anomalies. It is caused by biallelic variation in ATPase, H+ transporting V0 subunit A2 (ATP6V0A2; OMIM: 611716) located on chromosome 12q24.31. Autosomal recessive cutis laxa type 3A (ARCL3A; OMIM: 219150) is another subclinical type characterized by short stature, ophthalmological abnormalities and a progeria-like appearance. ARCL3A is caused by loss-of-function alterations in the aldehyde dehydrogenase 18 family member A1 (ALDH18A1; OMIM: 138250) gene located at chromosome 10q24.1. METHODS Whole-exome sequencing (WES) and Sanger sequencing were performed for molecular diagnosis. 3D protein modeling was performed to investigate the deleterious effect of the variant on protein structure. RESULTS In this study, clinical and molecular diagnoses were performed for two families, ED-01 and DWF-41, which displayed hallmark features of ARCL2A and ARCL3A, respectively. Three affected individuals in the ED-01 family (IV-4, IV-5 and V-3) displayed sagging loose skin, down-slanting palpebral fissures, excessive wrinkles on the abdomen, hands and feet, and prominent veins on the trunk. Meanwhile, the affected individuals in the DWF-41 family (V-2 and V-3) had progeroid skin, short stature, dysmorphology, low muscle tone, epilepsy, lordosis, scoliosis, delayed puberty and internal genitalia. WES in the index patient (ED-01: IV-4) identified a novel homozygous deletion (NM_012463.3: c.1977_1980del; p.[Val660LeufsTer23]) in exon 16 of ATP6V0A2, while in DWF-41 a novel homozygous missense variant (NM_001323413.1:c.1867G>A; p.[Asp623Asn]) in exon 15 of ALDH18A1 was identified. Sanger validation in all available family members confirmed an autosomal recessive mode of inheritance in each family. Three-dimensional in silico protein modeling suggested a deleterious impact of the identified variants. Furthermore, these variants were assigned class 1 or "pathogenic" as per the 2015 guidelines of the American College of Medical Genetics. Screening of ethnically matched healthy controls (n = 200 chromosomes) excluded the presence of these variations in the general population. CONCLUSIONS To the best of our knowledge, this is the first report of ATP6V0A2 and ALDH18A1 variations in the Pakhtun ethnicity of the Pakistani population. The study confirms that WES can be used as a first-line diagnostic test in patients with cutis laxa, and provides a basis for population screening and premarital testing to reduce the disease burden in future generations.
Affiliation(s)
- Qaiser Zaman
- Department of Zoology, Government Postgraduate College Dargai, Malakand, Dargai, Pakistan
- Higher Education Department, Peshawar, Khyber Pakhtunkhwa, Pakistan
- Department of Zoology, Abdul Wali Khan University Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Aiman Iftikhar
- Department of Zoology, Government Postgraduate College Dargai, Malakand, Dargai, Pakistan
| | - Gauhar Rehman
- Department of Zoology, Abdul Wali Khan University Mardan, Khyber Pakhtunkhwa, Pakistan
| | - Qadeem Khan
- Department of Zoology, Government Postgraduate College Dargai, Malakand, Dargai, Pakistan
| | - Najumuddin
- National Center for Bioinformatics, Quaid-I-Azam University, Islamabad, Pakistan
| | - Amin Jan
- Department of Physiology, North-West School of Medicine Peshawar, Khyber Pakhtunkhwa, Pakistan
| | - Jamshid Khan
- Department of Zoology, Government Postgraduate College Dargai, Malakand, Dargai, Pakistan
| | - Muhammad Anas
- Department of Zoology, Government Postgraduate College Dargai, Malakand, Dargai, Pakistan
| | - Laiba
- Department of Zoology, Government Postgraduate College Dargai, Malakand, Dargai, Pakistan
| | - Muhammad Umair
- Medical Genomics Research Department, King Abdullah International Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
| | - Osama Yousef Muthaffar
- Department of Pediatrics, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Angham Abdulrhman Abdulkareem
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Faculty of Science, Department of Biochemistry, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Fehmida Bibi
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Special Infectious Agents Unit, King Fahd Medical Research Centre, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Imran Naseer
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Musharraf Jelani
- Rare Diseases Genetics and Genomics, Centre for Omic Sciences, Islamia College Peshawar, Khyber Pakhtunkhwa, Pakistan
|
39
|
Huang D, Jiang J, Zhao T, Wu S, Li P, Lyu Y, Feng J, Wei M, Zhu Z, Gu J, Ren Y, Yu G, Lu H. diseaseGPS: auxiliary diagnostic system for genetic disorders based on genotype and phenotype. Bioinformatics 2023; 39:btad517. [PMID: 37647638 PMCID: PMC10500091 DOI: 10.1093/bioinformatics/btad517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 07/24/2023] [Accepted: 08/29/2023] [Indexed: 09/01/2023] Open
Abstract
SUMMARY Next-generation sequencing has brought opportunities for the diagnosis of genetic disorders due to its high-throughput capabilities. However, the majority of existing methods were limited to only sequencing candidate variants, and the process of linking these variants to a diagnosis of genetic disorders still required medical professionals to consult databases. Therefore, we introduce diseaseGPS, an integrated platform for the diagnosis of genetic disorders that combines both phenotype and genotype data for analysis. It offers not only a user-friendly GUI web application for those without a programming background but also scripts that can be executed in batch mode for bioinformatics professionals. The genetic and phenotypic data are integrated using the ACMG-Bayes method and a novel phenotypic similarity method to prioritize the results of genetic disorders. diseaseGPS was evaluated on 6085 cases from the Deciphering Developmental Disorders project and 187 cases from Shanghai Children's Hospital. The results demonstrated that diseaseGPS performed better than other commonly used methods. AVAILABILITY AND IMPLEMENTATION diseaseGPS is freely accessible at https://diseasegps.sjtu.edu.cn with source code at https://github.com/BioHuangDY/diseaseGPS.
Affiliation(s)
- Daoyi Huang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Jianping Jiang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Tingting Zhao
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai, China
| | - Shengnan Wu
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Pin Li
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Yongfen Lyu
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Jincai Feng
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Mingyue Wei
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Zhixing Zhu
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai, China
| | - Jianlei Gu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Yongyong Ren
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Guangjun Yu
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Guangdong, China
| | - Hui Lu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
|
40
|
Hsieh TC, Krawitz PM. Computational facial analysis for rare Mendelian disorders. Am J Med Genet C Semin Med Genet 2023; 193:e32061. [PMID: 37584245 DOI: 10.1002/ajmg.c.32061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 07/17/2023] [Accepted: 07/28/2023] [Indexed: 08/17/2023]
Abstract
With the advances in computer vision, computational facial analysis has become a powerful and effective tool for diagnosing rare disorders. This technology, also called next-generation phenotyping (NGP), has progressed significantly over the last decade. This review paper will introduce three key NGP approaches. In 2014, Ferry et al. first presented Clinical Face Phenotype Space (CFPS) trained on eight syndromes. After 5 years, Gurovich et al. proposed DeepGestalt, a deep convolutional neural network trained on more than 21,000 patient images with 216 disorders. It was considered a state-of-the-art disorder classification framework. In 2022, Hsieh et al. developed GestaltMatcher to support the ultra-rare and novel disorders not supported in DeepGestalt. It further enabled the analysis of facial similarity presented in a given cohort or multiple disorders. Moreover, this article will present the usage of NGP for variant prioritization and facial gestalt delineation. Although NGP approaches have proven their capability in assisting the diagnosis of many disorders, many limitations remain. This article will introduce two future directions to address two main limitations: enabling the global collaboration for a medical imaging database that fulfills the FAIR principles and synthesizing patient images to protect patient privacy. In the end, with more and more NGP approaches emerging, we envision that the NGP technology can assist clinicians and researchers in diagnosing patients and analyzing disorders in multiple directions in the near future.
Affiliation(s)
- Tzung-Chien Hsieh
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Peter M Krawitz
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
|
41
|
Dingemans AJM, Hinne M, Truijen KMG, Goltstein L, van Reeuwijk J, de Leeuw N, Schuurs-Hoeijmakers J, Pfundt R, Diets IJ, den Hoed J, de Boer E, Coenen-van der Spek J, Jansen S, van Bon BW, Jonis N, Ockeloen CW, Vulto-van Silfhout AT, Kleefstra T, Koolen DA, Campeau PM, Palmer EE, Van Esch H, Lyon GJ, Alkuraya FS, Rauch A, Marom R, Baralle D, van der Sluijs PJ, Santen GWE, Kooy RF, van Gerven MAJ, Vissers LELM, de Vries BBA. PhenoScore quantifies phenotypic variation for rare genetic diseases by combining facial analysis with other clinical features using a machine-learning framework. Nat Genet 2023; 55:1598-1607. [PMID: 37550531 PMCID: PMC11414844 DOI: 10.1038/s41588-023-01469-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/06/2022] [Accepted: 07/05/2023] [Indexed: 08/09/2023]
Abstract
Several molecular and phenotypic algorithms exist that establish genotype-phenotype correlations, including facial recognition tools. However, no unified framework that investigates both facial data and other phenotypic data directly from individuals exists. We developed PhenoScore: an open-source, artificial intelligence-based phenomics framework, combining facial recognition technology with Human Phenotype Ontology data analysis to quantify phenotypic similarity. Here we show PhenoScore's ability to recognize distinct phenotypic entities by establishing recognizable phenotypes for 37 of 40 investigated syndromes against clinical features observed in individuals with other neurodevelopmental disorders and show it is an improvement on existing approaches. PhenoScore provides predictions for individuals with variants of unknown significance and enables sophisticated genotype-phenotype studies by testing hypotheses on possible phenotypic (sub)groups. PhenoScore confirmed previously known phenotypic subgroups caused by variants in the same gene for SATB1, SETBP1 and DEAF1 and provides objective clinical evidence for two distinct ADNP-related phenotypes, already established functionally.
Affiliation(s)
- Alexander J M Dingemans
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Department of Artificial Intelligence, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Max Hinne
- Department of Artificial Intelligence, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Kim M G Truijen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Lia Goltstein
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Jeroen van Reeuwijk
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Nicole de Leeuw
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Janneke Schuurs-Hoeijmakers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Rolph Pfundt
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Illja J Diets
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Joery den Hoed
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
| | - Elke de Boer
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Jet Coenen-van der Spek
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Sandra Jansen
- Department of Human Genetics, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
| | - Bregje W van Bon
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Noraly Jonis
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Charlotte W Ockeloen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Anneke T Vulto-van Silfhout
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Tjitske Kleefstra
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - David A Koolen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Philippe M Campeau
- Department of Pediatrics, University of Montreal, Montreal, Quebec, Canada
| | - Elizabeth E Palmer
- Faculty of Medicine and Health, UNSW Sydney, Sydney, New South Wales, Australia
- Sydney Children's Hospitals Network, Sydney, New South Wales, Australia
| | - Hilde Van Esch
- Center for Human Genetics, University Hospitals Leuven, University of Leuven, Leuven, Belgium
| | - Gholson J Lyon
- Department of Human Genetics and George A. Jervis Clinic, Institute for Basic Research in Developmental Disabilities (IBR), Staten Island, NY, USA
- Biology PhD Program, The Graduate Center, The City University of New York, New York City, NY, USA
| | - Fowzan S Alkuraya
- Department of Translational Genomics, Center for Genomic Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
| | - Anita Rauch
- Institute of Medical Genetics, University of Zürich, Zürich, Switzerland
| | - Ronit Marom
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Diana Baralle
- Faculty of Medicine, University of Southampton, Southampton, UK
| | | | - Gijs W E Santen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands
| | - R Frank Kooy
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
| | - Marcel A J van Gerven
- Department of Artificial Intelligence, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Lisenka E L M Vissers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Bert B A de Vries
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
|
42
|
Smirnov D, Konstantinovskiy N, Prokisch H. Integrative omics approaches to advance rare disease diagnostics. J Inherit Metab Dis 2023; 46:824-838. [PMID: 37553850 DOI: 10.1002/jimd.12663] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/13/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/10/2023]
Abstract
Over the past decade, high-throughput DNA sequencing approaches, namely whole exome and whole genome sequencing, became a standard procedure in Mendelian disease diagnostics. Implementation of these technologies greatly facilitated diagnostics and shifted the analysis paradigm from variant identification to prioritisation and evaluation. The diagnostic rates vary widely depending on the cohort size, heterogeneity and disease, and range from around 30% to 50%, leaving the majority of patients undiagnosed. Advances in omics technologies and computational analysis provide an opportunity to increase these unfavourable rates by providing evidence for disease-causing variant validation and prioritisation. This review aims to provide an overview of the current application of several omics technologies, including RNA-sequencing, proteomics, metabolomics and DNA-methylation profiling, for diagnostics of rare genetic diseases in general and inborn errors of metabolism in particular.
Affiliation(s)
- Dmitrii Smirnov
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
| | - Nikita Konstantinovskiy
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
| | - Holger Prokisch
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
|
43
|
Jolly A, Du H, Borel C, Chen N, Zhao S, Grochowski CM, Duan R, Fatih JM, Dawood M, Salvi S, Jhangiani SN, Muzny DM, Koch A, Rouskas K, Glentis S, Deligeoroglou E, Bacopoulou F, Wise CA, Dietrich JE, Van den Veyver IB, Dimas AS, Brucker S, Sutton VR, Gibbs RA, Antonarakis SE, Wu N, Coban-Akdemir ZH, Zhu L, Posey JE, Lupski JR. Rare variant enrichment analysis supports GREB1L as a contributory driver gene in the etiology of Mayer-Rokitansky-Küster-Hauser syndrome. HGG Adv 2023; 4:100188. [PMID: 37124138 PMCID: PMC10130500 DOI: 10.1016/j.xhgg.2023.100188] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/02/2022] [Accepted: 03/24/2023] [Indexed: 05/02/2023] Open
Abstract
Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome is characterized by aplasia of the female reproductive tract; the syndrome can include renal anomalies, absence or dysgenesis, and skeletal anomalies. While functional models have elucidated several candidate genes, only WNT4 (MIM: 603490) variants have been definitively associated with a subtype of MRKH with hyperandrogenism (MIM: 158330). DNA from 148 clinically diagnosed MRKH probands across 144 unrelated families and available family members from North America, Europe, and South America was exome sequenced (ES) and analyzed by family-based genomics for rare, likely deleterious variants. A replication cohort consisting of 442 Han Chinese individuals with MRKH was used to further reproduce GREB1L findings in diverse genetic backgrounds. Proband and OMIM phenotypes annotated using the Human Phenotype Ontology were analyzed to quantitatively delineate the phenotypic spectrum associated with GREB1L variant alleles found in our MRKH cohort and those previously published. This study reports 18 novel GREB1L variant alleles, 16 within a multiethnic MRKH cohort and two within a congenital scoliosis cohort. Cohort-wide analyses for a burden of rare variants within a single gene identified likely damaging variants in GREB1L (MIM: 617782), a known disease gene for renal hypoplasia and uterine abnormalities (MIM: 617805), in 16 of 590 MRKH probands. GREB1L variant alleles, including a CNV null allele, were found in 8 MRKH type I probands and 8 MRKH type II probands. This study used quantitative phenotypic analyses in a worldwide multiethnic cohort to identify and strengthen the association of GREB1L with isolated uterine agenesis (MRKH type I) and syndromic MRKH type II.
Affiliation(s)
- Angad Jolly
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | | | - Na Chen
- Department of Obstetrics and Gynaecology, Beijing 100730, China
| | - Sen Zhao
- Department of Orthopedic Surgery, State Key Laboratory of Complex Severe and Rare Diseases and Key Laboratory of Big Data for Spinal Deformities, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100730, China
- Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Chinese Academy of Medical Sciences, Beijing 100730, China
| | | | - Ruizhi Duan
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Jawid M. Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Moez Dawood
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Sejal Salvi
- Human Genome Sequencing Center, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Shalini N. Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine (BCM), Houston, TX, USA
| | - André Koch
- University of Tübingen, Department of Obstetrics and Gynecology, Tübingen, Germany
| | - Konstantinos Rouskas
- Institute for Bioinnovation, Biomedical Sciences Research Center Al. Fleming, Vari, Athens 16672, Greece
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
| | - Stavros Glentis
- Institute for Bioinnovation, Biomedical Sciences Research Center Al. Fleming, Vari, Athens 16672, Greece
| | - Efthymios Deligeoroglou
- Center for Adolescent Medicine and UNESCO Chair on Adolescent Health Care, First Department of Pediatrics, School of Medicine, National and Kapodistrian University of Athens, Aghia Sophia Children’s Hospital, Athens 11527, Greece
| | - Flora Bacopoulou
- Center for Adolescent Medicine and UNESCO Chair on Adolescent Health Care, First Department of Pediatrics, School of Medicine, National and Kapodistrian University of Athens, Aghia Sophia Children’s Hospital, Athens 11527, Greece
| | - Carol A. Wise
- Center for Pediatric Bone Biology and Translational Research, Scottish Rite for Children, Dallas, TX, USA
- McDermott Center for Human Growth and Development, Department of Pediatrics and Department of Orthopaedic Surgery, University of Texas Southwestern Medical Center at Dallas, Dallas, TX, USA
| | - Jennifer E. Dietrich
- Department of Obstetrics and Gynecology, Houston, TX, USA
- Department of Pediatrics, BCM, Houston, TX, USA
- Texas Children’s Hospital, Houston, TX, USA
| | - Ignatia B. Van den Veyver
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
- Department of Obstetrics and Gynecology, Houston, TX, USA
- Texas Children’s Hospital, Houston, TX, USA
| | - Antigone S. Dimas
- Institute for Bioinnovation, Biomedical Sciences Research Center Al. Fleming, Vari, Athens 16672, Greece
| | - Sara Brucker
- University of Tübingen, Department of Obstetrics and Gynecology, Tübingen, Germany
| | - V. Reid Sutton
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
- Texas Children’s Hospital, Houston, TX, USA
| | - Richard A. Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Stylianos E. Antonarakis
- University of Geneva Medical School, 1211 Geneva, Switzerland
- Institute of Genetics and Genomics in Geneva, University of Geneva, 1205 Geneva, Switzerland
- Medigenome, the Swiss Institute of Genomic Medicine, 1207 Geneva, Switzerland
| | - Nan Wu
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
- Department of Orthopedic Surgery, State Key Laboratory of Complex Severe and Rare Diseases and Key Laboratory of Big Data for Spinal Deformities, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100730, China
- Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Chinese Academy of Medical Sciences, Beijing 100730, China
| | - Zeynep H. Coban-Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Lan Zhu
- Department of Obstetrics and Gynaecology, Beijing 100730, China
| | - Jennifer E. Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - James R. Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine (BCM), Houston, TX, USA
- Department of Pediatrics, BCM, Houston, TX, USA
- Texas Children’s Hospital, Houston, TX, USA
| |
|
44
|
Liu X, Gao L, Peng Y, Fang Z, Wang J. PheSom: a term frequency-based method for measuring human phenotype similarity on the basis of MeSH vocabulary. Front Genet 2023; 14:1185790. [PMID: 37496714 PMCID: PMC10366691 DOI: 10.3389/fgene.2023.1185790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open
Abstract
Background: Phenotype similarity calculation should be used to help improve drug repurposing. In this study, based on the MeSH terms describing the phenotypes deposited in OMIM, we proposed a method, namely, PheSom (Phenotype Similarity On MeSH), to measure the similarity between phenotypes. PheSom counted the number of overlapping MeSH terms between two phenotypes and then took the weight of every MeSH term within each phenotype into account according to the term frequency-inverse document frequency (TF-IDF). Phenotype-related genes were used for the evaluation of our method. Results: A 7,739 × 7,739 similarity score matrix was finally obtained, and the number of phenotype pairs decreased dramatically as the similarity score increased. In addition, the overlapping rates of phenotype-related genes increased markedly with the similarity score between phenotypes, which supports the reliability of our method. Conclusion: We anticipate our method can be applied to identifying novel therapeutic methods for complex diseases.
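The weighting idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cosine similarity, the exact term-frequency and inverse-document-frequency formulas, the function names, and the toy phenotype annotations are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(phenotypes):
    """Build TF-IDF weight vectors from a dict: phenotype name -> list of MeSH terms."""
    n_docs = len(phenotypes)
    # Document frequency: number of phenotypes in which each term appears.
    doc_freq = Counter()
    for terms in phenotypes.values():
        doc_freq.update(set(terms))
    vectors = {}
    for name, terms in phenotypes.items():
        counts = Counter(terms)
        total = len(terms)
        vectors[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        } if total else {}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse weight vectors; only shared terms contribute."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy annotations, not taken from OMIM or the paper.
phenotypes = {
    "A": ["Seizures", "Ataxia"],
    "B": ["Seizures", "Muscle Hypotonia"],
    "C": ["Deafness"],
}
vectors = tfidf_vectors(phenotypes)
sim_ab = cosine(vectors["A"], vectors["B"])  # share one term
sim_ac = cosine(vectors["A"], vectors["C"])  # share no terms
```

In this sketch, a term that occurs in every phenotype receives zero weight, so only terms that discriminate between phenotypes raise the similarity score.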
Affiliation(s)
- Xinhua Liu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
| | - Ling Gao
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
| | - Yonglin Peng
- Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
| | - Zhonghai Fang
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
| | - Ju Wang
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
| |
|
45
|
Lesmann H, Klinkhammer H, Krawitz PM. The future role of facial image analysis in ACMG classification guidelines. MED GENET-BERLIN 2023; 35:115-121. [PMID: 38840866 PMCID: PMC10842539 DOI: 10.1515/medgen-2023-2014] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/07/2024]
Abstract
The use of next-generation sequencing (NGS) has dramatically improved the diagnosis of rare diseases. However, the analysis of genomic data has become complex with the increasing detection of variants by exome and genome sequencing. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) developed a 5-tier classification scheme in 2015 for variant interpretation that has since been widely adopted. Despite efforts to minimise discrepancies in the application of these criteria, inconsistencies still occur. Further specifications for individual genes were developed by Variant Curation Expert Panels (VCEPs) of the Clinical Genome Resource (ClinGen) consortium, which also take into consideration gene- or disease-specific features. For instance, in disorders with a highly characteristic facial gestalt, a "phenotypic match" (PP4) has higher pathogenic evidence than, e.g., in a non-syndromic form of intellectual disability. With computational approaches for quantifying the similarity of dysmorphic features, results of such analysis can now be used in a refined Bayesian framework for the ACMG/AMP criteria.
Affiliation(s)
- Hellen Lesmann
- University of Bonn, Medical Faculty & University Hospital Bonn, Institute of Human Genetics, Venusberg-Campus 1, 53127 Bonn, Germany
| | - Hannah Klinkhammer
- University of Bonn, Institute for Genomic Statistics and Bioinformatics, Bonn, Germany
| | | |
|
46
|
Muhammad N, Hussain SI, Rehman ZU, Khan SA, Jan S, Khan N, Muzammal M, Abbasi SW, Kakar N, Rehman ZU, Khan MA, Mirza MU, Muhammad N, Khan S, Wasif N. Autosomal recessive variants c.953A>C and c.97-1G>C in NSUN2 causing intellectual disability: a molecular dynamics simulation study of loss-of-function mechanisms. Front Neurol 2023; 14:1168307. [PMID: 37305761 PMCID: PMC10249782 DOI: 10.3389/fneur.2023.1168307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 04/28/2023] [Indexed: 06/13/2023] Open
Abstract
Introduction: Intellectual disability (ID) is a clinically and genetically heterogeneous disorder. It drastically affects the learning capabilities of patients and eventually reduces their IQ level below 70. Methods: The current genetic study ascertained two consanguineous Pakistani families suffering from autosomal recessive intellectual developmental disorder-5 (MRT5). We used exome sequencing followed by Sanger sequencing to identify the disease-causing variants. Results and discussion: Genetic analysis using whole exome sequencing in these families identified two novel mutations in NSUN2 (NM_017755.5). Family A segregated a novel missense variant, c.953A>C; p.Tyr318Ser, in exon 9 of NSUN2. The variant substituted the amino acid Tyr318, which is highly conserved among different animal species and located in the functional domain of NSUN2 known as "SAM-dependent methyltransferase RsmB/NOP2-type". In family B, we identified a novel splice site variant, c.97-1G>C, that affects the splice acceptor site of NSUN2. The identified splice variant (c.97-1G>C) was predicted to result in the skipping of exon 2, which would lead to a frameshift followed by a premature stop codon (p.His86Profs*16). Furthermore, it could result in the termination of translation and synthesis of dysfunctional protein, most likely leading to nonsense-mediated decay. The dynamic consequences of the NSUN2 missense variant were further explored together with the wild type through molecular dynamics simulations, which uncovered the disruption of NSUN2 function due to a gain in structural flexibility. The present molecular genetic study further extends the mutational spectrum of NSUN2 involved in ID and its genetic heterogeneity in the Pakistani population.
Affiliation(s)
- Nazif Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Syeda Iqra Hussain
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Zia Ur Rehman
- Department of General Medicine, Northwest General Hospital & Research Center, Peshawar, Khyber Pakhtunkhwa, Pakistan
| | - Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Samin Jan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Niamatullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Muhammad Muzammal
- Gomal Center of Biochemistry and Biotechnology, Gomal University, D.I.Khan, Khyber Pakhtunkhwa, Pakistan
| | - Sumra Wajid Abbasi
- NUMS Department of Biological Sciences, National University of Medical Sciences, The Mall, Rawalpindi, Punjab, Pakistan
| | - Naseebullah Kakar
- Department of Biotechnology, Faculty of Life Sciences and Informatics, BUITEMS, Quetta, Pakistan
- Institute of Human Genetics, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
| | - Zia Ur Rehman
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Muzammil Ahmad Khan
- Gomal Center of Biochemistry and Biotechnology, Gomal University, D.I.Khan, Khyber Pakhtunkhwa, Pakistan
| | - Muhammad Usman Mirza
- Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON, Canada
| | - Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
| | - Naveed Wasif
- Institute of Human Genetics, Ulm University, and Ulm University Medical Center, Ulm, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Kiel, Germany
| |
|
47
|
Yang J, Shu L, Duan H, Li H. A Robust Phenotype-driven Likelihood Ratio Analysis Approach Assisting Interpretable Clinical Diagnosis of Rare Diseases. J Biomed Inform 2023; 142:104372. [PMID: 37105510 DOI: 10.1016/j.jbi.2023.104372] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/12/2022] [Revised: 02/20/2023] [Accepted: 04/20/2023] [Indexed: 04/29/2023]
Abstract
Phenotype-based prioritization of candidate genes and diseases has become a well-established approach for multi-omics diagnostics of rare diseases. Most current algorithms exploit semantic analysis and probabilistic statistics based on Human Phenotype Ontology and are commonly superior to naive search methods. However, these algorithms are mostly less interpretable and do not perform well in real clinical scenarios due to noise and imprecision of query terms, and the fact that individuals may not display all phenotypes of the disease they belong to. We present a Phenotype-driven Likelihood Ratio analysis approach (PheLR) assisting interpretable clinical diagnosis of rare diseases. With a likelihood ratio paradigm, PheLR estimates the posterior probability of candidate diseases and how much a phenotypic feature contributes to the prioritization result. Benchmarked using simulated and realistic patients, PheLR shows significant advantages over current approaches and is robust to noise and inaccuracy. To facilitate clinical practice and visualized differential diagnosis, PheLR is implemented as an online web tool (http://phelr.nbscn.org).
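The general likelihood ratio paradigm mentioned in this abstract can be sketched in a few lines of Python. This shows only the textbook odds-updating rule, not PheLR's actual model: the prior, the per-feature likelihood ratios, the feature names, and the independence assumption behind multiplying the ratios are all hypothetical.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Update a prior disease probability with per-feature likelihood ratios.

    Assumes features are conditionally independent, so the ratios multiply:
    posterior odds = prior odds * product of likelihood ratios.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios.values())
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def feature_contributions(likelihood_ratios):
    """Log10 likelihood ratio per feature: positive supports the disease, negative argues against it."""
    return {feature: math.log10(lr) for feature, lr in likelihood_ratios.items()}


# Hypothetical values chosen only to make the arithmetic easy to follow.
prior = 0.01
likelihood_ratios = {"Seizure": 10.0, "Macrocephaly": 5.0, "Short stature": 0.5}

posterior = posterior_probability(prior, likelihood_ratios)
contributions = feature_contributions(likelihood_ratios)
```

Reporting the per-feature log ratios alongside the posterior is what makes this style of ranking interpretable: each observed phenotype carries an explicit, signed share of the final score.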
Affiliation(s)
- Jian Yang
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China; The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
| | - Liqi Shu
- Rhode Island Hospital, Warren Alpert Medical School of Brown University, Rhode Island, USA
| | - Huilong Duan
- The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
| | - Haomin Li
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China.
| |
|
48
|
Lecoquierre F, Quenez O, Fourneaux S, Coutant S, Vezain M, Rolain M, Drouot N, Boland A, Olaso R, Meyer V, Deleuze JF, Dabbagh D, Gilles I, Gayet C, Saugier-Veber P, Goldenberg A, Guerrot AM, Nicolas G. High diagnostic potential of short and long read genome sequencing with transcriptome analysis in exome-negative developmental disorders. Hum Genet 2023; 142:773-783. [PMID: 37076692 DOI: 10.1007/s00439-023-02553-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/21/2023] [Accepted: 04/05/2023] [Indexed: 04/21/2023]
Abstract
Exome sequencing (ES) has become the method of choice for diagnosing rare diseases, while the availability of short-read genome sequencing (SR-GS) in a medical setting is increasing. In addition, new sequencing technologies, such as long-read genome sequencing (LR-GS) and transcriptome sequencing, are being increasingly used. However, the contribution of these techniques compared to widely used ES is not well established, particularly in regards to the analysis of non-coding regions. In a pilot study of five probands affected by an undiagnosed neurodevelopmental disorder, we performed trio-based short-read GS and long-read GS as well as case-only peripheral blood transcriptome sequencing. We identified three new genetic diagnoses, none of which affected the coding regions. More specifically, LR-GS identified a balanced inversion in NSD1, highlighting a rare mechanism of Sotos syndrome. SR-GS identified a homozygous deep intronic variant of KLHL7 resulting in a neoexon inclusion, and a de novo mosaic intronic 22-bp deletion in KMT2D, leading to the diagnosis of Perching and Kabuki syndromes, respectively. All three variants had a significant effect on the transcriptome, which showed decreased gene expression, mono-allelic expression and splicing defects, respectively, further validating the effect of these variants. Overall, in undiagnosed patients, the combination of short and long read GS allowed the detection of cryptic variations not or barely detectable by ES, making it a highly sensitive method at the cost of more complex bioinformatics approaches. Transcriptome sequencing is a valuable complement for the functional validation of variations, particularly in the non-coding genome.
Affiliation(s)
- François Lecoquierre
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France.
| | - Olivier Quenez
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Steeve Fourneaux
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Sophie Coutant
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Myriam Vezain
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Marion Rolain
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Nathalie Drouot
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Anne Boland
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
| | - Robert Olaso
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
| | - Vincent Meyer
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
| | - Jean-François Deleuze
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
| | - Dana Dabbagh
- Department of Pediatrics, Elbeuf Hospital, Elbeuf, France
| | | | - Claire Gayet
- Department of Pediatrics, CHU Rouen, F-76000, Rouen, France
| | - Pascale Saugier-Veber
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Alice Goldenberg
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Anne-Marie Guerrot
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France
| | - Gaël Nicolas
- Univ Rouen Normandie, Inserm U12045 and CHU Rouen, Department of Genetics and Reference Center for Developmental Disorders, FHU-G4 Génomique, F-76000, Rouen, France.
| |
|
49
|
James KN, Phadke S, Wong TC, Chowdhury S. Artificial Intelligence in the Genetic Diagnosis of Rare Disease. Clin Lab Med 2023; 43:127-143. [PMID: 36764805 DOI: 10.1016/j.cll.2022.09.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/11/2023]
Affiliation(s)
- Kiely N James
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Sujal Phadke
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Terence C Wong
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Shimul Chowdhury
- Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA.
| |
|
50
|
Slater K, Williams JA, Schofield PN, Russell S, Pendleton SC, Karwath A, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Klarigi: Characteristic explanations for semantic biomedical data. Comput Biol Med 2023; 153:106425. [PMID: 36638616 DOI: 10.1016/j.compbiomed.2022.106425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/04/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. 
We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.
Affiliation(s)
- Karin Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
| | - John A Williams
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Paul N Schofield
- Department of Physiology, Development, and Neuroscience, University of Cambridge, UK
| | - Sophie Russell
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Samantha C Pendleton
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Andreas Karwath
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Hilary Fanning
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Simon Ball
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| |
|