1. Benegas G, Albors C, Aw AJ, Ye C, Song YS. A DNA language model based on multispecies alignment predicts the effects of genome-wide variants. Nat Biotechnol 2025:10.1038/s41587-024-02511-w. [PMID: 39747647 DOI: 10.1038/s41587-024-02511-w] [Received: 10/31/2023] [Accepted: 11/20/2024] [Indexed: 01/04/2025]
Abstract
Protein language models have demonstrated remarkable performance in predicting the effects of missense variants but DNA language models have not yet shown a competitive edge for complex genomes such as that of humans. This limitation is particularly evident when dealing with the vast complexity of noncoding regions that comprise approximately 98% of the human genome. To tackle this challenge, we introduce GPN-MSA (genomic pretrained network with multiple-sequence alignment), a framework that leverages whole-genome alignments across multiple species while taking only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC and OMIM), experimental functional assays (deep mutational scanning and DepMap) and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and noncoding variants. We provide precomputed scores for all ~9 billion possible single-nucleotide variants in the human genome. We anticipate that our advances in genome-wide variant effect prediction will enable more accurate rare disease diagnosis and improve rare variant burden testing.
Affiliation(s)
- Gonzalo Benegas
  - Graduate Group in Computational Biology, University of California, Berkeley, CA, US
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, US
- Carlos Albors
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, US
- Alan J Aw
  - Department of Statistics, University of California, Berkeley, CA, US
- Chengzhong Ye
  - Department of Statistics, University of California, Berkeley, CA, US
- Yun S Song
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, US
  - Department of Statistics, University of California, Berkeley, CA, US
  - Center for Computational Biology, University of California, Berkeley, CA, US
2. Katsonis P, Lichtarge O. Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects. Nat Commun 2025; 16:159. [PMID: 39746940 PMCID: PMC11696468 DOI: 10.1038/s41467-024-55066-4] [Received: 12/14/2023] [Accepted: 11/27/2024] [Indexed: 01/04/2025] [Open Access]
Abstract
Computational methods for estimating missense variant impact suffer from inconsistent performance across genes, which poses a major challenge for their reliable use in clinical practice. While ensemble scores leverage multiple prediction methods to enhance consistency, the overrepresentation of certain genes in the training data can bias their outcomes. To address this critical limitation, we propose a gene-specific ensemble framework trained on reference computational annotations rather than on clinical or experimental data. Accordingly, we generate Meta-EA ensemble scores that achieve comparable performance to the top individual predicting method for each gene set. Incorporating the effects of splicing and the allele frequency of human polymorphisms further enhances the performance of Meta-EA, achieving an area under the receiver operating characteristic curve of 0.97 for both gene-balanced and imbalanced clinical assessments. In conclusion, this work leverages the wealth of existing variant impact prediction approaches to generate improved estimations for clinical interpretation.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
3. Wang X, Zhang M, Yang X, Yu DJ, Ge F. GPTrans: A Biological Language Model-Based Approach for Predicting Disease-Associated Mutations in G Protein-Coupled Receptors. J Chem Inf Model 2024; 64:9626-9642. [PMID: 39610143 DOI: 10.1021/acs.jcim.4c01999] [Indexed: 11/30/2024]
Abstract
Accurately predicting mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. In response to this imperative, GPTrans has emerged as a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans resides in the design of a novel feature extraction network that is capable of integrating features from both wildtype and mutant protein variant sites, utilizing multifeature connections within a transformer framework to ensure comprehensive feature extraction. A key aspect of GPTrans's effectiveness is our introduction of an innovative deep feature integration strategy, which merges embeddings and class tokens from multiple protein language models, including evolutionary scale modeling and ProtTrans, thus shedding light on the biochemical properties of proteins. Leveraging transformer components and a self-attention mechanism, GPTrans captures higher-level representations of protein features. Employing both wildtype and mutation site information for feature fusion not only enriches the predictive feature set but also avoids the common issue of overestimation associated with sequence-based predictions. This approach distinguishes GPTrans, enabling it to significantly outperform existing methods. Our evaluations across diverse GPCR data sets, including ClinVar and MutHTP, demonstrate GPTrans's superior performance, with average AUC values of 0.874 and 0.590 in 10-fold cross-validation. Notably, compared to the AlphaMissense method, GPTrans exhibited a remarkable 38.03% improvement in accuracy when predicting disease-associated mutations in the MutHTP data set. A thorough analysis of the predicted results further validates the model's effectiveness. The source code, data sets, and prediction results for GPTrans are available for academic use at https://github.com/EduardWang/GPTrans.
Affiliation(s)
- Xiaohua Wang
  - School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Ming Zhang
  - School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Xibei Yang
  - School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Fang Ge
  - State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
4. Petrazzini BO, Balick DJ, Forrest IS, Cho J, Rocheleau G, Jordan DM, Do R. Ensemble and consensus approaches to prediction of recessive inheritance for missense variants in human disease. Cell Reports Methods 2024; 4:100914. [PMID: 39657681 DOI: 10.1016/j.crmeth.2024.100914] [Received: 06/16/2023] [Revised: 09/19/2024] [Accepted: 11/13/2024] [Indexed: 12/12/2024]
Abstract
Mode of inheritance (MOI) is necessary for clinical interpretation of pathogenic variants; however, the majority of variants lack this information. Furthermore, variant effect predictors are fundamentally insensitive to recessive-acting diseases. Here, we present MOI-Pred, a variant pathogenicity prediction tool that accounts for MOI, and ConMOI, a consensus method that integrates variant MOI predictions from three independent tools. MOI-Pred integrates evolutionary and functional annotations to produce variant-level predictions that are sensitive to both dominant-acting and recessive-acting pathogenic variants. Both MOI-Pred and ConMOI show state-of-the-art performance on standard benchmarks. Importantly, dominant and recessive predictions from both tools are enriched in individuals with pathogenic variants for dominant- and recessive-acting diseases, respectively, in a real-world electronic health record (EHR)-based validation approach of 29,981 individuals. ConMOI outperforms its component methods in benchmarking and validation, demonstrating the value of consensus among multiple prediction methods. Predictions for all possible missense variants are provided in the "Data and code availability" section.
Affiliation(s)
- Ben O Petrazzini
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel J Balick
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Iain S Forrest
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Judy Cho
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ghislain Rocheleau
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel M Jordan
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
5. Larsen-Ledet S, Lindemose S, Panfilova A, Gersing S, Suhr CH, Genzor AV, Lanters H, Nielsen SV, Lindorff-Larsen K, Winther JR, Stein A, Hartmann-Petersen R. Systematic characterization of indel variants using a yeast-based protein folding sensor. Structure 2024:S0969-2126(24)00530-6. [PMID: 39706198 DOI: 10.1016/j.str.2024.11.017] [Received: 07/24/2024] [Revised: 10/30/2024] [Accepted: 11/26/2024] [Indexed: 12/23/2024]
Abstract
Gene variants resulting in insertions or deletions of amino acid residues (indels) have important consequences for evolution and are often linked to disease, yet, compared to missense variants, the effects of indels are poorly understood and predicted. We developed a sensitive protein folding sensor based on the complementation of uracil auxotrophy in yeast by circular permutated orotate phosphoribosyltransferase (CPOP). The sensor reports on the folding of disease-linked missense variants and de-novo-designed proteins. Applying the folding sensor to a saturated library of single-residue indels in human dihydrofolate reductase (DHFR) revealed that most regions that tolerate indels are confined to internal loops, the termini, and a central α helix. Several indels are temperature sensitive, and folding is rescued upon binding to methotrexate. Rosetta and AlphaFold2 predictions correlate with the observed effects, suggesting that most indels destabilize the native fold and that these computational tools are useful for the classification of indels observed in population sequencing.
Affiliation(s)
- Sven Larsen-Ledet
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Søren Lindemose
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Aleksandra Panfilova
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Sarah Gersing
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Caroline H Suhr
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Aitana Victoria Genzor
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Heleen Lanters
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Sofie V Nielsen
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Jakob R Winther
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Amelie Stein
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
6. Keen J, McDermott JH, Aguilar-Martinez E, Newman WG. Pharmacogenomics: DPYD and Prevention of Toxicity. Clin Oncol (R Coll Radiol) 2024; 38:103706. [PMID: 39721301 DOI: 10.1016/j.clon.2024.103706] [Received: 10/08/2023] [Revised: 10/10/2024] [Accepted: 12/04/2024] [Indexed: 12/28/2024]
Abstract
In 2020, the introduction of pre-emptive DPYD genotyping prior to the administration of systemic fluoropyrimidine-based chemotherapy represented one of the first widespread pharmacogenetic testing programmes to be applied nationally in the United Kingdom. Pharmacogenetic variants in the DPYD gene found in between 3 and 6% of the population are a recognised cause of primary DPD enzyme deficiency and associated increased risk of severe fluoropyrimidine toxicity [1]. Yet, the availability of testing globally is heterogeneous. Despite growing evidence that in addition to reducing drug-induced toxicity, DPYD-guided dosing does not negatively affect outcomes, further research on the impact of routine DPYD genotyping in the UK population is required. With mandatory testing in the UK focussed on four well-characterised variants, there is a need to address the applicability of this strategy across diverse ethnic or ancestral populations. We highlight approaches to identify and characterise rare variants in DPYD and in other genes involved in the pyrimidine metabolic pathway to reduce healthcare inequalities. Finally, we discuss the future of pharmacogenomics within cancer care, and the potential to harness innovative digital and genotyping technologies to streamline prescribing and optimise both systemic anti-cancer therapies and supportive care.
Affiliation(s)
- J Keen
  - NHS North West Genomic Medicine Service Alliance, UK
- J H McDermott
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
  - The Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
- E Aguilar-Martinez
  - The Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
- W G Newman
  - NHS North West Genomic Medicine Service Alliance, UK
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
  - The Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
7. Zhozhikov L, Vasilev F, Maksimova N. Protein-Variant-Phenotype Study of NBAS Using AlphaFold in the Aspect of SOPH Syndrome. Proteins 2024. [PMID: 39641476 DOI: 10.1002/prot.26764] [Received: 07/13/2024] [Revised: 10/04/2024] [Accepted: 11/01/2024] [Indexed: 12/07/2024]
Abstract
NBAS gene variants cause phenotypically distinct and nonoverlapping conditions, SOPH syndrome and ILFS2. NBAS is a so-called "moonlighting" protein responsible for retrograde membrane trafficking and nonsense-mediated decay. However, its three-dimensional model and the nature of its possible interactions with other proteins have remained elusive. Here, we used AlphaFold to predict protein-protein interaction (PPI) sites and mapped them to NBAS pathogenic variants. We repeated in silico milestone studies of the NBAS protein to explain the multisystem phenotype of its variants, with particular emphasis on the SOPH variant (p.R1914H). We revealed the putative binding sites for the main interaction partners of NBAS and assessed the implications of these binding sites for the subdomain architecture of the NBAS protein. Using AlphaFold, we disclosed the far-reaching impact of NBAS variants on the development of each phenotypic trait in patients with NBAS-related pathologies.
Affiliation(s)
- Leonid Zhozhikov
  - Research Laboratory of "Molecular Medicine and Human Genetics", Institute of Medicine, Ammosov North-Eastern Federal University, Yakutsk, Republic of Sakha (Yakutia), Russia
- Filipp Vasilev
  - Research Laboratory of "Molecular Medicine and Human Genetics", Institute of Medicine, Ammosov North-Eastern Federal University, Yakutsk, Republic of Sakha (Yakutia), Russia
- Nadezhda Maksimova
  - Research Laboratory of "Molecular Medicine and Human Genetics", Institute of Medicine, Ammosov North-Eastern Federal University, Yakutsk, Republic of Sakha (Yakutia), Russia
8. Estevam GO, Linossi EM, Rao J, Macdonald CB, Ravikumar A, Chrispens KM, Capra JA, Coyote-Maestas W, Pimentel H, Collisson EA, Jura N, Fraser JS. Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning. bioRxiv 2024:2024.07.16.603579. [PMID: 39071407 PMCID: PMC11275805 DOI: 10.1101/2024.07.16.603579] [Indexed: 07/30/2024]
Abstract
Mutations in the kinase and juxtamembrane domains of the MET Receptor Tyrosine Kinase are responsible for oncogenesis in various cancers and can drive resistance to MET-directed treatments. Determining the most effective inhibitor for each mutational profile is a major challenge for MET-driven cancer treatment in precision medicine. Here, we used a deep mutational scan (DMS) of ~5,764 MET kinase domain variants to profile the growth of each mutation against a panel of 11 inhibitors that are reported to target the MET kinase domain. We validate previously identified resistance mutations, pinpoint common resistance sites across type I, type II, and type I ½ inhibitors, unveil unique resistance and sensitizing mutations for each inhibitor, and verify non-cross-resistant sensitivities for type I and type II inhibitor pairs. We augment a protein language model with biophysical and chemical features to improve the predictive performance for inhibitor-treated datasets. Together, our study demonstrates a pooled experimental pipeline for identifying resistance mutations, provides a reference dictionary for mutations that are sensitized to specific therapies, and offers insights for future drug development.
Affiliation(s)
- Gabriella O. Estevam
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
  - Tetrad Graduate Program, UCSF, San Francisco, CA, United States
- Edmond M. Linossi
  - Cardiovascular Research Institute, UCSF, San Francisco, CA, United States
  - Department of Cellular and Molecular Pharmacology, UCSF, San Francisco, CA, United States
- Jingyou Rao
  - Department of Computer Science, UCLA, Los Angeles, CA, United States
- Christian B. Macdonald
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Ashraya Ravikumar
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
- Karson M. Chrispens
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
  - Biophysics Graduate Program, UCSF, San Francisco, CA, United States
- John A. Capra
  - Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, UCSF, San Francisco, CA, United States
- Willow Coyote-Maestas
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
  - Quantitative Biosciences Institute, UCSF, San Francisco, CA, United States
- Harold Pimentel
  - Department of Computer Science, UCLA, Los Angeles, CA, United States
  - Department of Computational Medicine and Human Genetics, UCLA, Los Angeles, CA, United States
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, United States
- Eric A. Collisson
  - Human Biology, Fred Hutchinson Cancer Center, Seattle, Washington, United States
  - Department of Medicine, University of Washington, Seattle, Washington, United States
- Natalia Jura
  - Cardiovascular Research Institute, UCSF, San Francisco, CA, United States
  - Department of Cellular and Molecular Pharmacology, UCSF, San Francisco, CA, United States
  - Quantitative Biosciences Institute, UCSF, San Francisco, CA, United States
- James S. Fraser
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States
  - Quantitative Biosciences Institute, UCSF, San Francisco, CA, United States
9. Gromiha MM, Pandey M, Kulandaisamy A, Sharma D, Ridha F. Progress on the development of prediction tools for detecting disease causing mutations in proteins. Comput Biol Med 2024; 185:109510. [PMID: 39637461 DOI: 10.1016/j.compbiomed.2024.109510] [Received: 07/13/2024] [Revised: 11/27/2024] [Accepted: 11/29/2024] [Indexed: 12/07/2024]
Abstract
Proteins are involved in a variety of functions in living organisms. The mutation of amino acid residues in a protein alters its structure, stability, binding, and function, with some mutations leading to diseases. Understanding the influence of mutations on protein structure and function helps to gain deep insights into the molecular mechanisms of diseases and to devise therapeutic strategies. Hence, several generic and disease-specific methods have been proposed to reveal the pathogenic effects of mutations. In this review, we focus on the development of prediction methods for identifying disease-causing mutations in proteins. We briefly outline the existing databases for disease-causing mutations, followed by a discussion on sequence- and structure-based features used for prediction. Further, we discuss computational tools based on machine learning, deep learning and large language models for detecting disease-causing mutations. Specifically, we emphasize the advances in predicting hotspots and mutations for targets involved in cancer, neurodegenerative and infectious diseases as well as in membrane proteins. The computational resources, including databases and algorithms for understanding/predicting the effect of mutations, will be listed. Moreover, limitations of existing methods and possible improvements will be discussed.
Affiliation(s)
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Medha Pandey
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- A Kulandaisamy
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Divya Sharma
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Fathima Ridha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
10. Murali H, Wang P, Liao EC, Wang K. Genetic variant classification by predicted protein structure: A case study on IRF6. Comput Struct Biotechnol J 2024; 23:892-904. [PMID: 38370976 PMCID: PMC10869248 DOI: 10.1016/j.csbj.2024.01.019] [Received: 12/02/2023] [Revised: 01/24/2024] [Accepted: 01/25/2024] [Indexed: 02/20/2024] [Open Access]
Abstract
Next-generation genome sequencing has revolutionized genetic testing, identifying numerous rare disease-associated gene variants. However, to impute pathogenicity, computational approaches remain inadequate and functional testing of gene variants is required to provide the highest level of evidence. The emergence of AlphaFold2 has transformed the field of protein structure determination, and here we outline a strategy that leverages predicted protein structure to enhance genetic variant classification. We used the gene IRF6 as a case study due to its clinical relevance, its critical role in cleft lip/palate malformation, and the availability of experimental data on the pathogenicity of IRF6 gene variants through phenotype rescue experiments in irf6-/- zebrafish. We compared results from over 30 pathogenicity prediction tools on 37 IRF6 missense variants. IRF6 lacks an experimentally derived structure, so we used predicted structures to explore associations between mutational clustering and pathogenicity. We found that among these variants, 19 of 37 were unanimously predicted as deleterious by computational tools. Comparing in silico predictions with experimental findings, 12 variants predicted as pathogenic were experimentally determined as benign. Even with the recently published AlphaMissense model, 15/18 (83%) of the predicted pathogenic variants were experimentally determined as benign. In comparison, mapping variants to the protein revealed deleterious mutation clusters around the protein binding domain, whereas N-terminal variants tend to be benign, suggesting the importance of structural information in determining pathogenicity of mutations in this gene. In conclusion, incorporating gene-specific structural features of known pathogenic/benign mutations may provide meaningful insights into pathogenicity predictions in a gene-specific manner and facilitate the interpretation of variant pathogenicity.
Affiliation(s)
- Hemma Murali
  - Graduate Program in Biochemistry and Molecular Biophysics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Peng Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
  - Master of Biotechnology Program, University of Pennsylvania, Philadelphia, PA 19104, United States
- Eric C. Liao
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Craniofacial Innovation, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
- Kai Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, United States
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
11. Xiong D, U K, Sun J, Cribbs AP. PLMC: Language Model of Protein Sequences Enhances Protein Crystallization Prediction. Interdiscip Sci 2024; 16:802-813. [PMID: 39155325 DOI: 10.1007/s12539-024-00639-6] [Received: 01/02/2024] [Revised: 05/13/2024] [Accepted: 05/21/2024] [Indexed: 08/20/2024]
Abstract
X-ray diffraction crystallography is the most widely used method for protein three-dimensional (3D) structure determination, for which protein crystallizability is a central prerequisite. Yet, protein crystallization involves a number of procedures, including protein material production, purification, and crystal production, each of which affects the crystallization outcome. Due to the expensive and laborious nature of this multi-stage process, various computational tools have been developed to predict protein crystallization propensity, which is then used to guide the experimental determination. In this study, we present a novel deep learning framework, PLMC, to improve multi-stage protein crystallization propensity prediction by leveraging a pre-trained protein language model. To effectively train PLMC, two groups of features of each protein were integrated into a more comprehensive representation, including protein language embeddings from the large-scale protein sequence database and a handcrafted feature set consisting of physicochemical, sequence-based and disorder-related information. These features were further separately embedded for refinement, and then concatenated for the final prediction. Notably, our extensive benchmarking tests demonstrate that PLMC greatly outperforms other state-of-the-art methods by achieving AUC scores of 0.773, 0.893, and 0.913, respectively, at the aforementioned individual stages, and 0.982 at the final crystallization stage. Furthermore, PLMC is shown to be superior for predicting the crystallization of both globular and membrane proteins, as demonstrated by an AUC score of 0.991 for the latter. These results suggest the significant potential of PLMC in assisting researchers with the experimental design of crystallizable protein variants.
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA.
| | - Kaicheng U
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA
| | - Jianfeng Sun
- Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, UK.
| | - Adam P Cribbs
- Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, UK
|
12
|
Iturralde AB, Weller CA, Giovanetti SM, Sadhu MJ. Comprehensive deletion scan of anti-CRISPR AcrIIA4 reveals essential and dispensable domains for Cas9 inhibition. Proc Natl Acad Sci U S A 2024; 121:e2413743121. [PMID: 39570312 PMCID: PMC11621469 DOI: 10.1073/pnas.2413743121] [Received: 07/10/2024] [Accepted: 10/17/2024] [Indexed: 11/22/2024] Open
Abstract
Delineating a protein's essential and dispensable domains provides critical insight into how it carries out its function. Here, we developed a high-throughput method to synthesize and test the functionality of all possible in-frame and continuous deletions in a gene of interest, enabling rapid and unbiased determination of protein domain importance. Our approach generates precise deletions using a CRISPR library framework that is free from constraints of gRNA target site availability and efficacy. We applied our method to AcrIIA4, a phage-encoded anti-CRISPR protein that robustly inhibits SpCas9. Extensive structural characterization has shown that AcrIIA4 physically occupies the DNA-binding interfaces of several SpCas9 domains; nonetheless, the importance of each AcrIIA4 interaction for SpCas9 inhibition is unknown. We used our approach to determine the essential and dispensable regions of AcrIIA4. Surprisingly, not all contacts with SpCas9 were required, and in particular, we found that the AcrIIA4 loop that inserts into SpCas9's RuvC catalytic domain can be deleted. Our results show that AcrIIA4 inhibits SpCas9 primarily by blocking PAM binding and that its interaction with the SpCas9 catalytic domain is inessential.
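The idea of testing "all possible in-frame and continuous deletions" can be illustrated with a minimal sketch. It is not the authors' library design: it assumes deletions are codon-aligned and that the start and stop codons are always kept, and the names `Deletion`, `inframe_deletions` and `toy_cds` are hypothetical.

```python
from typing import Iterator, NamedTuple


class Deletion(NamedTuple):
    first_codon: int  # 0-based index of the first deleted codon
    n_codons: int     # number of consecutive codons removed
    sequence: str     # coding sequence after the deletion


def inframe_deletions(cds: str) -> Iterator[Deletion]:
    """Yield every contiguous, codon-aligned deletion of a coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    # Assumption: codon 0 (start) and the last codon (stop) are never deleted.
    for first in range(1, n_codons - 1):
        for last in range(first, n_codons - 1):
            seq = cds[: 3 * first] + cds[3 * (last + 1):]
            yield Deletion(first, last - first + 1, seq)


# Toy 5-codon sequence (hypothetical, not AcrIIA4).
toy_cds = "ATGAAACCCGGGTAA"
deletions = list(inframe_deletions(toy_cds))
```

With m deletable codons the sketch yields m(m+1)/2 variants, so the library size grows quadratically with gene length.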
Affiliation(s)
- Annette B. Iturralde
- Systems Biology and Genome Engineering Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, NIH, Bethesda, MD
| | - Cory A. Weller
- Systems Biology and Genome Engineering Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, NIH, Bethesda, MD
| | - Simone M. Giovanetti
- Systems Biology and Genome Engineering Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, NIH, Bethesda, MD
| | - Meru J. Sadhu
- Systems Biology and Genome Engineering Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, NIH, Bethesda, MD
|
13
|
Tzavella K, Diaz A, Olsen C, Vranken W. Combining evolution and protein language models for an interpretable cancer driver mutation prediction with D2Deep. Brief Bioinform 2024; 26:bbae664. [PMID: 39708841 DOI: 10.1093/bib/bbae664] [Received: 06/21/2024] [Revised: 09/15/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
The mutations driving cancer are being increasingly exposed through tumor-specific genomic data. However, differentiating between cancer-causing driver mutations and random passenger mutations remains challenging. State-of-the-art homology-based predictors contain built-in biases and are often ill-suited to the intricacies of cancer biology. Protein language models have successfully addressed various biological problems but have not yet been tested on the challenging task of cancer driver mutation prediction at a large scale. Additionally, they often fail to offer result interpretation, hindering their effective use in clinical settings. The AI-based D2Deep method we introduce here addresses these challenges by combining two powerful elements: (i) a nonspecialized protein language model that captures the makeup of all protein sequences and (ii) protein-specific evolutionary information that encompasses functional requirements for a particular protein. D2Deep relies exclusively on sequence information, outperforms state-of-the-art predictors, and captures intricate epistatic changes throughout the protein caused by mutations. These epistatic changes correlate with known mutations in the clinical setting and can be used for the interpretation of results. The model is trained on a balanced, somatic training set and so effectively mitigates biases related to hotspot mutations compared to state-of-the-art techniques. The versatility of D2Deep is illustrated by its performance on non-cancer mutation prediction, where most variants still lack known consequences. D2Deep predictions and confidence scores are available via https://tumorscope.be/d2deep to help with clinical interpretation and mutation prioritization.
Affiliation(s)
- Konstantina Tzavella
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
| | - Adrian Diaz
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
| | - Catharina Olsen
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), Vrije Universiteit Brussel (VUB), Université Libre de Bruxelles (ULB), Laarbeeklaan 101, Brussels 1090, Belgium
- Clinical Sciences, Research Group Genetics, Reproduction and Development (GRAD), Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Laarbeeklaan 101, Brussels 1090, Belgium
| | - Wim Vranken
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Pleinlaan 2, Brussels 1050, Belgium
- Chemistry Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
- AI Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
- Biomedical sciences, Vrije Universiteit Brussel, Laarbeeklaan 101, Brussels 1090, Belgium
|
14
|
Lessard S, Chao M, Reis K, Beauvais M, Rajpal DK, Sloane J, Palta P, Klinger K, de Rinaldis E, Shameer K, Chatelain C. Leveraging large-scale multi-omics evidences to identify therapeutic targets from genome-wide association studies. BMC Genomics 2024; 25:1111. [PMID: 39563277 DOI: 10.1186/s12864-024-10971-2] [Received: 11/15/2023] [Accepted: 10/28/2024] [Indexed: 11/21/2024] Open
Abstract
BACKGROUND Therapeutic targets supported by genetic evidence from genome-wide association studies (GWAS) show a higher probability of success in clinical trials. GWAS is a powerful approach to identify links between genetic variants and phenotypic variation; however, identifying the genes driving associations identified in GWAS remains challenging. Integration of molecular quantitative trait loci (molQTL) such as expression QTL (eQTL) using Mendelian randomization (MR) and colocalization analyses can help with the identification of causal genes. Careful interpretation remains warranted because eQTL can affect the expression of multiple genes within the same locus. METHODS We used a combination of genomic features that include variant annotation, activity-by-contact maps, MR, and colocalization with molQTL to prioritize causal genes across 4,611 disease GWAS and meta-analyses from biobank studies, namely FinnGen, Estonian Biobank and UK Biobank. RESULTS Genes identified using this approach are enriched for gold standard causal genes and capture known biological links between disease genetics and biology. In addition, we find that eQTL colocalizing with GWAS are statistically enriched for corresponding disease-relevant tissues. We show that predicted directionality from MR is generally consistent with matched drug mechanisms of action (>85% for approved drugs). Compared to the nearest gene mapping method, genes supported by multi-omics evidence displayed higher enrichment in approved therapeutic targets (risk ratio 1.75 vs. 2.58 for genes with the highest level of support). Finally, using this approach, we detected an association between the IL6 receptor signal transduction gene IL6ST and polymyalgia rheumatica, an indication for which sarilumab, a monoclonal antibody against IL-6, has been recently approved.
CONCLUSIONS Combining variant annotation, activity-by-contact maps, and molQTL increases performance to identify causal genes, while informing on directionality which can be translated to successful target identification and drug development.
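The enrichment figures quoted in the abstract are risk ratios. As a rough illustration of how such a ratio is computed from a 2x2 table, the sketch below uses made-up counts; the function name `risk_ratio`, its arguments and the numbers are assumptions and do not come from the study.

```python
def risk_ratio(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Risk ratio of group A versus group B.

    hits_*  : genes in the group that are approved therapeutic targets
    total_* : all genes in the group
    """
    if total_a <= 0 or total_b <= 0 or hits_b <= 0:
        raise ValueError("totals and the reference hit count must be positive")
    # (hits_a / total_a) / (hits_b / total_b), written with a single division.
    return (hits_a * total_b) / (hits_b * total_a)


# Hypothetical counts: 20 of 100 prioritized genes are approved targets,
# compared with 10 of 100 genes in the reference set.
example_rr = risk_ratio(20, 100, 10, 100)
```

A value above 1 means the first group contains proportionally more approved targets than the reference group.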
Affiliation(s)
- Samuel Lessard
- Precision Medicine & Computational Biology, Sanofi, Cambridge, MA, USA
| | - Michael Chao
- Precision Medicine & Computational Biology, Sanofi, Cambridge, MA, USA
| | - Kadri Reis
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Mathieu Beauvais
- Digital R&D Data & Computational Sciences, Sanofi, Gentilly, France
| | - Deepak K Rajpal
- Translational Sciences, Sanofi, Framingham, MA, USA
- Pre-Clinical and Translational Sciences, Takeda, MA, USA
| | - Jennifer Sloane
- Immunology & Inflammation Development, Sanofi, Cambridge, MA, USA
| | - Priit Palta
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | | | | | - Khader Shameer
- Precision Medicine & Computational Biology, Sanofi, Cambridge, MA, USA
| | - Clément Chatelain
- Precision Medicine & Computational Biology, Sanofi, Cambridge, MA, USA.
|
15
|
Gracia B, Montes P, Huang M, Chen J, Karras GI. HSP90 buffers deleterious genetic variations in BRCA1. bioRxiv 2024:2024.11.15.623783. [PMID: 39605638 PMCID: PMC11601394 DOI: 10.1101/2024.11.15.623783] [Indexed: 11/29/2024]
Abstract
Protein-folding chaperone HSP90 buffers genetic variation in diverse organisms, but the clinical significance of HSP90 buffering in disease remains unclear. Here, we show that HSP90 buffers mutations in the BRCT domain of BRCA1. HSP90-buffered BRCA1 mutations encode protein variants that retain interactions with partner proteins and rely on HSP90 for protein stability and function in cell survival. Moreover, HSP90-buffered BRCA1 variants confer PARP inhibitor resistance in cancer cell lines. Low-level HSP90 inhibition alleviates this resistance, revealing a cryptic and mutant-specific HSP90-contingent synthetic lethality. Hence, by stabilizing metastable variants across the entirety of the BRCT domain, HSP90 reduces the clinical severity of BRCA1 mutations, allowing them to accumulate in populations. We estimate that HSP90 buffers 11% to 28% of known human BRCA1-BRCT missense mutations. Our work extends the clinical significance of HSP90 buffering to a prevalent class of variations in BRCA1, pioneering its importance in cancer predisposition and therapy resistance.
|
16
|
Ma K, Huang S, Ng KK, Lake NJ, Joseph S, Xu J, Lek A, Ge L, Woodman KG, Koczwara KE, Cohen J, Ho V, O'Connor CL, Brindley MA, Campbell KP, Lek M. Saturation mutagenesis-reinforced functional assays for disease-related genes. Cell 2024; 187:6707-6724.e22. [PMID: 39326416 PMCID: PMC11568926 DOI: 10.1016/j.cell.2024.08.047] [Received: 11/20/2023] [Revised: 07/29/2024] [Accepted: 08/23/2024] [Indexed: 09/28/2024]
Abstract
Interpretation of disease-causing genetic variants remains a challenge in human genetics. Current costs and complexity of deep mutational scanning methods are obstacles for achieving genome-wide resolution of variants in disease-related genes. Our framework, saturation mutagenesis-reinforced functional assays (SMuRF), offers simple and cost-effective saturation mutagenesis paired with streamlined functional assays to enhance the interpretation of unresolved variants. Applying SMuRF to neuromuscular disease genes FKRP and LARGE1, we generated functional scores for all possible coding single-nucleotide variants, which aid in resolving clinically reported variants of uncertain significance. SMuRF also demonstrates utility in predicting disease severity, resolving critical structural regions, and providing training datasets for the development of computational predictors. Overall, our approach enables variant-to-function insights for disease genes in a cost-effective manner that can be broadly implemented by standard research laboratories.
Affiliation(s)
- Kaiyue Ma
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
| | - Shushu Huang
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Kenneth K Ng
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Nicole J Lake
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Soumya Joseph
- Howard Hughes Medical Institute, Senator Paul D. Wellstone Muscular Dystrophy Specialized Research Center, Department of Molecular Physiology and Biophysics and Department of Neurology, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
| | - Jenny Xu
- Yale University, New Haven, CT, USA
| | - Angela Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Muscular Dystrophy Association, Chicago, IL, USA
| | - Lin Ge
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA; Department of Neurology, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Keryn G Woodman
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | - Justin Cohen
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Vincent Ho
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | - Melinda A Brindley
- Department of Infectious Diseases, Department of Population Health, University of Georgia, Athens, GA, USA
| | - Kevin P Campbell
- Howard Hughes Medical Institute, Senator Paul D. Wellstone Muscular Dystrophy Specialized Research Center, Department of Molecular Physiology and Biophysics and Department of Neurology, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
| | - Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA.
|
17
|
Xu J, Wang M, Ren Y, Luo W, Zhang L, Liu S, Hu P. A newly identified photosystem II Subunit P gene TaPsbP4A-1 in Triticeae species negatively regulates wheat powdery mildew resistance. Front Plant Sci 2024; 15:1452281. [PMID: 39582632 PMCID: PMC11581894 DOI: 10.3389/fpls.2024.1452281] [Received: 06/20/2024] [Accepted: 10/14/2024] [Indexed: 11/26/2024]
Abstract
The photosystem II (PSII) Subunit P (PsbP) protein is a component of its oxygen-evolving complex, which can oxidize water to produce oxygen using light energy and is critical to the core components and stability of PSII. Using the whole-genome information, the PsbP genes of 10 plant species were comprehensively identified. The expression patterns of wheat PsbPs under Blumeria graminis f. sp. tritici (Bgt) infection were assessed using qRT-PCR, and the functions of TaPsbPs in wheat powdery mildew resistance were studied using barley stripe mosaic virus-induced gene silencing. In total, 122 PsbP genes were divided into 8 classes with similar gene structures. No tandem repeat events were identified in wheat PsbP, suggesting that the PsbP genes in common wheat were donated by its diploid progenitor species. The expression levels of TaPsbP2A-1, TaPsbP3A-1, TaPsbP4A-1, TaPsbP4A-2, and TaPsbP7A-2 were induced by Bgt. The silencing of TaPsbP4A-1 increased the resistance of common wheat 'Bainong AK58' to Bgt. This study provides valuable information for functional and evolutionary research on the PsbP gene family.
Affiliation(s)
- Jun Xu
- College of Horticulture and Landscape Architecture, Henan Institute of Science and Technology, Xinxiang, China
| | - Mengfei Wang
- College of Agriculture, Henan Engineering Research Center of Crop Genome Editing/Henan International Joint Laboratory of Plant Genetic Improvement and Soil Remediation, Henan Institute of Science and Technology, Xinxiang, China
| | - Yueming Ren
- College of Agriculture, Henan Engineering Research Center of Crop Genome Editing/Henan International Joint Laboratory of Plant Genetic Improvement and Soil Remediation, Henan Institute of Science and Technology, Xinxiang, China
| | - Wanglong Luo
- College of Agriculture, Henan Engineering Research Center of Crop Genome Editing/Henan International Joint Laboratory of Plant Genetic Improvement and Soil Remediation, Henan Institute of Science and Technology, Xinxiang, China
| | - Lu Zhang
- College of Agriculture, Henan Engineering Research Center of Crop Genome Editing/Henan International Joint Laboratory of Plant Genetic Improvement and Soil Remediation, Henan Institute of Science and Technology, Xinxiang, China
| | - Shuangwei Liu
- College of Horticulture and Landscape Architecture, Henan Institute of Science and Technology, Xinxiang, China
| | - Ping Hu
- College of Agriculture, Henan Engineering Research Center of Crop Genome Editing/Henan International Joint Laboratory of Plant Genetic Improvement and Soil Remediation, Henan Institute of Science and Technology, Xinxiang, China
|
18
|
Donis R, Patel KA, Wakeling MN, Johnson MB, Amoli MM, Yildiz M, Akçay T, Aspi I, Yong J, Yaghootkar H, Weedon MN, Hattersley AT, Flanagan SE, De Franco E. A homozygous TARS2 variant is a novel cause of syndromic neonatal diabetes. Diabet Med 2024:e15471. [PMID: 39509107 DOI: 10.1111/dme.15471] [Received: 08/07/2024] [Revised: 10/21/2024] [Accepted: 10/23/2024] [Indexed: 11/15/2024]
Abstract
AIMS Neonatal diabetes is a monogenic condition which can be the presenting feature of complex syndromes. The aim of this study was to identify novel genetic causes of neonatal diabetes with neurological features including developmental delay and epilepsy. METHODS We performed genome sequencing in 27 individuals with neonatal diabetes plus epilepsy and/or developmental delay of unknown genetic cause. Replication studies were performed in 123 individuals with diabetes diagnosed aged ≤1 year without a known genetic cause using targeted next-generation sequencing. RESULTS Three individuals, all diagnosed with diabetes in the first week of life, shared a rare homozygous missense variant, p.(Arg327Gln), in TARS2. Replication studies identified the same homozygous variant in a fourth individual diagnosed with diabetes at 1 year. One proband had epilepsy, one had developmental delay and two had both. Biallelic TARS2 variants cause a mitochondrial encephalopathy (COXPD-21) characterised by severe hypotonia, epilepsy and developmental delay. Diabetes is not a known feature of COXPD-21. Current evidence suggests that the p.(Arg327Gln) variant disrupts TARS2's regulation of the mTORC1 pathway, which is essential for β-cells. CONCLUSIONS Our findings establish the homozygous p.(Arg327Gln) TARS2 variant as a novel cause of syndromic neonatal diabetes and uncover a role for TARS2 in pancreatic β-cells.
Affiliation(s)
- Russell Donis
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Kashyap A Patel
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Matthew N Wakeling
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Matthew B Johnson
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Masha M Amoli
- Metabolic Disorders Research Centre, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Melek Yildiz
- Department of Paediatric Endocrinology, İstanbul University, İstanbul Faculty of Medicine, İstanbul, Turkey
| | - Teoman Akçay
- Department of Paediatric Endocrinology, Bakırköy Dr. Sadi Konuk Education and Research Hospital, İstanbul, Turkey
| | - Irani Aspi
- Nanavati Super Speciality Hospital, Mumbai, India
- Juvenile Diabetes Foundation, Maharashtra Chapter, Mumbai, India
| | - James Yong
- Children and Young People's Diabetes Team, St James's University Hospital, Leeds, UK
| | - Hanieh Yaghootkar
- College of Health and Science, University of Lincoln, Joseph Banks Laboratories, Lincoln, UK
| | - Michael N Weedon
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Andrew T Hattersley
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Sarah E Flanagan
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
| | - Elisa De Franco
- Department of Clinical and Biomedical Science, University of Exeter Faculty of Health and Life Sciences, Exeter, UK
|
19
|
Fawzy M, Marsh JA. Understanding the heterogeneous performance of variant effect predictors across human protein-coding genes. Sci Rep 2024; 14:26114. [PMID: 39478110 PMCID: PMC11526010 DOI: 10.1038/s41598-024-76202-6] [Received: 06/12/2024] [Accepted: 10/11/2024] [Indexed: 11/02/2024] Open
Abstract
Variant effect predictors (VEPs) are computational tools developed to assess the impacts of genetic mutations, often in terms of likely pathogenicity, employing diverse algorithms and training data. Here, we investigate the performance of 35 VEPs in the discrimination between pathogenic and putatively benign missense variants across 963 human protein-coding genes. We observe considerable gene-level heterogeneity as measured by the widely used area under the receiver operating characteristic curve (AUROC) metric. To investigate the origins of this heterogeneity and the extent to which gene-level VEP performance is predictable, for each VEP, we train random forest models to predict the gene-level AUROC. We find that performance as measured by AUROC is related to factors such as gene function, protein structure, and evolutionary conservation. Notably, intrinsic disorder in proteins emerged as a significant factor influencing apparent VEP performance, often leading to inflated AUROC values due to their enrichment in weakly conserved putatively benign variants. Our results suggest that gene-level features may be useful for identifying genes where VEP predictions are likely to be more or less reliable. However, our work also shows that AUROC, despite being independent of class balance, still has crucial limitations when used for comparing VEP performance across different genes.
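Gene-level AUROC, the metric discussed in this abstract, can be sketched in a few lines. The example below is an illustrative assumption rather than the authors' pipeline: it computes AUROC as the fraction of pathogenic/benign pairs that a score ranks correctly (ties count one half), and the names `auroc`, `gene_level_auroc` and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def auroc(scores: List[float], labels: List[int]) -> Optional[float]:
    """AUROC via pairwise comparison; label 1 = pathogenic, 0 = benign."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when a gene has only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gene_level_auroc(
    records: Iterable[Tuple[str, float, int]]
) -> Dict[str, Optional[float]]:
    """Group (gene, score, label) records by gene and score each gene."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, score, label in records:
        by_gene[gene][0].append(score)
        by_gene[gene][1].append(label)
    return {g: auroc(s, y) for g, (s, y) in by_gene.items()}


# Made-up predictor scores for three placeholder genes.
toy = [
    ("GENE_A", 0.9, 1), ("GENE_A", 0.8, 1), ("GENE_A", 0.2, 0), ("GENE_A", 0.1, 0),
    ("GENE_B", 0.6, 1), ("GENE_B", 0.7, 0), ("GENE_B", 0.3, 0), ("GENE_B", 0.9, 1),
    ("GENE_C", 0.5, 1),
]
per_gene = gene_level_auroc(toy)
```

The same predictor scores 1.0 on one toy gene and 0.75 on another, a small-scale picture of the gene-level heterogeneity the abstract describes; a gene with variants of only one class has no defined AUROC.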
Affiliation(s)
- Mohamed Fawzy
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
|
20
|
Shepherdson JL, Granas DM, Li J, Shariff Z, Plassmeyer SP, Holehouse AS, White MA, Cohen BA. Mutational scanning of CRX classifies clinical variants and reveals biochemical properties of the transcriptional effector domain. Genome Res 2024; 34:1540-1552. [PMID: 39322280 PMCID: PMC11529990 DOI: 10.1101/gr.279415.124] [Received: 03/29/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024]
Abstract
The transcription factor (TF) cone-rod homeobox (CRX) is essential for the differentiation and maintenance of photoreceptor cell identity. Several human CRX variants cause degenerative retinopathies, but most are variants of uncertain significance. We performed a deep mutational scan (DMS) of nearly all possible single amino acid substitutions in CRX using a cell-based transcriptional reporter assay, curating a high-confidence list of nearly 2000 variants with altered transcriptional activity. In the structured homeodomain, activity scores closely aligned to a predicted structure and demonstrated position-specific constraints on amino acid substitution. In contrast, the intrinsically disordered transcriptional effector domain displayed a qualitatively different pattern of substitution effects, following compositional constraints without specific residue position requirements in the peptide chain. These compositional constraints were consistent with the acidic exposure model of transcriptional activation. We evaluated the performance of the DMS assay as a clinical variant classification tool using gold-standard classified human variants from ClinVar, identifying pathogenic variants with high specificity and moderate sensitivity. That this performance could be achieved using a synthetic reporter assay in a foreign cell type, even for a highly cell type-specific TF like CRX, suggests that this approach shows promise for DMS of other TFs that function in cell types that are not easily accessible. Together, the results of the CRX DMS identify molecular features of the CRX effector domain and demonstrate utility for integration into the clinical variant classification pipeline.
Affiliation(s)
- James L Shepherdson
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - David M Granas
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Jie Li
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Zara Shariff
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Stephen P Plassmeyer
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Center for Biomolecular Condensates, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Center for Biomolecular Condensates, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Michael A White
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
| | - Barak A Cohen
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
|
21
|
Han T, Nebelung S, Khader F, Wang T, Müller-Franzes G, Kuhl C, Försch S, Kleesiek J, Haarburger C, Bressem KK, Kather JN, Truhn D. Medical large language models are susceptible to targeted misinformation attacks. NPJ Digit Med 2024; 7:288. [PMID: 39443664 PMCID: PMC11499642 DOI: 10.1038/s41746-024-01282-7] [Received: 05/12/2024] [Accepted: 10/02/2024] [Indexed: 10/25/2024] Open
Abstract
Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains, holding promising potential for diverse medical applications in the near future. In this study, we demonstrate a concerning vulnerability of LLMs in medicine. Through targeted manipulation of just 1.1% of the weights of the LLM, we can deliberately inject incorrect biomedical facts. The erroneous information is then propagated in the model's output while maintaining performance on other biomedical tasks. We validate our findings in a set of 1025 incorrect biomedical facts. This peculiar susceptibility raises serious security and trustworthiness concerns for the application of LLMs in healthcare settings. It accentuates the need for robust protective measures, thorough verification mechanisms, and stringent management of access to these models, ensuring their reliable and safe use in medical practice.
Affiliation(s)
- Tianyu Han
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
| | - Sven Nebelung
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
| | - Firas Khader
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
| | - Tianci Wang
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
| | - Gustav Müller-Franzes
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
| | - Christiane Kuhl
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
| | - Sebastian Försch
- Institute of Pathology, University Medical Center of the Johannes Gutenberg-University, Mainz, Germany
| | - Jens Kleesiek
- Institute for AI in Medicine, University Medicine Essen, Essen, Germany
| | | | - Keno K Bressem
- Department of Radiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health (EKFZ), Technical University Dresden, Dresden, Germany
- Department of Medicine I, University Hospital Dresden, Dresden, Germany
- Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
| |
Collapse
|
22
|
Chow RD, Nathanson KL, Parikh RB. Phenotypic evaluation of deep learning models for classifying germline variant pathogenicity. NPJ Precis Oncol 2024; 8:235. [PMID: 39427061 PMCID: PMC11490490 DOI: 10.1038/s41698-024-00710-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 09/16/2024] [Indexed: 10/21/2024] [Open Access]
Abstract
Deep learning models for predicting variant pathogenicity have not been thoroughly evaluated on real-world clinical phenotypes. Here, we apply state-of-the-art pathogenicity prediction models to hereditary breast cancer gene variants in UK Biobank participants. Model predictions for missense variants in BRCA1, BRCA2 and PALB2, but not ATM and CHEK2, were associated with breast cancer risk. However, deep learning models had limited clinical utility when specifically applied to variants of uncertain significance.
Affiliation(s)
- Ryan D Chow
  - Department of Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Katherine L Nathanson
  - Basser Center for BRCA, Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ravi B Parikh
  - Division of Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Center for Cancer Care Innovation, Abramson Cancer Center, Philadelphia, PA, USA
  - Division of Hematology and Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA

23
Hou C, Shen Y. SeqDance: A Protein Language Model for Representing Protein Dynamic Properties. bioRxiv 2024:2024.10.11.617911. [PMID: 39464109 PMCID: PMC11507661 DOI: 10.1101/2024.10.11.617911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Proteins perform their functions by folding amino acid sequences into dynamic structural ensembles. Despite the important role of protein dynamics, their complexity and the absence of efficient representation methods have limited their integration into studies on protein function and mutation fitness, especially in deep learning applications. To address this, we present SeqDance, a protein language model designed to learn representation of protein dynamic properties directly from sequence alone. SeqDance is pre-trained on dynamic biophysical properties derived from over 30,400 molecular dynamics trajectories and 28,600 normal mode analyses. Our results show that SeqDance effectively captures local dynamic interactions, co-movement patterns, and global conformational features, even for proteins lacking homologs in the pre-training set. Additionally, we showed that SeqDance enhances the prediction of protein fitness landscapes, disorder-to-order transition binding regions, and phase-separating proteins. By learning dynamic properties from sequence, SeqDance complements conventional evolution- and static structure-based methods, offering new insights into protein behavior and function.
Affiliation(s)
- Chao Hou
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Yufeng Shen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032

24
Schnur RE, Dvořáček L, Kalsner L, Shapiro FL, Grebeňová D, Yanni D, Wasserman BN, Dyer LM, Antonarakis SE, Kuželová K. New kinase-deficient PAK2 variants associated with Knobloch syndrome type 2. Clin Genet 2024; 106:518-524. [PMID: 38894571 DOI: 10.1111/cge.14578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/04/2024] [Accepted: 06/08/2024] [Indexed: 06/21/2024]
Abstract
The p21-activated kinase (PAK) family of proteins regulates various processes requiring dynamic cytoskeleton organization such as cell adhesion, migration, proliferation, and apoptosis. Among the six members of the protein family, PAK2 is specifically involved in apoptosis, angiogenesis, or the development of endothelial cells. We report a novel de novo heterozygous missense PAK2 variant, p.(Thr406Met), found in a newborn with clinical manifestations of Knobloch syndrome. In vitro experiments indicated that this and another reported variant, p.(Asp425Asn), result in substantially impaired protein kinase activity. Similar findings were described previously for the PAK2 p.(Glu435Lys) variant found in two siblings with proposed Knobloch syndrome type 2 (KNO2). These new variants support the association of PAK2 kinase deficiency with a second, autosomal dominant form of Knobloch syndrome: KNO2.
Affiliation(s)
- Rhonda E Schnur
  - Cooper Medical School of Rowan University, Camden, New Jersey, USA
  - Division of Genetics, Cooper University Healthcare, Camden, New Jersey, USA
- Lukáš Dvořáček
  - Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Louisa Kalsner
  - Departments of Neurology and Pediatrics, Genetics Division, University of Connecticut School of Medicine, Connecticut Children's Medical Center, Hartford, Connecticut, USA
- Faye L Shapiro
  - Division of Genetics, Cooper University Healthcare, Camden, New Jersey, USA
- Dana Grebeňová
  - Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Diana Yanni
  - Division of Neonatology, Cooper University Healthcare, Camden, New Jersey, USA
- Barry N Wasserman
  - Division of Neonatology, Cooper University Healthcare, Camden, New Jersey, USA
  - Wills Eye Hospital, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Kateřina Kuželová
  - Institute of Hematology and Blood Transfusion, Prague, Czech Republic

25
Ye D, Garmany R, Martinez-Barrios E, Gao X, Neves RAL, Tester DJ, Bains S, Zhou W, Giudicessi JR, Ackerman MJ. Clinical Utility of Protein Language Models in Resolution of Variants of Uncertain Significance in KCNQ1, KCNH2, and SCN5A Compared With Patch-Clamp Functional Characterization. Circ Genom Precis Med 2024; 17:e004584. [PMID: 39119706 DOI: 10.1161/circgen.124.004584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 07/08/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND: Genetic testing for cardiac channelopathies is the standard of care. However, many rare genetic variants remain classified as variants of uncertain significance (VUS) due to lack of epidemiological and functional data. Whether deep protein language models may aid in VUS resolution remains unknown. Here, we set out to compare how 2 deep protein language models perform at VUS resolution in the 3 most common long-QT syndrome-causative genes compared with the gold-standard patch clamp. METHODS: A total of 72 rare nonsynonymous VUS (9 KCNQ1, 19 KCNH2, and 50 SCN5A) were engineered by site-directed mutagenesis and expressed in either HEK293 cells or TSA201 cells. Whole-cell patch-clamp technique was used to functionally characterize these variants. The protein language models, evolutionary scale modeling, version 1b and AlphaMissense, were used to predict the variant effect of missense variants and compared with patch clamp. RESULTS: Considering variants in all 3 genes, the evolutionary scale modeling, version 1b model had a receiver operating characteristic curve-area under the curve of 0.75 (P=0.0003). It had a sensitivity of 88% and a specificity of 50%. AlphaMissense performed well compared with patch-clamp with a receiver operating characteristic curve-area under the curve of 0.85 (P<0.0001), sensitivity of 80%, and specificity of 76%. CONCLUSIONS: Deep protein language models aid in VUS resolution with high sensitivity but lower specificity. Thus, these tools cannot fully replace functional characterization but can aid in reducing the number of variants that may require functional analysis.
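The sensitivity and specificity figures quoted in the abstract above compare model calls against a patch-clamp reference. The following is a minimal sketch of how those two metrics are computed from paired labels; the function name and the toy labels are illustrative assumptions and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` against `reference`.

    True means "abnormal" (for example, a functional defect on the reference
    assay, or a model call of pathogenic); False means "normal".
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # abnormal, called abnormal
    fn = sum(1 for r, p in pairs if r and not p)      # abnormal, called normal
    tn = sum(1 for r, p in pairs if not r and not p)  # normal, called normal
    fp = sum(1 for r, p in pairs if not r and p)      # normal, called abnormal
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one abnormal and one normal reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical, not taken from the paper).
reference = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A high-sensitivity, lower-specificity tool in this sense misses few abnormal variants but flags a larger share of normal ones, which is the pattern the CONCLUSIONS describe.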
Affiliation(s)
- Dan Ye
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Ramin Garmany
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Estefania Martinez-Barrios
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Xiaozhi Gao
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Raquel Almeida Lopes Neves
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- David J Tester
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Sahej Bains
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Wei Zhou
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- John R Giudicessi
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic
- Michael J Ackerman
  - Department of Molecular Pharmacology and Experimental Therapeutics (Windland Smith Rice Sudden Death Genomics Laboratory). Department of Cardiovascular Medicine, Division of Heart Rhythm Services (Windland Smith Rice Genetic Heart Rhythm Clinic). Department of Pediatric and Adolescent Medicine, Division of Pediatric Cardiology, Mayo Clinic

26
Pan Q, Parra GB, Myung Y, Portelli S, Nguyen TB, Ascher DB. AlzDiscovery: A computational tool to identify Alzheimer's disease-causing missense mutations using protein structure information. Protein Sci 2024; 33:e5147. [PMID: 39276018 PMCID: PMC11401060 DOI: 10.1002/pro.5147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/14/2024] [Accepted: 07/31/2024] [Indexed: 09/16/2024]
Abstract
Alzheimer's disease (AD) is one of the most common forms of dementia and neurodegenerative diseases, characterized by the formation of neuritic plaques and neurofibrillary tangles. Many different proteins participate in this complicated pathogenic mechanism, and missense mutations can alter the folding and functions of these proteins, significantly increasing the risk of AD. However, many methods to identify AD-causing variants did not consider the effect of mutations from the perspective of a protein three-dimensional environment. Here, we present a machine learning-based analysis to classify the AD-causing mutations from their benign counterparts in 21 AD-related proteins leveraging both sequence- and structure-based features. Using computational tools to estimate the effect of mutations on protein stability, we first observed a bias of the pathogenic mutations with significant destabilizing effects on family AD-related proteins. Building on this insight, we built a generic predictive model and improved its performance by tuning the sample weights in the training process. Our final model achieved an area under the receiver operating characteristic curve of up to 0.95 in the blind test and 0.70 in an independent clinical validation, outperforming all the state-of-the-art methods. Feature interpretation indicated that the hydrophobic environment and polar interaction contacts were crucial to the decision on pathogenic phenotypes of missense mutations. Finally, we presented a user-friendly web server, AlzDiscovery, for researchers to browse the predicted phenotypes of all possible missense mutations on these 21 AD-related proteins. Our study will be a valuable resource for AD screening and the development of personalized treatment.
Affiliation(s)
- Qisheng Pan
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Georgina Becerra Parra
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Yoochan Myung
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Stephanie Portelli
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Thanh Binh Nguyen
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- David B. Ascher
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia

27
Feng M, Wei X, Zheng X, Liu L, Lin L, Xia M, He G, Shi Y, Lu Q. Decoding Missense Variants by Incorporating Phase Separation via Machine Learning. Nat Commun 2024; 15:8279. [PMID: 39333476 PMCID: PMC11436885 DOI: 10.1038/s41467-024-52580-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 09/12/2024] [Indexed: 09/29/2024] [Open Access]
Abstract
Computational models have made significant progress in predicting the effect of protein variants. However, deciphering numerous variants of uncertain significance (VUS) located within intrinsically disordered regions (IDRs) remains challenging. To address this issue, we introduce phase separation, which is tightly linked to IDRs, into the investigation of missense variants. Phase separation is vital for multiple physiological processes. By leveraging missense variants that alter phase separation propensity, we develop a machine learning approach named PSMutPred to predict the impact of missense mutations on phase separation. PSMutPred demonstrates robust performance in predicting missense variants that affect natural phase separation. In vitro experiments further underscore its validity. By applying PSMutPred on over 522,000 ClinVar missense variants, it significantly contributes to decoding the pathogenesis of disease variants, especially those in IDRs. Our work provides insights into the understanding of a vast number of VUSs in IDRs, expediting clinical interpretation and diagnosis.
Affiliation(s)
- Mofan Feng
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
  - The Collaborative Innovation Center for Brain Science, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
- Xiaoxi Wei
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
- Xi Zheng
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
  - The Collaborative Innovation Center for Brain Science, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
- Liangjie Liu
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
  - The Collaborative Innovation Center for Brain Science, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
- Lin Lin
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
- Manying Xia
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
- Guang He
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
  - The Collaborative Innovation Center for Brain Science, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
- Yi Shi
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
  - The Collaborative Innovation Center for Brain Science, and Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
- Qing Lu
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
  - Department of Otorhinolaryngology-Head and Neck Surgery, Chongqing General Hospital, Chongqing, China
  - Ear Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai, China

28
Fries LE, Dharma S, Chakravarti A, Chatterjee S. Variability in proliferative and migratory defects in Hirschsprung disease-associated RET pathogenic variants. bioRxiv 2024:2024.09.24.614825. [PMID: 39372753 PMCID: PMC11451626 DOI: 10.1101/2024.09.24.614825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Despite the extensive genetic heterogeneity of Hirschsprung disease (HSCR; congenital colonic aganglionosis), 72% of patients harbor pathogenic variants in 10 genes that form a gene regulatory network (GRN) controlling the development of the enteric nervous system (ENS). Among these genes, the receptor tyrosine kinase gene RET is the most significant contributor, accounting for pathogenic variants in 12%-50% of patients depending on phenotype. RET plays a critical role in the proliferation and migration of ENS precursors, and defects in these processes lead to HSCR. However, despite the gene's importance in HSCR, the functional consequences of RET pathogenic variants and their mechanism of disease remain poorly understood. To address this, we investigated the proliferative and migratory phenotypes in a RET-dependent neural crest-derived cell line harboring one of five missense (L56M, E178Q, Y791F, S922Y, F998L) or three nonsense (Y204X, R770X, Y981X) pathogenic heterozygous variants. Using a combination of cDNA-based and CRISPR-based PRIME editing coupled with quantitative proliferation and migration assays, we detected significant losses in cell proliferation and migration in three missense (E178Q, S922Y, F998L) and all nonsense variants. Our data suggest that the Y791F variant, whose pathogenicity has been debated, is likely not pathogenic. Importantly, the severity of migration loss did not consistently correlate with proliferation defects, and the phenotypic severity of nonsense variants was independent of their position within the RET protein. This study highlights the necessity and feasibility of targeted functional assays to accurately assess the pathogenicity of HSCR-associated variants, rather than relying solely on machine learning predictions, which could themselves be refined by incorporating such functional data.
Affiliation(s)
- Lauren E Fries
  - Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY 10016
- Sree Dharma
  - Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY 10016
- Aravinda Chakravarti
  - Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY 10016
  - Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016
- Sumantra Chatterjee
  - Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY 10016
  - Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016

29
Tang Z, Somia N, Yu Y, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv 2024:2024.02.29.582810. [PMID: 38464101 PMCID: PMC10925287 DOI: 10.1101/2024.02.29.582810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
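The baseline named in the abstract above feeds models one-hot encoded sequences rather than learned gLM representations. As a minimal sketch of what that input looks like, the snippet below encodes a DNA string as one indicator row per base; the function name, the A/C/G/T channel order, and the all-zero handling of ambiguous bases are assumptions made for illustration, not details taken from the paper.

```python
from typing import List

ALPHABET = "ACGT"  # channel order is an assumption; any fixed order works


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA string as a list of 4-element indicator rows.

    Characters outside A/C/G/T (for example N) map to an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


print(one_hot_encode("ACGTN"))
```

Each position carries only the identity of its base, so any regulatory signal must be learned by the downstream model itself; this is what makes it a useful reference point for judging pre-trained representations.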
Affiliation(s)
- Ziqi Tang
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Nirali Somia
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Yiyang Yu
  - The Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY, USA
- Peter K Koo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA

30
Kimura H, Lahouel K, Tomasetti C, Roberts NJ. Functional characterization of all CDKN2A missense variants and comparison to in silico models of pathogenicity. bioRxiv 2024:2023.12.28.573507. [PMID: 38234851 PMCID: PMC10793438 DOI: 10.1101/2023.12.28.573507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Interpretation of variants identified during genetic testing is a significant clinical challenge. In this study, we developed a high-throughput CDKN2A functional assay and characterized all possible CDKN2A missense variants. We found that 17.7% of all missense variants were functionally deleterious. We also used our functional classifications to assess the performance of in silico models that predict the effect of variants, including recently reported models based on machine learning. Notably, we found that all in silico models performed similarly when compared to our functional classifications with accuracies of 39.5-85.4%. Furthermore, while we found that functionally deleterious variants were enriched within ankyrin repeats, we did not identify any residues where all missense variants were functionally deleterious. Our functional classifications are a resource to aid the interpretation of CDKN2A variants and have important implications for the application of variant interpretation guidelines, particularly the use of in silico models for clinical variant interpretation.
Affiliation(s)
- Hirokazu Kimura
  - Department of Pathology, the Johns Hopkins University School of Medicine; Baltimore, 21287, USA
- Kamel Lahouel
  - Division of Integrated Genomics, Translational Genomics Research Institute; Phoenix, 85004, USA
  - Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope; Duarte, 91010, USA
- Cristian Tomasetti
  - Division of Integrated Genomics, Translational Genomics Research Institute; Phoenix, 85004, USA
  - Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope; Duarte, 91010, USA
- Nicholas J. Roberts
  - Department of Pathology, the Johns Hopkins University School of Medicine; Baltimore, 21287, USA
  - Department of Oncology, the Johns Hopkins University School of Medicine; Baltimore, 21287, USA

31
Benegas G, Ye C, Albors C, Li JC, Song YS. Genomic Language Models: Opportunities and Challenges. arXiv 2024:arXiv:2407.11435v2. [PMID: 39070037 PMCID: PMC11275703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.
Affiliation(s)
- Gonzalo Benegas
  - Computer Science Division, University of California, Berkeley
- Chengzhong Ye
  - Department of Statistics, University of California, Berkeley
- Carlos Albors
  - Computer Science Division, University of California, Berkeley
- Jianan Canal Li
  - Computer Science Division, University of California, Berkeley
- Yun S. Song
  - Computer Science Division, University of California, Berkeley
  - Department of Statistics, University of California, Berkeley
  - Center for Computational Biology, University of California, Berkeley

32
Bergquist T, Stenton SL, Nadeau EA, Byrne AB, Greenblatt MS, Harrison SM, Tavtigian SV, O'Donnell-Luria A, Biesecker LG, Radivojac P, Brenner SE, Pejaver V. Calibration of additional computational tools expands ClinGen recommendation options for variant classification with PP3/BP4 criteria. bioRxiv 2024:2024.09.17.611902. [PMID: 39345488 PMCID: PMC11429929 DOI: 10.1101/2024.09.17.611902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Purpose: We previously developed an approach to calibrate computational tools for clinical variant classification, updating recommendations for the reliable use of variant impact predictors to provide evidence strength up to Strong. A new generation of tools using distinctive approaches has since been released, and these methods must be independently calibrated for clinical application. Method: Using our local posterior probability-based calibration and our established data set of ClinVar pathogenic and benign variants, we determined the strength of evidence provided by three new tools (AlphaMissense, ESM1b, VARITY) and calibrated scores meeting each evidence strength. Results: All three tools reached the Strong level of evidence for variant pathogenicity and Moderate for benignity, though sometimes for few variants. Compared to previously recommended tools, these yielded at best only modest improvements in the tradeoffs of evidence strength and false positive predictions. Conclusion: At calibrated thresholds, three new computational predictors provided evidence for variant pathogenicity at similar strength to the four previously recommended predictors (and comparable with functional assays for some variants). This calibration broadens the scope of computational tools for application in clinical variant classification. Their new approaches offer promise for future advancement of the field.
Affiliation(s)
- Timothy Bergquist
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sarah L. Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Emily A.W. Nadeau
- Department of Medicine and University of Vermont Cancer Center, University of Vermont, Larner College of Medicine, Burlington, VT 05405, USA
| | - Alicia B. Byrne
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Marc S. Greenblatt
- Department of Medicine and University of Vermont Cancer Center, University of Vermont, Larner College of Medicine, Burlington, VT 05405, USA
| | - Steven M. Harrison
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ambry Genetics, Aliso Viejo, CA 92656, USA
| | - Sean V. Tavtigian
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Leslie G. Biesecker
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Steven E. Brenner
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA 94720, USA
| | - Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
33
|
Bhattacharya S, Saleem SM, Singh A, Singh S, Tripathi S. Empowering precision medicine: regenerative AI in breast cancer. Front Oncol 2024; 14:1465720. [PMID: 39372870 PMCID: PMC11449872 DOI: 10.3389/fonc.2024.1465720] [Received: 07/16/2024] [Accepted: 08/27/2024] [Indexed: 10/08/2024]
Abstract
Regenerative AI is transforming breast cancer diagnosis and treatment through enhanced imaging analysis, personalized medicine, drug discovery, and remote patient monitoring. AI algorithms can detect subtle patterns in mammograms and other imaging modalities with high accuracy, potentially leading to earlier diagnoses. In treatment planning, AI integrates patient-specific data to predict individual responses and optimize therapies. For drug discovery, generative AI models rapidly design and screen novel molecules targeting breast cancer pathways. Remote monitoring tools powered by AI provide real-time insights to guide care. Examples include Google's LYNA for analyzing pathology slides, Kheiron's Mia for mammogram interpretation, and Tempus's platform for integrating clinical and genomic data. While promising, challenges remain, including limited high-quality training data, integration into clinical workflows, interpretability of AI decisions, and regulatory/ethical concerns. Strategies to address these include collaborative data-sharing initiatives, user-centered design, explainable AI techniques, and robust oversight frameworks. In developing countries, AI tools like MammoAssist and Niramai's thermal imaging system are improving access to screening. Overall, regenerative AI offers significant potential to enhance breast cancer care, but judicious implementation with awareness of limitations is crucial. Coordinated efforts across the healthcare ecosystem are needed to fully realize AI's benefits while addressing challenges.
Affiliation(s)
- Sudip Bhattacharya
- Department of Community and Family Medicine, All India Institute of Medical Sciences, (AIIMS Deoghar), Deoghar, India
| | - Sheikh Mohd Saleem
- Department of Health and Family Welfare, EVTHS, UNICEF, New Delhi, India
| | - Alok Singh
- Faculty of Medicine and Health Sciences, Shree Guru Gobind Singh Tricentenary University, Gurugram, Haryana, India
| | - Sukhpreet Singh
- Department of Health and Family Welfare, Haryana Civil Medical Services (HCMS), Panchkula, Haryana, India
| | - Shailesh Tripathi
- Department of Hospital Administration, Rajendra Institute of Medical Sciences, Ranchi, Jharkhand, India
|
34
|
Calame DG, Wong JH, Panda P, Nguyen DT, Leong NCP, Sangermano R, Patankar SG, Abdel-Hamid MS, AlAbdi L, Safwat S, Flannery KP, Dardas Z, Fatih JM, Murali C, Kannan V, Lotze TE, Herman I, Ammouri F, Rezich B, Efthymiou S, Alavi S, Murphy D, Firoozfar Z, Nasab ME, Bahreini A, Ghasemi M, Haridy NA, Goldouzi HR, Eghbal F, Karimiani EG, Begtrup A, Elloumi H, Srinivasan VM, Gowda VK, Du H, Jhangiani SN, Coban-Akdemir Z, Marafi D, Rodan L, Isikay S, Rosenfeld JA, Ramanathan S, Staton M, Oberg KC, Clark RD, Wenman C, Loughlin S, Saad R, Ashraf T, Male A, Tadros S, Boostani R, Abdel-Salam GMH, Zaki M, Mardi A, Hashemi-Gorji F, Abdalla E, Manzini MC, Pehlivan D, Posey JE, Gibbs RA, Houlden H, Alkuraya FS, Bujakowska K, Maroofian R, Lupski JR, Nguyen LN. Biallelic variation in the choline and ethanolamine transporter FLVCR1 underlies a severe developmental disorder spectrum. Genet Med 2024:101273. [PMID: 39306721 DOI: 10.1016/j.gim.2024.101273] [Received: 03/21/2024] [Revised: 09/13/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024]
Abstract
PURPOSE FLVCR1 encodes a solute carrier protein implicated in heme, choline, and ethanolamine transport. Although Flvcr1-/- mice exhibit skeletal malformations and defective erythropoiesis reminiscent of Diamond-Blackfan anemia (DBA), biallelic FLVCR1 variants in humans have previously only been linked to childhood or adult-onset ataxia, sensory neuropathy, and retinitis pigmentosa. METHODS We identified individuals with undiagnosed neurodevelopmental disorders and biallelic FLVCR1 variants through international data sharing and characterized the functional consequences of their FLVCR1 variants. RESULTS We ascertained 30 patients from 23 unrelated families with biallelic FLVCR1 variants and characterized a novel FLVCR1-related phenotype: severe developmental disorders with profound developmental delay, microcephaly (z-score -2.5 to -10.5), brain malformations, epilepsy, spasticity, and premature death. Brain malformations ranged from mild brain volume reduction to hydranencephaly. Severely affected patients share traits, including macrocytic anemia and skeletal malformations, with Flvcr1-/- mice and DBA. FLVCR1 variants significantly reduce choline and ethanolamine transport and/or disrupt mRNA splicing. CONCLUSION These data demonstrate a broad FLVCR1-related phenotypic spectrum ranging from severe multiorgan developmental disorders resembling DBA to adult-onset neurodegeneration. Our study expands our understanding of Mendelian choline and ethanolamine disorders and illustrates the importance of anticipating a wide phenotypic spectrum for known disease genes and incorporating model organism data into genome analysis to maximize genetic testing yield.
Affiliation(s)
- Daniel G Calame
- Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX; Texas Children's Hospital, Houston, TX; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX.
| | - Jovi Huixin Wong
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Puravi Panda
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Dat Tuan Nguyen
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Nancy C P Leong
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Riccardo Sangermano
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Sohil G Patankar
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Mohamed S Abdel-Hamid
- Medical Molecular Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, Egypt
| | - Lama AlAbdi
- Department of Translational Genomics, Center for Genomic Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
| | - Sylvia Safwat
- Department of Human Genetics, Medical Research Institute, Alexandria University, Alexandria, Egypt; Department of Neuroscience and Cell Biology, Rutgers-Robert Wood Johnson Medical School, Child Health Institute of New Jersey, New Brunswick, NJ
| | - Kyle P Flannery
- Department of Neuroscience and Cell Biology, Rutgers-Robert Wood Johnson Medical School, Child Health Institute of New Jersey, New Brunswick, NJ
| | - Zain Dardas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Jawid M Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Chaya Murali
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Varun Kannan
- Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX
| | - Timothy E Lotze
- Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX
| | - Isabella Herman
- Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX; Texas Children's Hospital, Houston, TX; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Boys Town National Research Hospital, Boys Town, NE
| | - Farah Ammouri
- Boys Town National Research Hospital, Boys Town, NE; The University of Kansas Health System, Westwood, KS
| | - Brianna Rezich
- Munroe-Meyer Institute for Genetics and Rehabilitation, University of Nebraska Medical Center, Omaha, NE
| | - Stephanie Efthymiou
- Department of Neuromuscular Diseases, UCL Institute of Neurology, London, United Kingdom
| | - Shahryar Alavi
- Department of Neuromuscular Diseases, UCL Institute of Neurology, London, United Kingdom
| | - David Murphy
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, United Kingdom
| | | | - Mahya Ebrahimi Nasab
- Meybod Genetic Research Center, Yazd, Iran; Yazd Welfare Organization, Yazd, Iran
| | - Amir Bahreini
- KaryoGen, Isfahan, Iran; Department of Human Genetics, University of Pittsburgh, PA
| | - Majid Ghasemi
- Department of Neurology, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Nourelhoda A Haridy
- Department of Neurology, Faculty of Medicine, Assiut University, Assiut, Egypt
| | - Hamid Reza Goldouzi
- Department of Pediatrics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Fatemeh Eghbal
- Department of Medical Genetics, Next Generation Genetic Polyclinic, Mashhad, Iran
| | - Ehsan Ghayoor Karimiani
- Molecular and Clinical Sciences Institute, St. George's, University of London, London, United Kingdom
| | | | | | | | - Vykuntaraju K Gowda
- Department of Pediatric Neurology, Indira Gandhi Institute of Child Health, Bangalore, India
| | - Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | | | - Zeynep Coban-Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
| | - Dana Marafi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Department of Pediatrics, Faculty of Medicine, Kuwait University, Kuwait
| | - Lance Rodan
- Department of Neurology, Boston Children's Hospital, Boston, MA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
| | - Sedat Isikay
- Gaziantep Islam Science and Technology University, Medical Faculty, Department of Pediatric Neurology, Gaziantep, Turkey
| | - Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Baylor Genetics Laboratories, Houston, TX
| | - Subhadra Ramanathan
- Division of Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Loma Linda, CA
| | - Michael Staton
- Division of Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Loma Linda, CA
| | - Kerby C Oberg
- Department of Pathology and Human Anatomy, Loma Linda University School of Medicine, Loma Linda, CA
| | - Robin D Clark
- Division of Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Loma Linda, CA
| | - Catharina Wenman
- Rare & Inherited Disease Laboratory, NHS North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
| | - Sam Loughlin
- Rare & Inherited Disease Laboratory, NHS North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
| | - Ramy Saad
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
| | - Tazeen Ashraf
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
| | - Alison Male
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
| | - Shereen Tadros
- North East Thames Regional Genetic Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom; Genetics and Genomic Medicine Department, University College London, United Kingdom
| | - Reza Boostani
- Department of Neurology, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Ghada M H Abdel-Salam
- Department of Clinical Genetics, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
| | - Maha Zaki
- Department of Clinical Genetics, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
| | - Ali Mardi
- Center for Comprehensive Genetic Services, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Farzad Hashemi-Gorji
- Genomic Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Ebtesam Abdalla
- Department of Human Genetics, Medical Research Institute, Alexandria University, Alexandria, Egypt
| | - M Chiara Manzini
- Department of Neuroscience and Cell Biology, Rutgers-Robert Wood Johnson Medical School, Child Health Institute of New Jersey, New Brunswick, NJ
| | - Davut Pehlivan
- Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX; Texas Children's Hospital, Houston, TX; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
| | - Henry Houlden
- Department of Neuromuscular Diseases, UCL Institute of Neurology, London, United Kingdom
| | - Fowzan S Alkuraya
- Department of Translational Genomics, Center for Genomic Medicine, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia; Department of Pediatrics, Prince Sultan Military Medical City, Riyadh, Saudi Arabia
| | - Kinga Bujakowska
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Reza Maroofian
- Department of Neuromuscular Diseases, UCL Institute of Neurology, London, United Kingdom
| | - James R Lupski
- Texas Children's Hospital, Houston, TX; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX; Department of Pediatrics, Baylor College of Medicine, Houston, TX.
| | - Long N Nguyen
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Immunology Program, Life Sciences Institute, National University of Singapore, Singapore; Singapore Lipidomics Incubator (SLING), Life Sciences Institute, National University of Singapore, Singapore; Cardiovascular Disease Research (CVD) Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Immunology Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
|
35
|
Frenkel A, Frenkel M, Schulte JJ, Srinivasan S, Lamers L. Polyvalvular Dysplasia and Vascular Abnormalities in a Neonate With an FLNA Variant. JACC Case Rep 2024; 29:102556. [PMID: 39359981 PMCID: PMC11442230 DOI: 10.1016/j.jaccas.2024.102556] [Received: 06/05/2024] [Revised: 07/19/2024] [Accepted: 07/26/2024] [Indexed: 10/04/2024]
Abstract
There is growing appreciation for inherited structural heart diseases and their genetic causes. One causal gene for congenital cardiac and vascular lesions is FLNA which encodes a critical protein for cytoskeletal and extracellular matrix development. A newborn infant male, with prenatally diagnosed polyvalvular dysfunction, presented with low cardiac output and postnatally detected aortic arch hypoplasia and coarctation. Attempted palliative coarctation intervention resulted in vascular complications that ultimately contributed to his demise. This case report highlights polyvalvular dysplasia, vascular abnormalities, and a likely causal de novo missense variant in the FLNA gene (c.5180 C>T p.P1727L) not previously described.
Affiliation(s)
- Amy Frenkel
- School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA
| | - Max Frenkel
- School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA
- Cellular and Molecular Biology Graduate Program, University of Wisconsin, Madison, Wisconsin, USA
- Medical Scientist Training Program, University of Wisconsin, Madison, Wisconsin, USA
| | - Jefree J Schulte
- Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA
| | - Shardha Srinivasan
- Division of Cardiology, Department of Pediatrics, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA
| | - Luke Lamers
- Division of Cardiology, Department of Pediatrics, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA
|
36
|
Zeng W, Dou Y, Pan L, Xu L, Peng S. Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein. Nat Commun 2024; 15:7838. [PMID: 39244557 PMCID: PMC11380688 DOI: 10.1038/s41467-024-52293-7] [Received: 12/18/2023] [Accepted: 08/29/2024] [Indexed: 09/09/2024]
Abstract
DNA-protein interactions underlie many pivotal biological processes, such as DNA replication, transcription, and gene regulation. However, accurate and efficient computational methods for identifying these interactions are still lacking. In this study, we propose ESM-DBP, a method built by refining the DNA-binding protein sequence repertory and applying domain-adaptive pretraining to a general protein language model. Because general language models have not sufficiently explored domain-specific knowledge of DNA-binding proteins, we select 170,264 DNA-binding protein sequences to construct the domain-adaptive language model. Experimental results on four downstream tasks show that ESM-DBP provides a better feature representation of DNA-binding proteins than the original language model, resulting in improved prediction performance and outperforming state-of-the-art methods. Moreover, ESM-DBP still performs well even for sequences with only a few homologous sequences. ChIP-seq on two predicted cases further supports the validity of the proposed method.
Affiliation(s)
- Wenwu Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
| | - Yutao Dou
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
| | - Liangrui Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
| | - Liwen Xu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.
| | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.
|
37
|
Kim HS, Haley OC, Portwood JL II, Harding S, Proctor RH, Woodhouse MR, Sen TZ, Andorf CM. Fusarium Protein Toolkit: a web-based resource for structural and variant analysis of Fusarium species. BMC Microbiol 2024; 24:326. [PMID: 39243017 PMCID: PMC11378500 DOI: 10.1186/s12866-024-03480-5] [Received: 04/30/2024] [Accepted: 08/27/2024] [Indexed: 09/09/2024]
Abstract
BACKGROUND The genus Fusarium poses significant threats to food security and safety worldwide because numerous species of the fungus cause destructive diseases and/or mycotoxin contamination in crops. The adverse effects of climate change are exacerbating some existing threats and causing new problems. These challenges highlight the need for innovative solutions, including the development of advanced tools to identify targets for control strategies. DESCRIPTION In response to these challenges, we developed the Fusarium Protein Toolkit (FPT), a web-based tool that allows users to interrogate the structural and variant landscape within the Fusarium pan-genome. The tool displays both AlphaFold and ESMFold-generated protein structure models from six Fusarium species. The structures are accessible through a user-friendly web portal and facilitate comparative analysis, functional annotation inference, and identification of related protein structures. Using a protein language model, FPT predicts the impact of over 270 million coding variants in two of the most agriculturally important species, Fusarium graminearum and F. verticillioides. To facilitate the assessment of naturally occurring genetic variation, FPT provides variant effect scores for proteins in a Fusarium pan-genome based on 22 diverse species. The scores indicate potential functional consequences of amino acid substitutions and are displayed as intuitive heatmaps using the PanEffect framework. CONCLUSION FPT fills a knowledge gap by providing previously unavailable tools to assess structural and missense variation in proteins produced by Fusarium. FPT has the potential to deepen our understanding of pathogenic mechanisms in Fusarium, and aid the identification of genetic targets for control strategies that reduce crop diseases and mycotoxin contamination. Such targets are vital to solving the agricultural problems incited by Fusarium, particularly evolving threats resulting from climate change. Thus, FPT has the potential to contribute to improving food security and safety worldwide.
Grants
- 5010-11420-001-000-D and 5010-42000-053-000-D USDA, Agricultural Research Service, United States
- 0201-88888-003-000D and 0201-88888-002-000D USDA, Agricultural Research Service, United States
- 5030-21000-072-00-D USDA, Agricultural Research Service, United States
- 2030-21000-056-000-D USDA, Agricultural Research Service, United States
Affiliation(s)
- Hye-Seon Kim
- USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, 1815 N University St, Peoria, IL, 61604, USA
| | - Olivia C Haley
- USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, 819 Wallace Rd. Ames, IA, 50011, USA
| | - John L Portwood II
- USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, 819 Wallace Rd. Ames, IA, 50011, USA
| | - Stephen Harding
- USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, 1815 N University St, Peoria, IL, 61604, USA
| | - Robert H Proctor
- USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, 1815 N University St, Peoria, IL, 61604, USA
| | - Margaret R Woodhouse
- USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, 819 Wallace Rd. Ames, IA, 50011, USA
| | - Taner Z Sen
- USDA, Agricultural Research Service, Crop Improvement and Genetics Research Unit, 800 Buchanan St. Albany, CA, 94710, USA
- Department of Bioengineering, University of California, 306 Stanley Hall, Berkeley, CA, 94720, USA
| | - Carson M Andorf
- USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, 819 Wallace Rd. Ames, IA, 50011, USA.
- Department of Computer Science, Iowa State University, 2434 Osborn Dr, Ames, IA, 50011, USA.
|
38
|
Zhai J, Gokaslan A, Schiff Y, Berthel A, Liu ZY, Lai WY, Miller ZR, Scheben A, Stitzer MC, Romay MC, Buckler ES, Kuleshov V. Cross-species modeling of plant genomes at single nucleotide resolution using a pre-trained DNA language model. bioRxiv 2024:2024.06.04.596709. [PMID: 38895432 PMCID: PMC11185591 DOI: 10.1101/2024.06.04.596709] [Indexed: 06/21/2024]
Abstract
Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pre-trained on large-scale biological sequences can learn evolutionary conservation and, through fine-tuning on limited labeled data, offer better cross-species prediction than supervised models. We introduce PlantCaduceus, a plant DNA LM based on the Caduceus and Mamba architectures, pre-trained on a curated dataset of 16 Angiosperm genomes. Fine-tuning PlantCaduceus on limited labeled Arabidopsis data for four tasks, including predicting translation initiation/termination sites and splice donor and acceptor sites, demonstrated high transferability to maize, which diverged from Arabidopsis 160 million years ago, outperforming the best existing DNA LM by 1.45 to 7.23-fold. PlantCaduceus is competitive with state-of-the-art protein LMs in terms of deleterious mutation identification, and is threefold better than PhyloP. Additionally, PlantCaduceus successfully identifies well-known causal variants in both Arabidopsis and maize. Overall, PlantCaduceus is a versatile DNA LM that can accelerate plant genomics and crop breeding applications.
Affiliation(s)
- Jingjing Zhai
- Institute for Genomic Diversity, Cornell University, Ithaca, NY USA 14853
| | - Aaron Gokaslan
- Department of Computer Science, Cornell University, Ithaca, NY, USA 14853
| | - Yair Schiff
- Department of Computer Science, Cornell University, Ithaca, NY, USA 14853
| | - Ana Berthel
- Institute for Genomic Diversity, Cornell University, Ithaca, NY USA 14853
| | - Zong-Yan Liu
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY USA 14853
| | - Wei-Yun Lai
- Institute for Genomic Diversity, Cornell University, Ithaca, NY USA 14853
| | - Zachary R. Miller
- Institute for Genomic Diversity, Cornell University, Ithaca, NY USA 14853
| | - Armin Scheben
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY USA 11724
| | | | - M. Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY USA 14853
| | - Edward S. Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY USA 14853
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY USA 14853
- USDA-ARS; Ithaca, NY, USA 14853
| | - Volodymyr Kuleshov
- Department of Computer Science, Cornell University, Ithaca, NY, USA 14853
|
39
|
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland;
| | - Jürgen Jänes
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland;
| | | | - Pedro Beltrao
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
| |
|
40
|
Sudnawa KK, Li W, Calamia S, Kanner CH, Bain JM, Abdelhakim AH, Geltzeiler A, Mebane CM, Provenzano FA, Sands TT, Fee RJ, Montes J, Shen Y, Chung WK. Heterogeneity of comprehensive clinical phenotype and longitudinal adaptive function and correlation with computational predictions of severity of missense genotypes in KIF1A-associated neurological disorder. Genet Med 2024; 26:101169. [PMID: 38785164 PMCID: PMC11298291 DOI: 10.1016/j.gim.2024.101169] [Received: 12/22/2023] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
PURPOSE Pathogenic variants in kinesin family member 1A (KIF1A) are associated with KIF1A-associated neurological disorder. We report the clinical phenotypes and correlate genotypes of individuals with KIF1A-associated neurological disorder. METHODS Medical history and adaptive function were assessed longitudinally. In-person evaluations included neurological, motor, ophthalmologic, and cognitive assessments. RESULTS We collected online data on 177 individuals. Fifty-seven individuals were also assessed in-person. Most individuals had de novo heterozygous missense likely pathogenic/pathogenic KIF1A variants. The most common characteristics were hypotonia, spasticity, ataxia, seizures, optic nerve atrophy, cerebellar atrophy, and cognitive impairment. Mean Vineland adaptive behavior composite score (VABS-ABC) was low (M = 62.9, SD = 19.1). The mean change in VABS-ABC over time was -3.1 (SD = 7.3). The decline in VABS-ABC was associated with the age at first assessment and abnormal electroencephalogram/seizure. There was a positive correlation between evolutionary scale model (ESM) score for the variants and final VABS-ABC (P = .003). Abnormal electroencephalogram/seizure, neuroimaging result, and ESM explain 34% of the variance in final VABS-ABC (P < .001). CONCLUSION In-person assessment confirmed caregiver report and identified additional visual deficits. Adaptive function declined over time consistent with both the neurodevelopmental and neurodegenerative nature of the condition. Using ESM score assists in predicting phenotype across a wide range of unique variants.
Affiliation(s)
- Khemika K Sudnawa
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA; Department of Pediatrics, Phramongkutklao Hospital and Phramongkutklao College of Medicine, Bangkok, Thailand
| | - Wenxing Li
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY
| | - Sean Calamia
- Department of Pediatrics, Columbia University, New York, NY
| | - Cara H Kanner
- Department of Rehabilitation and Regenerative Medicine, Columbia University Irving Medical Center, New York, NY
| | - Jennifer M Bain
- Departments of Neurology and Pediatrics, Columbia University Irving Medical Center, New York, NY
| | - Aliaa H Abdelhakim
- Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, NY
| | - Alexa Geltzeiler
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA
| | | | - Frank A Provenzano
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Neurology, Columbia University Medical Center, New York, NY
| | - Tristan T Sands
- Departments of Neurology and Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Robert J Fee
- Department of Neurology, Columbia University Vagelos College of Physicians and Surgeons and New York-Presbyterian Hospital, New York, NY
| | - Jacqueline Montes
- Department of Rehabilitation and Regenerative Medicine, Columbia University Irving Medical Center, New York, NY
| | - Yufeng Shen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY
| | - Wendy K Chung
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA.
| |
|
41
|
Simon E, Swanson K, Zou J. Language models for biological research: a primer. Nat Methods 2024; 21:1422-1429. [PMID: 39122951 DOI: 10.1038/s41592-024-02354-y] [Received: 03/20/2024] [Accepted: 06/18/2024] [Indexed: 08/12/2024]
Abstract
Language models are playing an increasingly important role in many areas of artificial intelligence (AI) and computational biology. In this primer, we discuss the ways in which language models, both those based on natural language and those based on biological sequences, can be applied to biological research. This primer is primarily intended for biologists interested in using these cutting-edge AI technologies in their applications. We provide guidance on best practices and key resources for adapting language models for biology.
Affiliation(s)
- Elana Simon
- Department of Biomedical Data Science, Stanford University, Stanford, USA
| | - Kyle Swanson
- Department of Computer Science, Stanford University, Stanford, USA
| | - James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, USA.
- Department of Computer Science, Stanford University, Stanford, USA.
- Chan-Zuckerberg Biohub, San Francisco, USA.
| |
|
42
|
Kovacs AS, Portelli S, Silk M, Rodrigues CHM, Ascher DB. MTR3D-AF2: Expanding the coverage of spatially derived missense tolerance scores across the human proteome using AlphaFold2. Protein Sci 2024; 33:e5112. [PMID: 39031445 PMCID: PMC11258768 DOI: 10.1002/pro.5112] [Received: 12/17/2023] [Revised: 06/24/2024] [Accepted: 06/26/2024] [Indexed: 07/22/2024]
Abstract
The missense tolerance ratio (MTR) was developed as a novel approach to assess the deleteriousness of variants. Its three-dimensional successor, MTR3D, was demonstrated powerful at discriminating pathogenic from benign variants. However, its reliance on experimental structures and homologs limited its coverage of the proteome. We have now utilized AlphaFold2 models to develop MTR3D-AF2, which covers 89.31% of proteins and 85.39% of residues across the human proteome. This work has improved MTR3D's ability to distinguish clinically established pathogenic from benign variants. MTR3D-AF2 is freely available as an interactive web server at https://biosig.lab.uq.edu.au/mtr3daf2/.
Affiliation(s)
- Aaron S. Kovacs
- The Australian Center for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
| | - Stephanie Portelli
- The Australian Center for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - Michael Silk
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Australia
- Systems and Computational Biology, Bio21 Institute, The University of Melbourne, Melbourne, Australia
| | - Carlos H. M. Rodrigues
- The Australian Center for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - David B. Ascher
- The Australian Center for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Systems and Computational Biology, Bio21 Institute, The University of Melbourne, Melbourne, Australia
| |
|
43
|
Gordon MG, Kathail P, Choy B, Kim MC, Mazumder T, Gearing M, Ye CJ. Population Diversity at the Single-Cell Level. Annu Rev Genomics Hum Genet 2024; 25:27-49. [PMID: 38382493 DOI: 10.1146/annurev-genom-021623-083207] [Indexed: 02/23/2024]
Abstract
Population-scale single-cell genomics is a transformative approach for unraveling the intricate links between genetic and cellular variation. This approach is facilitated by cutting-edge experimental methodologies, including the development of high-throughput single-cell multiomics and advances in multiplexed environmental and genetic perturbations. Examining the effects of natural or synthetic genetic variants across cellular contexts provides insights into the mutual influence of genetics and the environment in shaping cellular heterogeneity. The development of computational methodologies further enables detailed quantitative analysis of molecular variation, offering an opportunity to examine the respective roles of stochastic, intercellular, and interindividual variation. Future opportunities lie in leveraging long-read sequencing, refining disease-relevant cellular models, and embracing predictive and generative machine learning models. These advancements hold the potential for a deeper understanding of the genetic architecture of human molecular traits, which in turn has important implications for understanding the genetic causes of human disease.
Affiliation(s)
| | - Pooja Kathail
- Center for Computational Biology, University of California, Berkeley, California, USA
| | - Bryson Choy
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Min Cheol Kim
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Thomas Mazumder
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Melissa Gearing
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Chun Jimmie Ye
- Arc Institute, Palo Alto, California, USA
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, Gladstone-UCSF Institute of Genomic Immunology, Parker Institute for Cancer Immunotherapy, Department of Epidemiology and Biostatistics, Department of Microbiology and Immunology, and Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
| |
|
44
|
Khan RT, Pokorna P, Stourac J, Borko S, Arefiev I, Planas-Iglesias J, Dobias A, Pinto G, Szotkowska V, Sterba J, Slaby O, Damborsky J, Mazurenko S, Bednar D. A computational workflow for analysis of missense mutations in precision oncology. J Cheminform 2024; 16:86. [PMID: 39075588 PMCID: PMC11285293 DOI: 10.1186/s13321-024-00876-3] [Received: 11/22/2023] [Accepted: 06/26/2024] [Indexed: 07/31/2024] Open
Abstract
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for a comprehensive examination of biopsy specimens. Furthermore, the widespread use of this technology has generated a wealth of information on cancer-specific gene alterations. However, there exists a significant gap between identified alterations and their proven impact on protein function. Here, we present a bioinformatics pipeline that enables fast analysis of a missense mutation's effect on stability and function in known oncogenic proteins. This pipeline is coupled with a predictor that summarises the outputs of different tools used throughout the pipeline, providing a single probability score, achieving a balanced accuracy above 86%. The pipeline incorporates a virtual screening method to suggest potential FDA/EMA-approved drugs to be considered for treatment. We showcase three case studies to demonstrate the timely utility of this pipeline. To facilitate access and analysis of cancer-related mutations, we have packaged the pipeline as a web server, which is freely available at https://loschmidt.chemi.muni.cz/predictonco/. Scientific contribution: This work presents a novel bioinformatics pipeline that integrates multiple computational tools to predict the effects of missense mutations on proteins of oncological interest. The pipeline uniquely combines fast protein modelling, stability prediction, and evolutionary analysis with virtual drug screening, while offering actionable insights for precision oncology. This comprehensive approach surpasses existing tools by automating the interpretation of mutations and suggesting potential treatments, thereby striving to bridge the gap between sequencing data and clinical application.
Affiliation(s)
- Rayyan Tariq Khan
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Petra Pokorna
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
| | - Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Simeon Borko
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
| | - Ihor Arefiev
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Adam Dobias
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Gaspar Pinto
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Veronika Szotkowska
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Jaroslav Sterba
- Department of Paediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
| | - Ondrej Slaby
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic.
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic.
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic.
| | - David Bednar
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic.
- Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic.
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic.
| |
|
45
|
Utsumi T, Tsumura M, Yashiro M, Kato Z, Noma K, Sakura F, Kagawa R, Mizoguchi Y, Karakawa S, Ohnishi H, Cunningham-Rundles C, Arkwright PD, Kobayashi M, Kanegane H, Bogunovic D, Boisson B, Casanova JL, Asano T, Okada S. Exclusive Characteristics of the p.E555K Dominant-Negative Variant in Autosomal Dominant E47 Deficiency. J Clin Immunol 2024; 44:167. [PMID: 39073655 PMCID: PMC11286708 DOI: 10.1007/s10875-024-01758-x] [Received: 02/02/2024] [Accepted: 06/21/2024] [Indexed: 07/30/2024]
Abstract
PURPOSE Transcription factor 3 (TCF3) encodes 2 transcription factors generated by alternative splicing, E12 and E47, which contribute to early lymphocyte differentiation. In humans, autosomal dominant (AD) E47 transcription factor deficiency is an inborn error of immunity characterized by B-cell deficiency and agammaglobulinemia. Only the recurrent de novo p.E555K pathogenic variant has been associated with this disease and acts via a dominant-negative (DN) mechanism. In this study, we describe the first Asian patient with agammaglobulinemia caused by the TCF3 p.E555K variant and provide insights into the structure and function of this variant. METHODS TCF3 variant was identified by inborn errors of immunity-related gene panel sequencing. The variant E555K was characterized by alanine scanning of the E47 basic region and comprehensive mutational analysis focused on position 555. RESULTS The patient was a 25-year-old male with B-cell deficiency, agammaglobulinemia, and mild facial dysmorphic features. We confirmed the diagnosis of AD E47 transcription factor deficiency by identifying a heterozygous missense variant, c.1663 G>A; p.E555K, in TCF3. Alanine scanning of the E47 basic region revealed the structural importance of position 555. Comprehensive mutational analysis focused on position 555 showed that only the glutamate-to-lysine substitution had a strong DN effect. 3D modeling demonstrated that this variant not only abolished hydrogen bonds involved in protein‒DNA interactions, but also inverted the charge on the surface of the E47 protein. CONCLUSIONS Our study reveals the causative mutation hotspot in the TCF3 DN variant and highlights the weak negative selection associated with the TCF3 gene.
Affiliation(s)
- Takanori Utsumi
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Miyuki Tsumura
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Masato Yashiro
- Department of Pediatrics, Okayama University Hospital, Okayama, Japan
| | - Zenichiro Kato
- Department of Pediatrics, Graduate School of Medicine, Gifu University, Gifu, Japan
- Structural Medicine, United Graduate School of Drug Discovery and Medical Information Science, Gifu University, Gifu, Japan
| | - Kosuke Noma
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Fumiaki Sakura
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Department of Applied Genomics, Kazusa DNA Research Institute, Chiba, Japan
| | - Reiko Kagawa
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Yoko Mizoguchi
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Shuhei Karakawa
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Hidenori Ohnishi
- Department of Pediatrics, Graduate School of Medicine, Gifu University, Gifu, Japan
| | - Charlotte Cunningham-Rundles
- Division of Allergy and Clinical Immunology, Departments of Medicine and Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Peter D Arkwright
- Lydia Becker Institute of Immunology and Inflammation, University of Manchester, Manchester, UK
| | - Masao Kobayashi
- Japanese Red Cross Chugoku-Shikoku Block Blood Center, Hiroshima, Japan
| | - Hirokazu Kanegane
- Department of Child Health and Development, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Tokyo, Japan
| | - Dusan Bogunovic
- Center for Inborn Errors of Immunity, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bertrand Boisson
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY, USA
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, Paris, France
- Paris Descartes University, Imagine Institute, Paris, France
| | - Jean-Laurent Casanova
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY, USA
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, Paris, France
- Paris Descartes University, Imagine Institute, Paris, France
- Pediatric Hematology-Immunology Unit, Necker Hospital for Sick Children, AP-HP, Paris, France
- Howard Hughes Medical Institute (HHMI), New York, NY, USA
| | - Takaki Asano
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan.
- Department of Genetics and Cell Biology, Research Institute for Radiation Biology and Medicine, Hiroshima University, Hiroshima, Japan.
| | - Satoshi Okada
- Department of Pediatrics, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan.
| |
|
46
|
Vornholt T, Mutný M, Schmidt GW, Schellhaas C, Tachibana R, Panke S, Ward TR, Krause A, Jeschek M. Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning. ACS CENTRAL SCIENCE 2024; 10:1357-1370. [PMID: 39071060 PMCID: PMC11273458 DOI: 10.1021/acscentsci.4c00258] [Received: 02/15/2024] [Revised: 04/22/2024] [Accepted: 05/02/2024] [Indexed: 07/30/2024]
Abstract
Tailored enzymes are crucial for the transition to a sustainable bioeconomy. However, enzyme engineering is laborious and failure-prone due to its reliance on serendipity. The efficiency and success rates of engineering campaigns may be improved by applying machine learning to map the sequence-activity landscape based on small experimental data sets. Yet, it often proves challenging to reliably model large sequence spaces while keeping the experimental effort tractable. To address this challenge, we present an integrated pipeline combining large-scale screening with active machine learning, which we applied to engineer an artificial metalloenzyme (ArM) catalyzing a new-to-nature hydroamination reaction. Combining lab automation and next-generation sequencing, we acquired sequence-activity data for several thousand ArM variants. We then used Gaussian process regression to model the activity landscape and guide further screening rounds. Critical characteristics of our pipeline include the cost-effective generation of information-rich data sets, the integration of an explorative round to improve the model's performance, and the inclusion of experimental noise. Our approach led to an order-of-magnitude boost in the hit rate while making efficient use of experimental resources. Search strategies like this should find broad utility in enzyme engineering and accelerate the development of novel biocatalysts.
Affiliation(s)
- Tobias Vornholt
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland
| | - Mojmír Mutný
- Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8092 Zurich, Switzerland
| | - Gregor W. Schmidt
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
| | - Christian Schellhaas
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
| | - Ryo Tachibana
- Department of Chemistry, University of Basel, Mattenstrasse 24a, 4058 Basel, Switzerland
| | - Sven Panke
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland
| | - Thomas R. Ward
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland
- Department of Chemistry, University of Basel, Mattenstrasse 24a, 4058 Basel, Switzerland
| | - Andreas Krause
- Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8092 Zurich, Switzerland
| | - Markus Jeschek
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- Institute of Microbiology, University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany
| |
|
47
|
Ozkan S, Padilla N, de la Cruz X. QAFI: a novel method for quantitative estimation of missense variant impact using protein-specific predictors and ensemble learning. Hum Genet 2024:10.1007/s00439-024-02692-z. [PMID: 39048855 DOI: 10.1007/s00439-024-02692-z] [Received: 04/30/2024] [Accepted: 07/14/2024] [Indexed: 07/27/2024]
Abstract
Next-generation sequencing (NGS) has revolutionized genetic diagnostics, yet its application in precision medicine remains incomplete, despite significant advances in computational tools for variant annotation. Many variants remain unannotated, and existing tools often fail to accurately predict the range of impacts that variants have on protein function. This limitation restricts their utility in relevant applications such as predicting disease severity and onset age. In response to these challenges, a new generation of computational models is emerging, aimed at producing quantitative predictions of genetic variant impacts. However, the field is still in its early stages, and several issues need to be addressed, including improved performance and better interpretability. This study introduces QAFI, a novel methodology that integrates protein-specific regression models within an ensemble learning framework, utilizing conservation-based and structure-related features derived from AlphaFold models. Our findings indicate that QAFI significantly enhances the accuracy of quantitative predictions across various proteins. The approach has been rigorously validated through its application in the CAGI6 contest, focusing on ARSA protein variants, and further tested on a comprehensive set of clinically labeled variants, demonstrating its generalizability and robust predictive power. The straightforward nature of our models may also contribute to better interpretability of the results.
Affiliation(s)
- Selen Ozkan
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Natàlia Padilla
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Xavier de la Cruz
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
|
48
|
Qahwaji R, Ashankyty I, Sannan NS, Hazzazi MS, Basabrain AA, Mobashir M. Pharmacogenomics: A Genetic Approach to Drug Development and Therapy. Pharmaceuticals (Basel) 2024; 17:940. [PMID: 39065790 PMCID: PMC11279827 DOI: 10.3390/ph17070940] [Received: 05/06/2024] [Revised: 07/03/2024] [Accepted: 07/10/2024] [Indexed: 07/28/2024] Open
Abstract
The majority of the well-known pharmacogenomics research used in the medical sciences contributes to our understanding of medication interactions. It has a significant impact on treatment and drug development. The broad use of pharmacogenomics is required for the progress of therapy. The main focus is on how genes and an intricate gene system affect the body's reaction to medications. Novel biomarkers that help identify a patient group that is more or less likely to respond to a certain medication have been discovered as a result of recent developments in the field of clinical therapeutics. It aims to improve customized therapy by giving the appropriate drug at the right dose at the right time and making sure that the right prescriptions are issued. A combination of genetic, environmental, and patient variables that impact the pharmacokinetics and/or pharmacodynamics of medications results in interindividual variance in drug response. Drug development, illness susceptibility, and treatment efficacy are all impacted by pharmacogenomics. The purpose of this work is to give a review that might serve as a foundation for the creation of new pharmacogenomics applications, techniques, or strategies.
Affiliation(s)
- Rowaid Qahwaji
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Hematology Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Ibraheem Ashankyty
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 22254, Saudi Arabia
| | - Naif S. Sannan
- College of Applied Medical Sciences, King Saud bin Abdulaziz University for Health Sciences, Ar Rimayah, Riyadh 14611, Saudi Arabia
- King Abdullah International Medical Research Center, Jeddah 22384, Saudi Arabia
| | - Mohannad S. Hazzazi
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Hematology Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Ammar A. Basabrain
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Hematology Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Mohammad Mobashir
- Department of Biomedical Laboratory Science, Faculty of Natural Sciences, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
| |
|
49
|
Zhang X, Theotokis PI, Li N, Wright CF, Samocha KE, Whiffin N, Ware JS. Genetic constraint at single amino acid resolution in protein domains improves missense variant prioritisation and gene discovery. Genome Med 2024; 16:88. [PMID: 38992748 PMCID: PMC11238507 DOI: 10.1186/s13073-024-01358-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 06/26/2024] [Indexed: 07/13/2024] Open
Abstract
BACKGROUND: One of the major hurdles in clinical genetics is interpreting the clinical consequences associated with germline missense variants in humans. Recent significant advances have leveraged natural variation observed in large-scale human populations to uncover genes or genomic regions that show a depletion of natural variation, indicative of selection pressure. We refer to this as "genetic constraint". Although existing genetic constraint metrics have been demonstrated to be successful in prioritising genes or genomic regions associated with diseases, their spatial resolution is limited in distinguishing pathogenic variants from benign variants within genes. METHODS: We aim to identify missense variants that are significantly depleted in the general human population. Given the size of currently available human populations with exome or genome sequencing data, it is not possible to directly detect depletion of individual missense variants, since the average expected number of observations of a variant at most positions is less than one. We instead focus on protein domains, grouping homologous variants with similar functional impacts to examine the depletion of natural variations within these comparable sets. To accomplish this, we develop the Homologous Missense Constraint (HMC) score. We utilise the Genome Aggregation Database (gnomAD) 125 K exome sequencing data and evaluate genetic constraint at quasi amino-acid resolution by combining signals across protein homologues. RESULTS: We identify one million possible missense variants under strong negative selection within protein domains. Though our approach annotates only protein domains, it nonetheless allows us to assess 22% of the exome confidently. It precisely distinguishes pathogenic variants from benign variants for both early-onset and adult-onset disorders. It outperforms existing constraint metrics and pathogenicity meta-predictors in prioritising de novo mutations from probands with developmental disorders (DD). It is also methodologically independent of these, adding power to predict variant pathogenicity when used in combination. We demonstrate utility for gene discovery by identifying seven genes newly significantly associated with DD that could act through an altered-function mechanism. CONCLUSIONS: Grouping variants of comparable functional impacts is effective in evaluating their genetic constraint. HMC is a novel and accurate predictor of missense consequence for improved variant interpretation.
Affiliation(s)
- Xiaolei Zhang
- National Heart & Lung Institute, Imperial College London, London, UK.
- MRC Laboratory of Medical Sciences, Imperial College London, London, UK.
- Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK.
- Present address: European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.
| | - Pantazis I Theotokis
- National Heart & Lung Institute, Imperial College London, London, UK
- MRC Laboratory of Medical Sciences, Imperial College London, London, UK
- Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
| | - Nicholas Li
- National Heart & Lung Institute, Imperial College London, London, UK
- MRC Laboratory of Medical Sciences, Imperial College London, London, UK
- Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
| | - Caroline F Wright
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK
| | - Kaitlin E Samocha
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nicola Whiffin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Centre for Human Genetics, University of Oxford, Oxford, UK.
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
| | - James S Ware
- National Heart & Lung Institute, Imperial College London, London, UK.
- MRC Laboratory of Medical Sciences, Imperial College London, London, UK.
- Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
50
|
Zhang T, Tan S, Tang N, Li Y, Zhang C, Sun J, Guo Y, Gao H, Cai Y, Sun W, Wang C, Fu L, Ma H, Wu Y, Hu X, Zhang X, Gee P, Yan W, Zhao Y, Chen Q, Guo B, Wang H, Zhang YE. Heterologous survey of 130 DNA transposons in human cells highlights their functional divergence and expands the genome engineering toolbox. Cell 2024; 187:3741-3760.e30. [PMID: 38843831 DOI: 10.1016/j.cell.2024.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 03/11/2024] [Accepted: 05/02/2024] [Indexed: 07/14/2024]
Abstract
Experimental studies on DNA transposable elements (TEs) have been limited in scale, leading to a lack of understanding of the factors influencing transposition activity, evolutionary dynamics, and application potential as genome engineering tools. We predicted 130 active DNA TEs from 102 metazoan genomes and evaluated their activity in human cells. We identified 40 active (integration-competent) TEs, surpassing the cumulative number (20) of TEs found previously. With this unified comparative data, we found that the Tc1/mariner superfamily exhibits elevated activity, potentially explaining their pervasive horizontal transfers. Further functional characterization of TEs revealed additional divergence in features such as insertion bias. Remarkably, in CAR-T therapy for hematological and solid tumors, Mariner2_AG (MAG), the most active DNA TE identified, largely outperformed two widely used vectors, the lentiviral vector and the TE-based vector SB100X. Overall, this study highlights the varied transposition features and evolutionary dynamics of DNA TEs and increases the TE toolbox diversity.
Affiliation(s)
- Tongtong Zhang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Shengjun Tan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Na Tang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China; Beijing Institute for Stem Cell and Regenerative Medicine, Beijing 100101, China
| | - Yuanqing Li
- University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Chenze Zhang
- National Key Laboratory of Efficacy and Mechanism on Chinese Medicine for Metabolic Diseases, Beijing University of Chinese Medicine, Beijing 100029, China
| | - Jing Sun
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Yanyan Guo
- University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Hui Gao
- Rengene Biotechnology Co., Ltd., Beijing 100036, China
| | - Yujia Cai
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Wen Sun
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China; Beijing Institute for Stem Cell and Regenerative Medicine, Beijing 100101, China
| | - Chenxin Wang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China; Beijing Institute for Stem Cell and Regenerative Medicine, Beijing 100101, China
| | - Liangzheng Fu
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Huijing Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Yachao Wu
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaoxuan Hu
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Xuechun Zhang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Peter Gee
- MaxCyte Inc., Rockville, MD 20850, USA
| | - Weihua Yan
- Cold Spring Biotech Corp., Beijing 100031, China
| | - Yahui Zhao
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Qiang Chen
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Baocheng Guo
- University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining 810008, China
| | - Haoyi Wang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China; Beijing Institute for Stem Cell and Regenerative Medicine, Beijing 100101, China.
| | - Yong E Zhang
- University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.
| |
|