1
You N, Liu C, Gu Y, Wang R, Jia H, Zhang T, Jiang S, Shi J, Chen M, Guan MX, Sun S, Pei S, Liu Z, Shen N. SpliceTransformer predicts tissue-specific splicing linked to human diseases. Nat Commun 2024; 15:9129. [PMID: 39443442 PMCID: PMC11500173 DOI: 10.1038/s41467-024-53088-6] [Received: 11/27/2023] [Accepted: 09/24/2024] [Indexed: 10/25/2024] [Open access]
Abstract
We present SpliceTransformer (SpTransformer), a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence. SpTransformer outperforms all previous methods on splicing prediction. Application to approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations, and occur at different frequencies across tissue types. Importantly, tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation. We validate the enrichment in three brain disease datasets involving over 164,000 individuals. Additionally, we identify single nucleotide variations that cause brain-specific splicing alterations, and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes. Finally, SpTransformer analysis of whole-exome sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy, demonstrating the potential to infer disease-causing tissue-specific splicing events. SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases.
Affiliation(s)
- Ningyuan You
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Chang Liu
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Yuxin Gu
- Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
- Rong Wang
- Department of Hematology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hanying Jia
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Tianyun Zhang
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Song Jiang
- National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Jinsong Shi
- National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Min-Xin Guan
- Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
- Siqi Sun
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Shanshan Pei
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Bone Marrow Transplantation Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhihong Liu
- National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Ning Shen
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China

2
Ding P, Zhou Y, Yang M, Li S, Zhang S, Zhi L. Case Report: PROS1 (c.76+2_76+3del) pathogenic mutation causes pulmonary embolism. Front Cardiovasc Med 2024; 11:1459579. [PMID: 39465133 PMCID: PMC11502441 DOI: 10.3389/fcvm.2024.1459579] [Received: 07/04/2024] [Accepted: 09/23/2024] [Indexed: 10/29/2024] [Open access]
Abstract
Background: Genetic variation plays an important pathogenic role in the development of venous thromboembolism (VTE). Genetic protein S (PS) deficiency caused by PROS1 gene mutation is an important risk factor for hereditary thrombophilia. Case introduction: We report a 28-year-old male patient who developed a severe pulmonary embolism during his visit. The patient had experienced one month of chest pain, coughing, and hemoptysis. CTPA confirmed an acute pulmonary embolism with multiple filling defects in both pulmonary arteries. Ultrasound showed no thrombosis in the veins of either lower limb. The patient's father and grandfather had a history of lower limb venous thrombosis. The patient was diagnosed with acute pulmonary embolism and pneumonia. The serum PS level was significantly decreased (detection result: 10%, normal range: 77-143%). Gene sequencing revealed a heterozygous splice-site deletion in PROS1 (c.76+2_76+3del), and further testing revealed that the variant was inherited from his father. The patient was treated with heparin anticoagulant therapy, catheter thrombus aspiration, and catheter thrombolysis. After treatment, the patient's chest pain was relieved, and he had no symptoms such as difficulty breathing. On the 7th day of admission, the patient was transferred to a general hospital for further treatment. Conclusion: Hereditary thrombophilia caused by the PROS1 c.76+2_76+3del variant is extremely rare. In clinical practice, heparin and rivaroxaban treatment are beneficial.
Affiliation(s)
- Peng Ding
- Department of Critical Care Medicine, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Yuan Zhou
- Department of Geriatric Medicine, The General Hospital of Western Theater Command of PLA, Chengdu, China
- Meijie Yang
- Department of Critical Care Medicine, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Sheng Li
- Department of Critical Care Medicine, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Song Zhang
- Department of Critical Care Medicine, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Lijia Zhi
- Department of Critical Care Medicine, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China

3
D'Abrusco F, Serpieri V, Taccagni CM, Garau J, Cattaneo L, Boggioni M, Gana S, Battini R, Bertini E, Zanni G, Boltshauser E, Borgatti R, Romaniello R, Signorini S, Leuzzi V, Caputi C, Manti F, D'Arrigo S, De Laurentiis A, Graziano C, Lemke JR, Morelli F, Petković Ramadža D, Sirchia F, Giorgio E, Valente EM. Pathogenic cryptic variants detectable through exome data reanalysis significantly increase the diagnostic yield in Joubert syndrome. Eur J Hum Genet 2024:10.1038/s41431-024-01703-x. [PMID: 39394465 DOI: 10.1038/s41431-024-01703-x] [Received: 05/27/2024] [Revised: 09/02/2024] [Accepted: 09/25/2024] [Indexed: 10/13/2024] [Open access]
Abstract
Joubert syndrome (JS) is a genetically heterogeneous neurodevelopmental ciliopathy. Despite exome sequencing (ES), several patients remain undiagnosed. This study aims to increase the diagnostic yield by uncovering cryptic variants through targeted ES reanalysis. We first focused on 26 patients in whom ES only disclosed heterozygous pathogenic coding variants in a JS gene. We reanalyzed raw ES data searching for copy number variants (CNVs) and intronic variants affecting splicing. We validated CNVs through real-time PCR or chromosomal microarray, and splicing variants through RT-PCR or minigenes. Cryptic variants were then searched for in an additional 44 ES-negative JS individuals. We identified cryptic "second hits" in 14 of 26 children (54%) and biallelic cryptic variants in 3 of 44 (7%), reaching a definite diagnosis in 17 of 70 (overall diagnostic gain 24%). We show that CNVs and intronic splicing variants are a common mutational mechanism in JS; more importantly, we demonstrate that a significant proportion of such variants can be disclosed simply through a focused reanalysis of available ES data, with a significant increase in the diagnostic yield, especially among patients previously found to carry heterozygous coding variants in the KIAA0586, CC2D2A and CPLANE1 genes.
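The yields quoted in this abstract follow directly from the stated counts. The short Python sketch below reproduces that arithmetic; it is illustrative only, and the helper name `yield_pct` is hypothetical rather than anything taken from the paper.

```python
def yield_pct(diagnosed: int, cohort: int) -> int:
    """Return the diagnostic yield as a whole-number percentage."""
    if cohort <= 0:
        raise ValueError("cohort size must be positive")
    return round(100 * diagnosed / cohort)


# Counts as reported in the abstract.
second_hit = yield_pct(14, 26)          # patients with one known coding variant
es_negative = yield_pct(3, 44)          # previously ES-negative patients
overall = yield_pct(14 + 3, 26 + 44)    # both groups combined

print(second_hit, es_negative, overall)  # prints: 54 7 24
```

The combined figure is computed over all 70 patients rather than as an average of the two group percentages, which is why it sits closer to the larger ES-negative group.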
Affiliation(s)
- Fulvio D'Abrusco
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Jessica Garau
- Neurogenetics Research Centre, IRCCS Mondino Foundation, Pavia, Italy
- Luca Cattaneo
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Monica Boggioni
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Simone Gana
- Neurogenetics Research Centre, IRCCS Mondino Foundation, Pavia, Italy
- Roberta Battini
- IRCCS Stella Maris Foundation, Pisa, Italy
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Enrico Bertini
- Research Unit of Neuromuscular and Neurodegenerative Disorders, IRCCS Bambino Gesù Pediatric Hospital, Rome, Italy
- Ginevra Zanni
- Research Unit of Neuromuscular and Neurodegenerative Disorders, IRCCS Bambino Gesù Pediatric Hospital, Rome, Italy
- Renato Borgatti
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Child Neurology and Psychiatry Unit, IRCCS Mondino Foundation, Pavia, Italy
- Romina Romaniello
- Child Neurology and Psychiatry Unit, IRCCS Mondino Foundation, Pavia, Italy
- Sabrina Signorini
- Child Neurology and Psychiatry Unit, IRCCS Mondino Foundation, Pavia, Italy
- Vincenzo Leuzzi
- Department of Human Neuroscience, Unit of Child Neurology and Psychiatry, Sapienza University of Rome, Rome, Italy
- Caterina Caputi
- Developmental Age Rehabilitation Service, Trasimeno District, Magione (PG), Italy
- Filippo Manti
- Department of Human Neuroscience, Unit of Child Neurology and Psychiatry, Sapienza University of Rome, Rome, Italy
- Stefano D'Arrigo
- Department of Developmental Neurology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Arianna De Laurentiis
- Department of Developmental Neurology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Claudio Graziano
- Medical Genetics Unit, MeLabeT Department, AUSL Romagna, Cesena, Italy
- Johannes R Lemke
- Institute of Human Genetics, University of Leipzig, Leipzig, Germany
- Federica Morelli
- Department of Psychiatry, Autism Spectrum Disorders and Related Conditions Service, Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Danijela Petković Ramadža
- Department of Pediatrics, University Hospital Centre Zagreb and University of Zagreb School of Medicine, Zagreb, Croatia
- Fabio Sirchia
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Medical Genetics Unit, IRCCS San Matteo Foundation, Pavia, Italy
- Elisa Giorgio
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Neurogenetics Research Centre, IRCCS Mondino Foundation, Pavia, Italy
- Enza Maria Valente
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Neurogenetics Research Centre, IRCCS Mondino Foundation, Pavia, Italy

4
Keefer-Jacques E, Valente N, Jacko AM, Matwijec G, Reese A, Tekriwal A, Loomes KM, Spinner NB, Gilbert MA. Investigation of cryptic JAG1 splice variants as a cause of Alagille syndrome and performance evaluation of splice predictor tools. HGG ADVANCES 2024; 5:100351. [PMID: 39244638 PMCID: PMC11440345 DOI: 10.1016/j.xhgg.2024.100351] [Received: 07/02/2024] [Revised: 09/03/2024] [Accepted: 09/04/2024] [Indexed: 09/09/2024] [Open access]
Abstract
Haploinsufficiency of JAG1 is the primary cause of Alagille syndrome (ALGS), a rare, multisystem disorder. The identification of JAG1 intronic variants outside of the canonical splice region as well as missense variants, both of which lead to uncertain associations with disease, confuses diagnostics. Strategies to determine whether these variants affect splicing include the study of patient RNA or minigene constructs, which are not always available or can be laborious to design, as well as the utilization of computational splice prediction tools. These tools, including SpliceAI and Pangolin, use algorithms to calculate the probability that a variant results in a splice alteration, expressed as a Δ score, with higher Δ scores (>0.2 on a 0-1 scale) positively correlated with aberrant splicing. We studied the consequence of 10 putative splice variants in ALGS patient samples through RNA analysis and compared this to SpliceAI and Pangolin predictions. We identified eight variants with aberrant splicing, seven of which had not been previously validated. Combining these data with non-canonical and missense splice variants reported in the literature, we identified a predictive threshold for SpliceAI and Pangolin with high sensitivity (Δ score >0.6). Moreover, we showed reduced specificity for variants with low Δ scores (<0.2), highlighting a limitation of these tools that results in the misidentification of true splice variants. These results improve genomic diagnostics for ALGS by confirming splice effects for seven variants and suggest that the integration of splice prediction tools with RNA analysis is important to ensure accurate clinical variant classifications.
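The two Δ-score cut-offs mentioned in this abstract (0.2 and 0.6) can be read as dividing the 0-1 scale into three bands. The Python sketch below only illustrates that reading: it does not call SpliceAI or Pangolin, the function name `delta_band` and the band labels are hypothetical, and placing scores of exactly 0.2 or 0.6 in the middle band is an assumption, since the abstract states only the strict inequalities.

```python
LOW_CUTOFF = 0.2   # abstract: predictions less reliable below this value
HIGH_CUTOFF = 0.6  # abstract: predictive threshold reported by the study


def delta_band(score: float) -> str:
    """Assign a precomputed delta score (0-1 scale) to an illustrative band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("delta score must lie between 0 and 1")
    if score < LOW_CUTOFF:
        return "low"
    if score > HIGH_CUTOFF:
        return "high"
    # Boundary values (exactly 0.2 or 0.6) fall here by assumption.
    return "intermediate"


for s in (0.05, 0.35, 0.85):
    print(s, delta_band(s))
```

Under this reading, a "low" band result is not evidence that a variant leaves splicing intact, which is consistent with the abstract's point that RNA analysis remains important alongside the predictors.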
Affiliation(s)
- Ernest Keefer-Jacques
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Nicolette Valente
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Anastasia M Jacko
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Grace Matwijec
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Apsara Reese
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Aarna Tekriwal
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kathleen M Loomes
- Division of Pediatric Gastroenterology, Hepatology, and Nutrition, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Nancy B Spinner
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Melissa A Gilbert
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Pediatric Gastroenterology, Hepatology, and Nutrition, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA

5
Sullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ. SpliceVarDB: A comprehensive database of experimentally validated human splicing variants. Am J Hum Genet 2024; 111:2164-2175. [PMID: 39226898 PMCID: PMC11480807 DOI: 10.1016/j.ajhg.2024.08.002] [Received: 10/09/2023] [Revised: 08/03/2024] [Accepted: 08/06/2024] [Indexed: 09/05/2024] [Open access]
Abstract
Variants that alter gene splicing are estimated to comprise up to a third of all disease-causing variants, yet they are hard to predict from DNA sequencing data alone. To overcome this, many groups are incorporating RNA-based analyses, which are resource intensive, particularly for diagnostic laboratories. There are thousands of functionally validated variants that induce mis-splicing; however, this information is not consolidated, and they are under-represented in ClinVar, which presents a barrier to variant interpretation and can result in duplication of validation efforts. To address this issue, we developed SpliceVarDB, an online database consolidating over 50,000 variants assayed for their effects on splicing in over 8,000 human genes. We evaluated over 500 published data sources and established a spliceogenicity scale to standardize, harmonize, and consolidate variant validation data generated by a range of experimental protocols. According to the strength of their supporting evidence, variants were classified as "splice-altering" (∼25%), "not splice-altering" (∼25%), and "low-frequency splice-altering" (∼50%), the last of which corresponds to weak or indeterminate evidence of spliceogenicity. Importantly, 55% of the splice-altering variants in SpliceVarDB are outside the canonical splice sites (5.6% are deep intronic). These variants can support the variant curation diagnostic pathway and can be used to provide the high-quality data necessary to develop more accurate in silico splicing predictors. The variants are accessible through an online platform, SpliceVarDB, with additional features for visualization, variant information, in silico predictions, and validation metrics. SpliceVarDB is a very large collection of splice-altering variants and is available at https://splicevardb.org.
Affiliation(s)
- Patricia J Sullivan
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia; UNSW Centre for Childhood Cancer Research, UNSW Sydney, Sydney, NSW, Australia
- Julian M W Quinn
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Weilin Wu
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Mark Pinese
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
- Mark J Cowley
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia

6
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2024:10.1038/s41576-024-00774-2. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Jernej Ule
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- National Institute of Chemistry, Ljubljana, Slovenia

7
Rahim F, Tao L, Khan K, Ali I, Zeb A, Khan I, Dil S, Abbas T, Hussain A, Zubair M, Zhang H, Hui M, Khan MA, Shah W, Shi Q. A homozygous ARMC3 splicing variant causes asthenozoospermia and flagellar disorganization in a consanguineous family. Clin Genet 2024; 106:437-447. [PMID: 39221575 DOI: 10.1111/cge.14575] [Received: 03/07/2024] [Revised: 05/25/2024] [Accepted: 06/05/2024] [Indexed: 09/04/2024]
Abstract
Male infertility due to asthenozoospermia is quite frequent, but its etiology is poorly understood. We recruited two infertile brothers, born to first-cousin parents from Pakistan, displaying idiopathic asthenozoospermia with mild stuttering disorder but no ciliary-related symptoms. Whole-exome sequencing identified a splicing variant (c.916+1G>A) in ARMC3, recessively co-segregating with asthenozoospermia in the family. The ARMC3 protein is evolutionarily highly conserved and is mostly expressed in the brain and testicular tissue of humans. The ARMC3 splicing mutation leads to the exclusion of exon 8, resulting in a predicted truncated protein (p.Glu245_Asp305delfs*16). Quantitative real-time PCR revealed a significant decrease in ARMC3 at the mRNA level, and Western blot analysis did not detect ARMC3 protein in the patient's sperm. Individuals homozygous for the ARMC3 splicing variant displayed reduced sperm motility with frequent morphological abnormalities of sperm flagella. Transmission electron microscopy of the affected individual IV:2 revealed vacuolation in sperm mitochondria at the midpiece and disrupted flagellar ultrastructure in the principal and end piece. Altogether, our results indicate that this novel homozygous ARMC3 splicing mutation destabilizes sperm flagella and leads to asthenozoospermia in our patients, providing a novel marker for genetic counseling and diagnosis of male infertility.
Affiliation(s)
- Fazal Rahim
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Liu Tao
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Khalid Khan
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Imtiaz Ali
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Aurang Zeb
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Ihsan Khan
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Sobia Dil
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Tanveer Abbas
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Ansar Hussain
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Muhammad Zubair
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Huan Zhang
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Ma Hui
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Muzammil Ahmad Khan
- Gomal Centre of Biochemistry and Biotechnology, Gomal University, Dera Ismail Khan, Khyber Pakhtunkhwa, Pakistan
- Wasim Shah
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
- Qinghua Shi
- Division of Reproduction and Genetics, First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, Institute of Health and Medicine, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei, China
8
O'Neill MJ, Yang T, Laudeman J, Calandranis ME, Harvey ML, Solus JF, Roden DM, Glazer AM. ParSE-seq: a calibrated multiplexed assay to facilitate the clinical classification of putative splice-altering variants. Nat Commun 2024; 15:8320. [PMID: 39333091 PMCID: PMC11437130 DOI: 10.1038/s41467-024-52474-4] [Received: 09/13/2023] [Accepted: 09/10/2024] [Indexed: 09/29/2024] [Open access]
Abstract
Interpreting the clinical significance of putative splice-altering variants outside canonical splice sites remains difficult without time-intensive experimental studies. To address this, we introduce Parallel Splice Effect Sequencing (ParSE-seq), a multiplexed assay to quantify variant effects on RNA splicing. We first apply this technique to study hundreds of variants in the arrhythmia-associated gene SCN5A. Variants are studied in 'minigene' plasmids with molecular barcodes to allow pooled variant effect quantification. We perform experiments in two cell types, including disease-relevant induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). The assay strongly separates known control variants from ClinVar, enabling quantitative calibration of the ParSE-seq assay. Using these evidence strengths and experimental data, we reclassify 29 of 34 variants with conflicting interpretations and 11 of 42 variants of uncertain significance. In addition to intronic variants, we show that many synonymous and missense variants disrupted RNA splicing. Two splice-altering variants in the assay also disrupt splicing and sodium current when introduced into iPSC-CMs by CRISPR-Cas9 editing. ParSE-seq provides high-throughput experimental data for RNA-splicing to support precision medicine efforts and can be readily adopted to study other loss-of-function genotype-phenotype relationships.
Affiliation(s)
- Tao Yang
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Julie Laudeman
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Maria E Calandranis
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- M Lorena Harvey
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Joseph F Solus
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Dan M Roden
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Andrew M Glazer
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
9
Yang K, Islas N, Jewell S, Jha A, Radens CM, Pleiss JA, Lynch KW, Barash Y, Choi PS. Machine learning-optimized targeted detection of alternative splicing. bioRxiv 2024:2024.09.20.614162. [PMID: 39386495 PMCID: PMC11463589 DOI: 10.1101/2024.09.20.614162] [Indexed: 10/12/2024]
Abstract
RNA-sequencing (RNA-seq) is widely adopted for transcriptome analysis but has inherent biases which hinder the comprehensive detection and quantification of alternative splicing. To address this, we present an efficient targeted RNA-seq method that greatly enriches for splicing-informative junction-spanning reads. Local Splicing Variation sequencing (LSV-seq) utilizes multiplexed reverse transcription from highly scalable pools of primers anchored near splicing events of interest. Primers are designed using Optimal Prime, a novel machine learning algorithm trained on the performance of thousands of primer sequences. In experimental benchmarks, LSV-seq achieves high on-target capture rates and concordance with RNA-seq, while requiring significantly lower sequencing depth. Leveraging deep learning splicing code predictions, we used LSV-seq to target events with low coverage in GTEx RNA-seq data and newly discover hundreds of tissue-specific splicing events. Our results demonstrate the ability of LSV-seq to quantify splicing of events of interest at high-throughput and with exceptional sensitivity.
Affiliation(s)
- Kevin Yang
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Division of Cancer Pathobiology, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Nathaniel Islas
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- San Jewell
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Anupama Jha
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Caleb M. Radens
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Jeffrey A. Pleiss
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Kristen W. Lynch
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA, USA
- Yoseph Barash
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Peter S. Choi
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Division of Cancer Pathobiology, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
10
Benegas G, Ye C, Albors C, Li JC, Song YS. Genomic Language Models: Opportunities and Challenges. arXiv 2024:arXiv:2407.11435v2. [PMID: 39070037 PMCID: PMC11275703] [Indexed: 07/30/2024]
Abstract
Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.
Affiliation(s)
- Gonzalo Benegas
- Computer Science Division, University of California, Berkeley
- Chengzhong Ye
- Department of Statistics, University of California, Berkeley
- Carlos Albors
- Computer Science Division, University of California, Berkeley
- Jianan Canal Li
- Computer Science Division, University of California, Berkeley
- Yun S. Song
- Computer Science Division, University of California, Berkeley
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
11
Chao KH, Mao A, Salzberg SL, Pertea M. Splam: a deep-learning-based splice site predictor that improves spliced alignments. Genome Biol 2024; 25:243. [PMID: 39285451 PMCID: PMC11406845 DOI: 10.1186/s13059-024-03379-4] [Received: 08/11/2023] [Accepted: 08/28/2024] [Indexed: 09/19/2024] Open
Abstract
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window. Splam also trains on donor and acceptor pairs together, mirroring how the splicing machinery recognizes both ends of each intron. Compared to SpliceAI, Splam is consistently more accurate, achieving 96% accuracy in predicting human splice junctions.
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21211, USA
- Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205, USA
- Mihaela Pertea
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
12
Gregersen PA, Hammarsjö A, Graversen L, Brix N, Lindelöf H, Jensen UB, Farholt S, Rubak S, Bjerre J, Piticchio SG, Terkelsen T, Nishimura G, Hellfritzsch MB, Grigelioniene G. Compound heterozygosity for two variants in BMP5 in human skeletal dysostosis with atrioventricular septal defect. Clin Genet 2024. [PMID: 39239663 DOI: 10.1111/cge.14616] [Received: 06/27/2024] [Revised: 08/23/2024] [Accepted: 08/26/2024] [Indexed: 09/07/2024]
Abstract
The growth and development of the skeleton is regulated by bone morphogenetic proteins of which several are linked to genetic skeletal disorders. So far, no human skeletal malformations have been associated with variants in BMP5. Here, we report a patient with biallelic loss of function variants in BMP5 and a syndromic phenotype including skeletal dysostosis, dysmorphic features, hypermobility, laryngo-tracheo-bronchomalacia and atrioventricular septal defect. We discuss the phenotype in relation to the known tissue-specific expression of Bmp5 and similar morphological abnormalities previously reported in experimental animal models. Our findings suggest a new association between BMP5 variants and a range of developmental anomalies, involving ears, heart and skeleton, thereby increasing understanding of BMP5's role in human development.
Affiliation(s)
- Pernille Axél Gregersen
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Centre for Rare Diseases, Department of Paediatrics and Adolescent Medicine, AUH, Aarhus, Denmark
- Department of Clinical Medicine, Health, Aarhus University, Aarhus, Denmark
- Anna Hammarsjö
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
- Lise Graversen
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Nis Brix
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Department of Public Health, Research Unit for Epidemiology, Aarhus University, Aarhus, Denmark
- Hillevi Lindelöf
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
- Uffe Birk Jensen
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Stense Farholt
- Centre for Rare Diseases, Department of Paediatrics and Adolescent Medicine, AUH, Aarhus, Denmark
- Rigshospitalet, Centre for Rare Diseases, Department of Paediatrics and Adolescent Medicine, Copenhagen, Denmark
- Sune Rubak
- Department of Clinical Medicine, Health, Aarhus University, Aarhus, Denmark
- Department of Paediatrics and Adolescent Medicine, AUH, Aarhus, Denmark
- Center for Paediatric Pulmonology and Allergology, Department of Child and Adolescent Health, AUH, Aarhus, Denmark
- Jesper Bjerre
- Department of Paediatrics and Adolescent Medicine, AUH, Aarhus, Denmark
- Serena G Piticchio
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
- Thorkild Terkelsen
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Gen Nishimura
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Radiology, Musashino-Yowakai Hospital, Tokyo, Japan
- Giedre Grigelioniene
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
13
Wang D, Gazzara MR, Jewell S, Wales-McGrath B, Brown CD, Choi PS, Barash Y. A Deep Dive into Statistical Modeling of RNA Splicing QTLs Reveals New Variants that Explain Neurodegenerative Disease. bioRxiv 2024:2024.09.01.610696. [PMID: 39282456 PMCID: PMC11398334 DOI: 10.1101/2024.09.01.610696] [Indexed: 09/22/2024]
Abstract
Genome-wide association studies (GWAS) have identified thousands of putative disease causing variants with unknown regulatory effects. Efforts to connect these variants with splicing quantitative trait loci (sQTLs) have provided functional insights, yet sQTLs reported by existing methods cannot explain many GWAS signals. We show current sQTL modeling approaches can be improved by considering alternative splicing representation, model calibration, and covariate integration. We then introduce MAJIQTL, a new pipeline for sQTL discovery. MAJIQTL includes two new statistical methods: a weighted multiple testing approach for sGene discovery and a model for sQTL effect size inference to improve variant prioritization. By applying MAJIQTL to GTEx, we find significantly more sGenes harboring sQTLs with functional significance. Notably, our analysis implicates the novel variant rs582283 in Alzheimer's disease. Using antisense oligonucleotides, we validate this variant's effect by blocking the implicated YBX3 binding site, leading to exon skipping in the gene MS4A3.
Affiliation(s)
- David Wang
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
- Matthew R. Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
- San Jewell
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Peter S. Choi
- Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
- Division of Cancer Pathobiology, The Children’s Hospital of Philadelphia
- Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania
14
Maggi J, Feil S, Gloggnitzer J, Maggi K, Bachmann-Gagescu R, Gerth-Kahlert C, Koller S, Berger W. Nanopore Deep Sequencing as a Tool to Characterize and Quantify Aberrant Splicing Caused by Variants in Inherited Retinal Dystrophy Genes. Int J Mol Sci 2024; 25:9569. [PMID: 39273516 PMCID: PMC11395040 DOI: 10.3390/ijms25179569] [Received: 08/08/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 09/15/2024] Open
Abstract
The contribution of splicing variants to molecular diagnostics of inherited diseases is reported to be less than 10%. This figure is likely an underestimation due to several factors including difficulty in predicting the effect of such variants, the need for functional assays, and the inability to detect them (depending on their locations and the sequencing technology used). The aim of this study was to assess the utility of Nanopore sequencing in characterizing and quantifying aberrant splicing events. For this purpose, we selected 19 candidate splicing variants that were identified in patients affected by inherited retinal dystrophies. Several in silico tools were deployed to predict the nature and estimate the magnitude of variant-induced aberrant splicing events. Minigene assay or whole blood-derived cDNA was used to functionally characterize the variants. PCR amplification of minigene-specific cDNA or the target gene in blood cDNA, combined with Nanopore sequencing, was used to identify the resulting transcripts. Thirteen out of nineteen variants caused aberrant splicing events, including cryptic splice site activation, exon skipping, pseudoexon inclusion, or a combination of these. Nanopore sequencing allowed for the identification of full-length transcripts and their precise quantification, which were often in accord with in silico predictions. The method detected reliably low-abundant transcripts, which would not be detected by conventional strategies, such as RT-PCR followed by Sanger sequencing.
Affiliation(s)
- Jordi Maggi
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Silke Feil
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Jiradet Gloggnitzer
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Kevin Maggi
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Ruxandra Bachmann-Gagescu
- Institute of Medical Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Neuroscience Center Zurich (ZNZ), University and ETH Zurich, 8057 Zurich, Switzerland
- Christina Gerth-Kahlert
- Department of Ophthalmology, University Hospital Zurich and University of Zurich, 8091 Zurich, Switzerland
- Samuel Koller
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Wolfgang Berger
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Neuroscience Center Zurich (ZNZ), University and ETH Zurich, 8057 Zurich, Switzerland
- Zurich Center for Integrative Human Physiology (ZIHP), University of Zurich, 8057 Zurich, Switzerland
15
Xu C, Bao S, Wang Y, Li W, Chen H, Shen Y, Jiang T, Zhang C. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. Genome Res 2024; 34:1052-1065. [PMID: 39060028 PMCID: PMC11368187 DOI: 10.1101/gr.279044.124] [Received: 01/28/2024] [Accepted: 07/18/2024] [Indexed: 07/28/2024]
Abstract
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, the current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction" by incorporating the known splice site usage of a reference gene sequence to improve its prediction on splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice outperformed consistently. DeltaSplice predicted ∼15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders (NDDs), including 19 genes with recurrent splicing-altering mutations. Integration of splicing-altering mutations with other types of de novo mutation burdens allowed the prediction of eight novel NDD-risk genes. Our work expanded the capacity of in silico splicing models with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
Affiliation(s)
- Chencheng Xu
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Suying Bao
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
- Ye Wang
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
- Wenxing Li
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
- Tao Jiang
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Chaolin Zhang
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
16
Smail C, Montgomery SB. RNA Sequencing in Disease Diagnosis. Annu Rev Genomics Hum Genet 2024; 25:353-367. [PMID: 38360541 DOI: 10.1146/annurev-genom-021623-121812] [Indexed: 02/17/2024]
Abstract
RNA sequencing (RNA-seq) enables the accurate measurement of multiple transcriptomic phenotypes for modeling the impacts of disease variants. Advances in technologies, experimental protocols, and analysis strategies are rapidly expanding the application of RNA-seq to identify disease biomarkers, tissue- and cell-type-specific impacts, and the spatial localization of disease-associated mechanisms. Ongoing international efforts to construct biobank-scale transcriptomic repositories with matched genomic data across diverse population groups are further increasing the utility of RNA-seq approaches by providing large-scale normative reference resources. The availability of these resources, combined with improved computational analysis pipelines, has enabled the detection of aberrant transcriptomic phenotypes underlying rare diseases. Further expansion of these resources, across both somatic and developmental tissues, is expected to soon provide unprecedented insights to resolve disease origin, mechanism of action, and causal gene contributions, suggesting the continued high utility of RNA-seq in disease diagnosis.
Affiliation(s)
- Craig Smail
- Genomic Medicine Center, Children's Mercy Research Institute, Children's Mercy Kansas City, Kansas City, Missouri, USA
- Stephen B Montgomery
- Department of Biomedical Data Science, Department of Genetics, and Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
17
Sokolova K, Chen KM, Hao Y, Zhou J, Troyanskaya OG. Deep Learning Sequence Models for Transcriptional Regulation. Annu Rev Genomics Hum Genet 2024; 25:105-122. [PMID: 38594933 DOI: 10.1146/annurev-genom-021623-024727] [Indexed: 04/11/2024]
Abstract
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
Affiliation(s)
- Ksenia Sokolova
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Kathleen M Chen
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Yun Hao
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Olga G Troyanskaya
- Princeton Precision Health, Princeton University, Princeton, New Jersey, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
18
Gangaram B, Lee V, Slavotinek A. Biallelic OTUD6B variants associated with a Kabuki syndrome-like disorder in three siblings: A clinical report and literature review. Am J Med Genet A 2024; 194:e63567. [PMID: 38389298 DOI: 10.1002/ajmg.a.63567] [Received: 09/11/2023] [Revised: 01/19/2024] [Accepted: 02/01/2024] [Indexed: 02/24/2024]
Abstract
Biallelic variants in the OTUD6B gene have been reported in the literature in association with an intellectual developmental disorder featuring dysmorphic facies, seizures, and distal limb abnormalities. Physical differences described for affected individuals suggest that the disorder may be clinically recognizable, but previous publications have reported an initial clinical suspicion for Kabuki syndrome (KS) in some affected individuals. Here, we report on three siblings with biallelic variants in OTUD6B co-segregating with neurodevelopmental delay, shared physical differences, and other clinical findings similar to those of previously reported individuals. However, clinical manifestations such as long palpebral fissures, prominent and cupped ears, developmental delay, growth deficiency, persistent fetal fingertip pads, vertebral anomaly, and seizures in the proband were initially suggestive of KS. In addition, previously unreported clinical manifestations such as delayed eruption of primary dentition, soft doughy skin with reduced sweating, and mirror movements present in our patients suggest an expansion of the phenotype, and we perform a literature review to update on current information related to OTUD6B and human gene-disease association.
Affiliation(s)
- Balram Gangaram
- Division of Medical Genetics, Department of Pediatrics, University of California, San Francisco, California, USA
- Virgina Lee
- Division of Child Neurology, University of California, San Francisco, California, USA
- Anne Slavotinek
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
19
Sun KY, Bai X, Chen S, Bao S, Zhang C, Kapoor M, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke AE, Ganel L, Hawes A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Di Gioia A, Rajagopal VM, Lopez A, Varela JR, Alegre-Díaz J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton JD, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalogue of protein-coding variation in 983,578 individuals. Nature 2024; 631:583-592. [PMID: 38768635 PMCID: PMC11254753 DOI: 10.1038/s41586-024-07556-0] [Received: 06/19/2023] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
Rare coding variants that substantially affect function provide insights into the biology of a gene [1-3]. However, ascertaining the frequency of such variants requires large sample sizes [4-8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Affiliation(s)
- Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
- Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
- Liron Ganel
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jesús Alegre-Díaz
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Jaime Berumen
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Roberto Tapia-Conyer
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Pablo Kuri-Morales
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
- Jason Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan Emberson
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
20
Maggi J, Koller S, Feil S, Bachmann-Gagescu R, Gerth-Kahlert C, Berger W. Limited Added Diagnostic Value of Whole Genome Sequencing in Genetic Testing of Inherited Retinal Diseases in a Swiss Patient Cohort. Int J Mol Sci 2024; 25:6540. [PMID: 38928247 PMCID: PMC11203445 DOI: 10.3390/ijms25126540] [Received: 05/06/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024]
Abstract
The purpose of this study was to assess the added diagnostic value of whole genome sequencing (WGS) for patients with inherited retinal diseases (IRDs) who remained undiagnosed after whole exome sequencing (WES). WGS was performed for index patients in 66 families. The datasets were analyzed according to GATK's guidelines. Additionally, DeepVariant was complemented by GATK's workflow, and a novel structural variant pipeline was developed. Overall, a molecular diagnosis was established in 19/66 (28.8%) index patients. Pathogenic deletions and one deep-intronic variant contributed to the diagnostic yield in 4/19 and 1/19 index patients, respectively. The remaining diagnoses (14/19) were attributed to exonic variants that were missed during WES analysis due to bioinformatic limitations, newly described loci, or unclear pathogenicity. The added diagnostic value of WGS equals 5/66 (7.6%) for our cohort, which is comparable to previous studies. This figure would decrease further to 1/66 (1.5%) with a standardized and reliable copy number variant workflow during WES analysis. Given the higher costs and limited added value, the implementation of WGS as a first-tier assay for inherited eye disorders in a diagnostic laboratory remains untimely. Instead, progress in bioinformatic tools and communication between diagnostic and clinical teams have the potential to ameliorate diagnostic yields.
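The yield percentages in this abstract follow directly from the case counts it reports. The sketch below recomputes them from those counts; the variable and function names are illustrative assumptions and do not come from the paper.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

COHORT = 66  # index patients who underwent WGS (from the abstract)

# Solved cases by variant class, as listed in the abstract.
solved = {
    "pathogenic_deletions": 4,
    "deep_intronic_variant": 1,
    "exonic_variants_missed_by_wes": 14,
}

total_solved = sum(solved.values())

# Overall diagnostic yield across the cohort.
overall_yield = pct(total_solved, COHORT)

# Diagnoses that genuinely required WGS: deletions plus the deep-intronic variant.
added_by_wgs = pct(
    solved["pathogenic_deletions"] + solved["deep_intronic_variant"], COHORT
)

# If a reliable CNV workflow in WES had already caught the deletions,
# only the deep-intronic variant would remain attributable to WGS.
added_with_cnv_in_wes = pct(solved["deep_intronic_variant"], COHORT)

print(total_solved, overall_yield, added_by_wgs, added_with_cnv_in_wes)
```

Separating the counts by variant class makes it explicit which diagnoses depend on WGS itself and which reflect limitations of the earlier WES analysis.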
Affiliation(s)
- Jordi Maggi
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Samuel Koller
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Silke Feil
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Christina Gerth-Kahlert
- Department of Ophthalmology, University Hospital Zurich and University of Zurich, 8091 Zurich, Switzerland
- Wolfgang Berger
- Institute of Medical Molecular Genetics, University of Zurich, 8952 Schlieren, Switzerland
- Zurich Center for Integrative Human Physiology (ZIHP), University of Zurich, 8057 Zurich, Switzerland
- Neuroscience Center Zurich (ZNZ), University and ETH Zurich, 8057 Zurich, Switzerland
21
Shchagina O, Murtazina A, Chausova P, Orlova M, Dadali E, Kurbatov S, Kutsev S, Polyakov A. Genetic Landscape of SH3TC2 variants in Russian patients with Charcot-Marie-Tooth disease. Front Genet 2024; 15:1381915. [PMID: 38903759 PMCID: PMC11187259 DOI: 10.3389/fgene.2024.1381915] [Received: 02/04/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024]
Abstract
Introduction: Charcot-Marie-Tooth disease type 4C (CMT4C; OMIM #601596) stands out as one of the most prevalent forms of recessive motor sensory neuropathy worldwide. This disorder results from biallelic pathogenic variants in the SH3TC2 gene. Methods: Within a cohort comprising 700 unrelated Russian patients diagnosed with Charcot-Marie-Tooth disease, we conducted a gene panel analysis encompassing 21 genes associated with hereditary neuropathies. Among the cohort, 394 individuals exhibited demyelinating motor and sensory neuropathy. Results and discussion: Notably, 10 cases of CMT4C were identified within this cohort. The prevalence of CMT4C among Russian demyelinating CMT patients lacking the PMP22 duplication is estimated at 2.5%, significantly differing from observations in European populations. In total, 4 novel and 9 previously reported variants in the SH3TC2 gene were identified. No accumulation of a major variant was detected. Three previously reported variants, c.2860C>T p.(Arg954*), p.(Arg658Cys) and c.279G>A p.(Lys93Lys), were recurrently detected in unrelated families. The p.(Arg954*) variant is the most frequent, present in 30% of our patients.
Affiliation(s)
- Mariya Orlova
- Research Centre for Medical Genetics, Moscow, Russia
- Elena Dadali
- Research Centre for Medical Genetics, Moscow, Russia
- Sergei Kurbatov
- Research Institute of Experimental Biology and Medicine, Voronezh State Medical University named After N.N. Burdenko, Voronezh, Russia
- Saratov State Medical University, Saratov, Russia
- Sergey Kutsev
- Research Centre for Medical Genetics, Moscow, Russia
22
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024]
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
23
Zeitz C, Navarro J, Azizzadeh Pormehr L, Méjécase C, Neves LM, Letellier C, Condroyer C, Albadri S, Amprou A, Antonio A, Ben-Yacoub T, Wohlschlegel J, Andrieu C, Serafini M, Bianco L, Antropoli A, Nassisi M, El Shamieh S, Chantot-Bastaraud S, Mohand-Saïd S, Smirnov V, Sahel JA, Del Bene F, Audo I. Variants in UBAP1L lead to autosomal recessive rod-cone and cone-rod dystrophy. Genet Med 2024; 26:101081. [PMID: 38293907 DOI: 10.1016/j.gim.2024.101081] [Received: 07/17/2023] [Revised: 01/16/2024] [Accepted: 01/19/2024] [Indexed: 02/01/2024]
Abstract
PURPOSE: Progressive inherited retinal degenerations (IRDs) affecting rods and cones are clinically and genetically heterogeneous and can lead to blindness with limited therapeutic options. The major gene defects have been identified in subjects of European and Asian descent, with only a few reports in subjects of North African descent. METHODS: Genome, targeted next-generation, and Sanger sequencing were applied to a cohort of ∼4000 IRD cases. Expression analyses, including ChIP-seq database analyses, were performed on human-derived retinal organoids (ROs), retinal pigment epithelium cells, and zebrafish. Variants' pathogenicity was assessed using 3D-modeling and/or ROs. RESULTS: Here, we identified a novel gene defect with three distinct pathogenic variants in UBAP1L in 4 independent autosomal recessive IRD cases from Tunisia. UBAP1L is expressed in the retinal pigment epithelium and retina, specifically in rods and cones, in line with the phenotype. It encodes Ubiquitin-associated protein 1-like, containing a solenoid of overlapping ubiquitin-associated domain, predicted to interact with ubiquitin. In silico and in vitro studies, including 3D-modeling and ROs, revealed that the solenoid of overlapping ubiquitin-associated domain is truncated and thus ubiquitin binding is most likely abolished secondary to all variants identified herein. CONCLUSION: Biallelic UBAP1L variants are a novel cause of IRDs, most likely enriched in the North African population.
Affiliation(s)
- Christina Zeitz
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Julien Navarro
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Leila Azizzadeh Pormehr
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Mass. Eye and Ear, Ocular Genomics Institute, Berman-Gund Laboratory for the Study of Retinal Degenerations, Harvard Medical School, Boston, MA
- Cécile Méjécase
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; UCL Institute of Ophthalmology, London, UK; The Francis Crick Institute, London, UK
- Luiza M Neves
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Instituto Nacional de Saúde da Mulher, da Criança e do Adolescente Fernandes Figueira, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Camille Letellier
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Shahad Albadri
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Andréa Amprou
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Aline Antonio
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Tasnim Ben-Yacoub
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Juliette Wohlschlegel
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Department of Biological Structure, University of Washington, Seattle, WA
- Camille Andrieu
- Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Centre de Référence Maladies Rares REFERET and INSERM-DGOS CIC 1423, Paris, France
- Malo Serafini
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Lorenzo Bianco
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Department of Ophthalmology, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Alessio Antropoli
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Department of Ophthalmology, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Marco Nassisi
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Said El Shamieh
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Molecular Testing Laboratory, Department of Medical Laboratory Technology, Faculty of Health Sciences, Beirut Arab University, Beirut, Lebanon
- Sandra Chantot-Bastaraud
- APHP, Hôpital Armand-Trousseau, Département de Génétique, UF de Génétique Chromosomique, Paris, France
- Saddek Mohand-Saïd
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Centre de Référence Maladies Rares REFERET and INSERM-DGOS CIC 1423, Paris, France
- Vasily Smirnov
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Exploration de la Vision et Neuro-Ophtalmologie, CHU de Lille, Lille, France
- José-Alain Sahel
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Centre de Référence Maladies Rares REFERET and INSERM-DGOS CIC 1423, Paris, France; Department of Ophthalmology, The University of Pittsburgh School of Medicine, Pittsburgh, PA
- Filippo Del Bene
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Isabelle Audo
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, Centre de Référence Maladies Rares REFERET and INSERM-DGOS CIC 1423, Paris, France
24
Duman ET, Sitte M, Conrads K, Mackay A, Ludewig F, Ströbel P, Ellenrieder V, Hessmann E, Papantonis A, Salinas G. A single-cell strategy for the identification of intronic variants related to mis-splicing in pancreatic cancer. NAR Genom Bioinform 2024; 6:lqae057. [PMID: 38800828 PMCID: PMC11127633 DOI: 10.1093/nargab/lqae057] [Received: 02/01/2024] [Revised: 04/24/2024] [Accepted: 05/23/2024] [Indexed: 05/29/2024]
Abstract
Most clinical diagnostic and genomic research setups focus almost exclusively on coding regions and essential splice sites, thereby overlooking other non-coding variants. As a result, intronic variants that can promote mis-splicing events across a range of diseases, including cancer, are yet to be systematically investigated. Such investigations would require both genomic and transcriptomic data, but there currently exist very few datasets that satisfy these requirements. We address this by developing a single-nucleus full-length RNA-sequencing approach that allows for the detection of potentially pathogenic intronic variants. We exemplify the potency of our approach by applying it to pancreatic cancer tumor and tumor-derived specimens and linking intronic variants to splicing dysregulation. We specifically find that prominent intron retention and pseudo-exon activation events are shared by the tumors and affect genes encoding key transcriptional regulators. Our work paves the way for the assessment and exploitation of intronic mutations as powerful prognostic markers and potential therapeutic targets in cancer.
Affiliation(s)
- Emre Taylan Duman
- NGS-Core Unit for Integrative Genomics, Institute of Pathology, University Medical Center, Göttingen, Germany
- Maren Sitte
- NGS-Core Unit for Integrative Genomics, Institute of Pathology, University Medical Center, Göttingen, Germany
- Karly Conrads
- Clinic of Gastroenterology, Gastrointestinal Oncology and Endocrinology, University Medical Center, Göttingen, Germany
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
- Institute of Medical Bioinformatics, University Medical Center, Göttingen, Germany
- Adi Mackay
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
- Institute of Pathology, University Medical Center, Göttingen, Germany
- Fabian Ludewig
- NGS-Core Unit for Integrative Genomics, Institute of Pathology, University Medical Center, Göttingen, Germany
- Philipp Ströbel
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
- Institute of Pathology, University Medical Center, Göttingen, Germany
- Volker Ellenrieder
- Clinic of Gastroenterology, Gastrointestinal Oncology and Endocrinology, University Medical Center, Göttingen, Germany
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
- Comprehensive Cancer Center Lower Saxony (CCC-N), Göttingen, Germany
- Elisabeth Hessmann
- Clinic of Gastroenterology, Gastrointestinal Oncology and Endocrinology, University Medical Center, Göttingen, Germany
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
- Comprehensive Cancer Center Lower Saxony (CCC-N), Göttingen, Germany
- Argyris Papantonis
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
- Institute of Pathology, University Medical Center, Göttingen, Germany
- Comprehensive Cancer Center Lower Saxony (CCC-N), Göttingen, Germany
- Gabriela Salinas
- NGS-Core Unit for Integrative Genomics, Institute of Pathology, University Medical Center, Göttingen, Germany
- Clinical Research Unit 5002 (CRU5002), University Medical Center, Göttingen, Germany
25
Grätz C, Schuster M, Brandes F, Meidert AS, Kirchner B, Reithmair M, Schelling G, Pfaffl MW. A pipeline for the development and analysis of extracellular vesicle-based transcriptomic biomarkers in molecular diagnostics. Mol Aspects Med 2024; 97:101269. [PMID: 38552453 DOI: 10.1016/j.mam.2024.101269] [Received: 12/01/2023] [Revised: 03/11/2024] [Accepted: 03/17/2024] [Indexed: 06/12/2024]
Abstract
Extracellular vesicles are shed by every cell type and can be found in any biofluid. They contain different molecules that can be utilized as biomarkers, including several RNA species which they protect from degradation. Here, we present a pipeline for the development and analysis of extracellular vesicle-associated transcriptomic biomarkers that our group has successfully applied multiple times. We highlight the key steps of the pipeline and give particular emphasis to the necessary quality control checkpoints, which are linked to numerous available guidelines that should be considered along the workflow. Our pipeline starts with patient recruitment and continues with blood sampling and processing. The purification and characterization of extracellular vesicles is explained in detail, as well as the isolation and quality control of extracellular vesicle-associated RNA. We point out the possible pitfalls during library preparation and RNA sequencing and present multiple bioinformatic tools to pinpoint biomarker signature candidates from the sequencing data. Finally, considerations and pitfalls during the validation of the biomarker signature using RT-qPCR will be elaborated.
Affiliation(s)
- Christian Grätz
- Department of Animal Physiology and Immunology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Martina Schuster
- Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Florian Brandes
- Department of Anesthesiology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Agnes S Meidert
- Department of Anesthesiology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Benedikt Kirchner
- Department of Animal Physiology and Immunology, School of Life Sciences, Technical University of Munich, Freising, Germany; Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Marlene Reithmair
- Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Gustav Schelling
- Department of Anesthesiology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
- Michael W Pfaffl
- Department of Animal Physiology and Immunology, School of Life Sciences, Technical University of Munich, Freising, Germany
26
Majeres LE, Dilger AC, Shike DW, McCann JC, Beever JE. Defining a Haplotype Encompassing the LCORL-NCAPG Locus Associated with Increased Lean Growth in Beef Cattle. Genes (Basel) 2024; 15:576. [PMID: 38790206 PMCID: PMC11121065 DOI: 10.3390/genes15050576] [Received: 03/29/2024] [Revised: 04/23/2024] [Accepted: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Numerous studies have shown genetic variation at the LCORL-NCAPG locus is strongly associated with growth traits in beef cattle. However, a causative molecular variant has yet to be identified. To define all possible candidate variants, 34 Charolais-sired calves were whole-genome sequenced, including 17 homozygous for a long-range haplotype associated with increased growth (QQ) and 17 homozygous for potential ancestral haplotypes for this region (qq). The Q haplotype was refined to an 814 kb region between chr6:37,199,897-38,014,080 and contained 218 variants not found in qq individuals. These variants include an insertion in an intron of NCAPG, a previously documented mutation in NCAPG (rs109570900), two coding sequence mutations in LCORL (rs109696064 and rs384548488), and 15 variants located within ATAC peaks that were predicted to affect transcription factor binding. Notably, rs384548488 is a frameshift variant likely resulting in loss of function for long isoforms of LCORL. To test the association of the coding sequence variants of LCORL with phenotype, 405 cattle from five populations were genotyped. The two variants were in complete linkage disequilibrium. Statistical analysis of the three populations that contained QQ animals revealed significant (p < 0.05) associations with genotype and birth weight, live weight, carcass weight, hip height, and average daily gain. These findings affirm the link between this locus and growth in beef cattle and describe DNA variants that define the haplotype. However, further studies will be required to define the true causative mutation.
Affiliation(s)
- Leif E. Majeres
- UTIA Genomics Center for the Advancement of Agriculture, Institute of Agriculture, University of Tennessee, Knoxville, TN 37996, USA
- Anna C. Dilger
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Daniel W. Shike
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Joshua C. McCann
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jonathan E. Beever
- UTIA Genomics Center for the Advancement of Agriculture, Institute of Agriculture, University of Tennessee, Knoxville, TN 37996, USA
27
Xu C, Bao S, Chen H, Jiang T, Zhang C. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. bioRxiv 2024:2024.03.22.586363. [PMID: 38586002 PMCID: PMC10996483 DOI: 10.1101/2024.03.22.586363] [Indexed: 04/09/2024]
Abstract
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, the current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction" by incorporating the known splice site usage of a reference gene sequence to improve its prediction on splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice outperformed consistently. DeltaSplice predicted ~15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders, including 19 genes with recurrent splicing-altering mutations. Among the new candidate disease risk genes, MFN1 is involved in mitochondria fusion, which is frequently disrupted in autism patients. Our work expanded the capacity of in silico splicing models with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
Affiliation(s)
- Chencheng Xu
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Present address: Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Suying Bao
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Present address: Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Present address: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tao Jiang
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Chaolin Zhang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
28
Shiozawa Y, Fujita S, Nannya Y, Ogawa S, Nomura N, Kiguchi T, Sezaki N, Kudo H, Toyama T. First report of familial mixed phenotype acute leukemia: shared clinical characteristics, Philadelphia translocation, and germline variants. Int J Hematol 2024; 119:465-471. [PMID: 38424413 DOI: 10.1007/s12185-024-03724-0] [Received: 07/05/2023] [Revised: 01/17/2024] [Accepted: 01/25/2024] [Indexed: 03/02/2024]
Abstract
While our understanding of the molecular basis of mixed phenotype acute leukemia (MPAL) has progressed over the decades, our knowledge is limited and the prognosis remains poor. Investigating cases of familial leukemia can provide insights into the role of genetic and environmental factors in leukemogenesis. Although familial cases and associated mutations have been identified in some leukemias, familial occurrence of MPAL has never been reported. Here, we report the first cases of MPAL in a family. A 68-year-old woman was diagnosed with MPAL and received haploidentical stem cell transplantation from her 44-year-old son. In four years, the son himself developed MPAL. Both cases exhibited similar characteristics such as biphenotypic leukemia with B/myeloid cell antigens, Philadelphia translocation (BCR-ABL1 mutation), and response to acute lymphoblastic leukemia-type chemotherapy. These similarities suggest the presence of hereditary factors contributing to the development of MPAL. Targeted sequencing identified shared germline variants in these cases; however, in silico analyses did not strongly support their pathogenicity. Intriguingly, when the son developed MPAL, the mother did not develop donor-derived leukemia and remained in remission. Our cases provide valuable insights to guide future research on familial MPAL.
Collapse
Affiliation(s)
- Yuka Shiozawa
- Department of Hematology, Federation of National Public Service Personnel Mutual Aid Associations Tachikawa Hospital, 4-2-22 Nishiki-Cho, Tachikawa-Shi, Tokyo, 190-8531, Japan
- Shinya Fujita
- Department of Hematology, Federation of National Public Service Personnel Mutual Aid Associations Tachikawa Hospital, 4-2-22 Nishiki-Cho, Tachikawa-Shi, Tokyo, 190-8531, Japan.
- Yasuhito Nannya
- Department of Pathology and Tumor Biology, Kyoto University, Kyoto, Japan
- Division of Hematopoietic Disease Control, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Seishi Ogawa
- Department of Pathology and Tumor Biology, Kyoto University, Kyoto, Japan
- Naho Nomura
- Department of Hematology, Chugoku Central Hospital of Japan Mutual Aid Association of Public School Teachers, Hiroshima, Japan
- Toru Kiguchi
- Saitama Medical Center, Department of Diabetes, Endocrinology and Hematology, Dokkyo Medical University, Saitama, Japan
- Nobuo Sezaki
- Department of Hematology, Chugoku Central Hospital of Japan Mutual Aid Association of Public School Teachers, Hiroshima, Japan
- Himari Kudo
- Department of Hematology, Federation of National Public Service Personnel Mutual Aid Associations Tachikawa Hospital, 4-2-22 Nishiki-Cho, Tachikawa-Shi, Tokyo, 190-8531, Japan
- Takaaki Toyama
- Department of Hematology, Federation of National Public Service Personnel Mutual Aid Associations Tachikawa Hospital, 4-2-22 Nishiki-Cho, Tachikawa-Shi, Tokyo, 190-8531, Japan
29
Chen K, Zhou Y, Ding M, Wang Y, Ren Z, Yang Y. Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Brief Bioinform 2024; 25:bbae163. [PMID: 38605640 PMCID: PMC11009468 DOI: 10.1093/bib/bbae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/22/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] Open
Abstract
Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models were developed for genomic sequences and were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acids (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
Affiliation(s)
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yue Zhou
- Peng Cheng Laboratory, Shenzhen, China
- Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yu Wang
- Peng Cheng Laboratory, Shenzhen, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
30
Lappalainen T, Li YI, Ramachandran S, Gusev A. Genetic and molecular architecture of complex traits. Cell 2024; 187:1059-1075. [PMID: 38428388 DOI: 10.1016/j.cell.2024.01.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/20/2023] [Accepted: 01/16/2024] [Indexed: 03/03/2024]
Abstract
Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.
Affiliation(s)
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA; Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden.
- Yang I Li
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Sohini Ramachandran
- Ecology, Evolution and Organismal Biology, Center for Computational Molecular Biology, and the Data Science Institute, Brown University, Providence, RI 029129, USA
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
31
Kawakami R, Hiraide T, Watanabe K, Miyamoto S, Hira K, Komatsu K, Ishigaki H, Sakaguchi K, Maekawa M, Yamashita K, Fukuda T, Miyairi I, Ogata T, Saitsu H. RNA sequencing and target long-read sequencing reveal an intronic transposon insertion causing aberrant splicing. J Hum Genet 2024; 69:91-99. [PMID: 38102195 DOI: 10.1038/s10038-023-01211-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/28/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023]
Abstract
More than half of cases with suspected genetic disorders remain unsolved by genetic analysis using short-read sequencing such as exome sequencing (ES) and genome sequencing (GS). RNA sequencing (RNA-seq) and long-read sequencing (LRS) are useful for interpretation of candidate variants and detection of structural variants containing repeat sequences, respectively. Recently, adaptive sampling on nanopore sequencers enables target LRS more easily. Here, we present a Japanese girl with premature chromatid separation (PCS)/mosaic variegated aneuploidy (MVA) syndrome. ES detected a known pathogenic maternal heterozygous variant (c.1402-5A>G) in intron 10 of BUB1B (NM_001211.6), a known responsive gene for PCS/MVA syndrome with autosomal recessive inheritance. Minigene splicing assay revealed that almost all transcripts from the c.1402-5G allele have mis-splicing with 4-bp insertion. GS could not detect another pathogenic variant, while RNA-seq revealed abnormal reads in intron 2. To extensively explore variants in intron 2, we performed adaptive sampling and identified a paternal 3.0 kb insertion. Consensus sequence of 16 reads spanning the insertion showed that the insertion consists of Alu and SVA elements. Realignment of RNA-seq reads to the new reference sequence containing the insertion revealed that 16 reads have 5' splice site within the insertion and 3' splice site at exon 3, demonstrating causal relationship between the insertion and aberrant splicing. In addition, immunoblotting showed severely diminished BUB1B protein level in patient derived cells. These data suggest that detection of transcriptomic abnormalities by RNA-seq can be a clue for identifying pathogenic variants, and determination of insert sequences is one of merits of LRS.
Affiliation(s)
- Ryota Kawakami
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Takuya Hiraide
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Kazuki Watanabe
- Department of Biochemistry, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Sachiko Miyamoto
- Department of Biochemistry, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Kota Hira
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Kazuyuki Komatsu
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Department of Biochemistry, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Hidetoshi Ishigaki
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Kimiyoshi Sakaguchi
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Masato Maekawa
- Department of Laboratory Medicine, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Keita Yamashita
- Department of Laboratory Medicine, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Tokiko Fukuda
- Department of Hamamatsu Child Health and Developmental Medicine, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Isao Miyairi
- Department of Pediatrics, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Tsutomu Ogata
- Department of Biochemistry, Hamamatsu University School of Medicine, Hamamatsu, Japan
- Department of Pediatrics, Hamamatsu Medical Center, Hamamatsu, Japan
- Hirotomo Saitsu
- Department of Biochemistry, Hamamatsu University School of Medicine, Hamamatsu, Japan.
32
Gupta K, Yang C, McCue K, Bastani O, Sharp PA, Burge CB, Solar-Lezama A. Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing. Genome Biol 2024; 25:23. [PMID: 38229106 PMCID: PMC10790492 DOI: 10.1186/s13059-023-03162-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/28/2023] [Indexed: 01/18/2024] Open
Abstract
Sequence-specific RNA-binding proteins (RBPs) play central roles in splicing decisions. Here, we describe a modular splicing architecture that leverages in vitro-derived RNA affinity models for 79 human RBPs and the annotated human genome to produce improved models of RBP binding and activity. Binding and activity are modeled by separate Motif and Aggregator components that can be mixed and matched, enforcing sparsity to improve interpretability. Training a new Adjusted Motif (AM) architecture on the splicing task not only yields better splicing predictions but also improves prediction of RBP-binding sites in vivo and of splicing activity, assessed using independent data.
Affiliation(s)
- Kavi Gupta
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Chenxi Yang
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA
- Kayla McCue
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Osbert Bastani
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Phillip A Sharp
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Armando Solar-Lezama
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
33
Venema WJ, Hiddingh S, van Loosdregt J, Bowes J, Balliu B, de Boer JH, Ossewaarde-van Norel J, Thompson SD, Langefeld CD, de Ligt A, van der Veken LT, Krijger PHL, de Laat W, Kuiper JJW. A cis-regulatory element regulates ERAP2 expression through autoimmune disease risk SNPs. Cell Genomics 2024; 4:100460. [PMID: 38190099 PMCID: PMC10794781 DOI: 10.1016/j.xgen.2023.100460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 10/04/2023] [Accepted: 11/09/2023] [Indexed: 01/09/2024]
Abstract
Single-nucleotide polymorphisms (SNPs) near the ERAP2 gene are associated with various autoimmune conditions, as well as protection against lethal infections. Due to high linkage disequilibrium, numerous trait-associated SNPs are correlated with ERAP2 expression; however, their functional mechanisms remain unidentified. We show by reciprocal allelic replacement that ERAP2 expression is directly controlled by the splice region variant rs2248374. However, disease-associated variants in the downstream LNPEP gene promoter are independently associated with ERAP2 expression. Allele-specific conformation capture assays revealed long-range chromatin contacts between the gene promoters of LNPEP and ERAP2 and showed that interactions were stronger in patients carrying the alleles that increase susceptibility to autoimmune diseases. Replacing the SNPs in the LNPEP promoter by reference sequences lowered ERAP2 expression. These findings show that multiple SNPs act in concert to regulate ERAP2 expression and that disease-associated variants can convert a gene promoter region into a potent enhancer of a distal gene.
Affiliation(s)
- Wouter J Venema
- Department of Ophthalmology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands; Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Sanne Hiddingh
- Department of Ophthalmology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands; Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Jorg van Loosdregt
- Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- John Bowes
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
- Brunilda Balliu
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Joke H de Boer
- Department of Ophthalmology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Susan D Thompson
- Department of Pediatrics, University of Cincinnati College of Medicine, Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Carl D Langefeld
- Department of Biostatistics and Data Science, and Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Aafke de Ligt
- Department of Ophthalmology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands; Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Lars T van der Veken
- Department of Genetics, Division Laboratories, Pharmacy and Biomedical Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Peter H L Krijger
- Oncode Institute, Hubrecht Institute-KNAW and University Medical Center Utrecht, 3584 CT Utrecht, the Netherlands
- Wouter de Laat
- Oncode Institute, Hubrecht Institute-KNAW and University Medical Center Utrecht, 3584 CT Utrecht, the Netherlands
- Jonas J W Kuiper
- Department of Ophthalmology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands; Center for Translational Immunology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
34
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS: We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION: SpliceAI and Pangolin show the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
35
Humphrey J, Brophy E, Kosoy R, Zeng B, Coccia E, Mattei D, Ravi A, Efthymiou AG, Navarro E, Muller BZ, Snijders GJLJ, Allan A, Münch A, Kitata RB, Kleopoulos SP, Argyriou S, Shao Z, Francoeur N, Tsai CF, Gritsenko MA, Monroe ME, Paurus VL, Weitz KK, Shi T, Sebra R, Liu T, de Witte LD, Goate AM, Bennett DA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P, Raj T. Long-read RNA-seq atlas of novel microglia isoforms elucidates disease-associated genetic regulation of splicing. medRxiv 2023:2023.12.01.23299073. [PMID: 38076956 PMCID: PMC10705658 DOI: 10.1101/2023.12.01.23299073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
Microglia, the innate immune cells of the central nervous system, have been genetically implicated in multiple neurodegenerative diseases. We previously mapped the genetic regulation of gene expression and mRNA splicing in human microglia, identifying several loci where common genetic variants in microglia-specific regulatory elements explain disease risk loci identified by GWAS. However, identifying genetic effects on splicing has been challenging due to the use of short sequencing reads to identify causal isoforms. Here we present the isoform-centric microglia genomic atlas (isoMiGA) which leverages the power of long-read RNA-seq to identify 35,879 novel microglia isoforms. We show that the novel microglia isoforms are involved in stimulation response and brain region specificity. We then quantified the expression of both known and novel isoforms in a multi-ethnic meta-analysis of 555 human microglia short-read RNA-seq samples from 391 donors, the largest to date, and found associations with genetic risk loci in Alzheimer's disease and Parkinson's disease. We nominate several loci that may act through complex changes in isoform and splice site usage.
Affiliation(s)
- Jack Humphrey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Erica Brophy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Roman Kosoy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Biao Zeng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Elena Coccia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniele Mattei
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashvin Ravi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Anastasia G. Efthymiou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Elisa Navarro
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biochemistry and Molecular Biology, Faculty of Medicine (Universidad Complutense de Madrid), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Instituto Ramon y Cajal de Investigacion Sanitaria (IRYCIS), Madrid, Spain
- Benjamin Z. Muller
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gijsje JLJ Snijders
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Amanda Allan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexandra Münch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Reta Birhanu Kitata
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Steven P Kleopoulos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Stathis Argyriou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Zhiping Shao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Nancy Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chia-Feng Tsai
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Vanessa L Paurus
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Karl K Weitz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Tujin Shi
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Lot D. de Witte
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Alison M. Goate
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, USA
| | - Vahram Haroutunian
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Gabriel E. Hoffman
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - John F. Fullard
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Panos Roussos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Towfique Raj
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
36
Dwivedi SL, Quiroz LF, Reddy ASN, Spillane C, Ortiz R. Alternative Splicing Variation: Accessing and Exploiting in Crop Improvement Programs. Int J Mol Sci 2023; 24:15205. [PMID: 37894886 PMCID: PMC10607462 DOI: 10.3390/ijms242015205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 10/09/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023] Open
Abstract
Alternative splicing (AS) is a gene regulatory mechanism modulating gene expression in multiple ways. AS is prevalent in all eukaryotes including plants. AS generates two or more mRNAs from the precursor mRNA (pre-mRNA) to regulate transcriptome complexity and proteome diversity. Advances in next-generation sequencing, omics technology, bioinformatics tools, and computational methods provide new opportunities to quantify and visualize AS-based quantitative trait variation associated with plant growth, development, reproduction, and stress tolerance. Domestication, polyploidization, and environmental perturbation may evolve novel splicing variants associated with agronomically beneficial traits. To date, pre-mRNAs from many genes are spliced into multiple transcripts that cause phenotypic variation for complex traits, both in model plant Arabidopsis and field crops. Cataloguing and exploiting such variation may provide new paths to enhance climate resilience, resource-use efficiency, productivity, and nutritional quality of staple food crops. This review provides insights into AS variation alongside a gene expression analysis to select for novel phenotypic diversity for use in breeding programs. AS contributes to heterosis, enhances plant symbiosis (mycorrhiza and rhizobium), and provides a mechanistic link between the core clock genes and diverse environmental clues.
Affiliation(s)
| | - Luis Felipe Quiroz
- Agriculture and Bioeconomy Research Centre, Ryan Institute, University of Galway, University Road, H91 REW4 Galway, Ireland
| | - Anireddy S N Reddy
- Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - Charles Spillane
- Agriculture and Bioeconomy Research Centre, Ryan Institute, University of Galway, University Road, H91 REW4 Galway, Ireland
| | - Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, 23053 Alnarp, SE, Sweden
37
Kurosawa R, Iida K, Ajiro M, Awaya T, Yamada M, Kosaki K, Hagiwara M. PDIVAS: Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing. BMC Genomics 2023; 24:601. [PMID: 37817060 PMCID: PMC10563346 DOI: 10.1186/s12864-023-09645-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND: Deep-intronic variants that alter RNA splicing were ineffectively evaluated in the search for the cause of genetic diseases. Determination of such pathogenic variants from a vast number of deep-intronic variants (approximately 1,500,000 variants per individual) represents a technical challenge to researchers. Thus, we developed a Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing (PDIVAS) to easily detect pathogenic deep-intronic variants. RESULTS: PDIVAS was trained on an ensemble machine-learning algorithm to classify pathogenic and benign variants in a curated dataset. The dataset consists of manually curated pathogenic splice-altering variants (SAVs) and commonly observed benign variants within deep introns. Splicing features and a splicing constraint metric were used to maximize the predictive sensitivity and specificity, respectively. PDIVAS showed an average precision of 0.92 and a maximum MCC of 0.88 in classifying these variants, which were the best of the previous predictors. When PDIVAS was applied to genome sequencing analysis on a threshold with 95% sensitivity for reported pathogenic SAVs, an average of 27 pathogenic candidates were extracted per individual. Furthermore, the causative variants in simulated patient genomes were more efficiently prioritized than the previous predictors. CONCLUSION: Incorporating PDIVAS into variant interpretation pipelines will enable efficient detection of disease-causing deep-intronic SAVs and contribute to improving the diagnostic yield. PDIVAS is publicly available at https://github.com/shiro-kur/PDIVAS.
Affiliation(s)
- Ryo Kurosawa
- Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
| | - Kei Iida
- Faculty of Science and Engineering, Kindai University, 3-4-1 Kowakae, Higashi-osaka, Osaka, 577-8502, Japan
- Medical Research Support Center, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
| | - Masahiko Ajiro
- Division of Cancer RNA Research, National Cancer Center Research Institute, Tokyo, 104-0045, Japan
- Department of Drug Discovery Medicine, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
| | - Tomonari Awaya
- Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Laboratory of Tumor Microenvironment and Immunity, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
| | - Mamiko Yamada
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Masatoshi Hagiwara
- Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
38
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. PLoS Comput Biol 2023; 19:e1011526. [PMID: 37824580 PMCID: PMC10597526 DOI: 10.1371/journal.pcbi.1011526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 10/24/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023] Open
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
| | - David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, United States of America
39
Wang R, Helbig I, Edmondson AC, Lin L, Xing Y. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023] Open
Abstract
Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients.
Affiliation(s)
- Robert Wang
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Andrew C Edmondson
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Lan Lin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Yi Xing
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
40
O'Neill MJ, Yang T, Laudeman J, Calandranis M, Solus J, Roden DM, Glazer AM. ParSE-seq: A Calibrated Multiplexed Assay to Facilitate the Clinical Classification of Putative Splice-altering Variants. medRxiv 2023:2023.09.04.23295019. [PMID: 37732247 PMCID: PMC10508793 DOI: 10.1101/2023.09.04.23295019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
Background: Interpreting the clinical significance of putative splice-altering variants outside 2-base pair canonical splice sites remains difficult without functional studies. Methods: We developed Parallel Splice Effect Sequencing (ParSE-seq), a multiplexed minigene-based assay, to test variant effects on RNA splicing quantified by high-throughput sequencing. We studied variants in SCN5A, an arrhythmia-associated gene which encodes the major cardiac voltage-gated sodium channel. We used the computational tool SpliceAI to prioritize exonic and intronic candidate splice variants, and ClinVar to select benign and pathogenic control variants. We generated a pool of 284 barcoded minigene plasmids, transfected them into Human Embryonic Kidney (HEK293) cells and induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs), sequenced the resulting pools of splicing products, and calibrated the assay to the American College of Medical Genetics and Genomics scheme. Variants were interpreted using the calibrated functional data, and experimental data were compared to SpliceAI predictions. We further studied some splice-altering missense variants by cDNA-based automated patch clamping (APC) in HEK cells and assessed splicing and sodium channel function in CRISPR-edited iPSC-CMs. Results: ParSE-seq revealed the splicing effect of 224 SCN5A variants in iPSC-CMs and 244 variants in HEK293 cells. The scores between the cell types were highly correlated (R² = 0.84). In iPSCs, the assay had concordant scores for 21/22 benign/likely benign and 24/25 pathogenic/likely pathogenic control variants from ClinVar. 43/112 exonic variants and 35/70 intronic variants with determinate scores disrupted splicing. 11 of 42 variants of uncertain significance were reclassified, and 29 of 34 variants with conflicting interpretations were reclassified using the functional data. SpliceAI computational predictions correlated well with experimental data (AUC = 0.96). We identified 20 unique SCN5A missense variants that disrupted splicing, and 2 clinically observed splice-altering missense variants of uncertain significance had normal function when tested with the cDNA-based APC assay. A splice-altering intronic variant detected by ParSE-seq, c.1891-5C>G, also disrupted splicing and sodium current when introduced into iPSC-CMs at the endogenous locus by CRISPR editing. Conclusions: ParSE-seq is a calibrated, multiplexed, high-throughput assay to facilitate the classification of candidate splice-altering variants.
Affiliation(s)
| | - Tao Yang
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Julie Laudeman
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Maria Calandranis
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Joseph Solus
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
| | - Dan M Roden
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Andrew M Glazer
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
41
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. PLoS Genet 2023; 19:e1010932. [PMID: 37721944 PMCID: PMC10538656 DOI: 10.1371/journal.pgen.1010932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 09/28/2023] [Accepted: 08/22/2023] [Indexed: 09/20/2023] Open
Abstract
The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ralf Tambets
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - James D. Hayhurst
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ida Rahu
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Peep Kolberg
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Uku Raudvere
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Anshika Chowdhary
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Andreas Vija
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Hans J. Teras
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Jacob Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - John Hardy
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sebastian Guelfi
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Daniah Trabzuni
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sarah Kim-Hellmuth
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Department of Pediatrics, Dr. von Hauner Children’s Hospital, University Hospital LMU Munich, Munich, Germany
| | - William Rayner
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Hilary Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Abayomi Mosaku
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Helen Parkinson
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
42
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. Am J Med Genet C Semin Med Genet 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
- Invitae Corporation, San Francisco, California, USA
- Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
| | - Hillery Metz
- Invitae Corporation, San Francisco, California, USA
| | - Toby Manders
- Invitae Corporation, San Francisco, California, USA
| | - Keith Nykamp
- Invitae Corporation, San Francisco, California, USA
| | - Robert L Nussbaum
- Invitae Corporation, San Francisco, California, USA
- Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
43
Brand CM, Colbran LL, Capra JA. Resurrecting the alternative splicing landscape of archaic hominins using machine learning. Nat Ecol Evol 2023; 7:939-953. [PMID: 37142741 PMCID: PMC11440953 DOI: 10.1038/s41559-023-02053-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/05/2022] [Accepted: 03/29/2023] [Indexed: 05/06/2023]
Abstract
Alternative splicing contributes to adaptation and divergence in many species. However, it has not been possible to directly compare splicing between modern and archaic hominins. Here, we unmask the recent evolution of this previously unobservable regulatory mechanism by applying SpliceAI, a machine-learning algorithm that identifies splice-altering variants (SAVs), to high-coverage genomes from three Neanderthals and a Denisovan. We discover 5,950 putative archaic SAVs, of which 2,186 are archaic-specific and 3,607 also occur in modern humans via introgression (244) or shared ancestry (3,520). Archaic-specific SAVs are enriched in genes that contribute to traits potentially relevant to hominin phenotypic divergence, such as the epidermis, respiration and spinal rigidity. Compared to shared SAVs, archaic-specific SAVs occur in sites under weaker selection and are more common in genes with tissue-specific expression. Further underscoring the importance of negative selection on SAVs, Neanderthal lineages with low effective population sizes are enriched for SAVs compared to Denisovan and shared SAVs. Finally, we find that nearly all introgressed SAVs in humans were shared across the three Neanderthals, suggesting that older SAVs were more tolerated in human genomes. Our results reveal the splicing landscape of archaic hominins and identify potential contributions of splicing to phenotypic differences among hominins.
Affiliation(s)
- Colin M Brand
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
| | - Laura L Colbran
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - John A Capra
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
44
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023] Open
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI 02912
| |
|
45
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/21/2023]
Abstract
Background: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results: We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion: SpliceAI and Pangolin showed the best overall performance among predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| |
|
46
|
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.03.535488. [PMID: 37066250 PMCID: PMC10104019 DOI: 10.1101/2023.04.03.535488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA
| |
|
47
|
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. Systematic visualisation of molecular QTLs reveals variant mechanisms at GWAS loci. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535816. [PMID: 37066341 PMCID: PMC10104061 DOI: 10.1101/2023.04.06.535816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Splicing quantitative trait loci (QTLs) have been implicated as a common mechanism underlying complex trait associations. However, utilising splicing QTLs in target discovery and prioritisation has been challenging due to extensive data normalisation which often renders the direction of the genetic effect as well as its magnitude difficult to interpret. This is further complicated by the fact that strong expression QTLs often manifest as weak splicing QTLs and vice versa, making it difficult to uniquely identify the underlying molecular mechanism at each locus. We find that these ambiguities can be mitigated by visualising the association between the genotype and average RNA sequencing read coverage in the region. Here, we generate these QTL coverage plots for 1.7 million molecular QTL associations in the eQTL Catalogue identified with five quantification methods. We illustrate the utility of these QTL coverage plots by performing colocalisation between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. We find that while visually confirmed splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases. All our association summary statistics and QTL coverage plots are freely available at https://www.ebi.ac.uk/eqtl/.
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ralf Tambets
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - James D Hayhurst
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ida Rahu
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Peep Kolberg
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Uku Raudvere
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Anshika Chowdhary
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Andreas Vija
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Hans J Teras
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jacob Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - John Hardy
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Sebastian Guelfi
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Daniah Trabzuni
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Sarah Kim-Hellmuth
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital LMU Munich, Munich, Germany
| | - Will Rayner
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Hilary Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Abayomi Mosaku
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Helen Parkinson
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
48
|
Rogalska ME, Vivori C, Valcárcel J. Regulation of pre-mRNA splicing: roles in physiology and disease, and therapeutic prospects. Nat Rev Genet 2023; 24:251-269. [PMID: 36526860 DOI: 10.1038/s41576-022-00556-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
The removal of introns from mRNA precursors and its regulation by alternative splicing are key for eukaryotic gene expression and cellular function, as evidenced by the numerous pathologies induced or modified by splicing alterations. Major recent advances have been made in understanding the structures and functions of the splicing machinery, in the description and classification of physiological and pathological isoforms and in the development of the first therapies for genetic diseases based on modulation of splicing. Here, we review this progress and discuss important remaining challenges, including predicting splice sites from genomic sequences, understanding the variety of molecular mechanisms and logic of splicing regulation, and harnessing this knowledge for probing gene function and disease aetiology and for the design of novel therapeutic approaches.
Affiliation(s)
- Malgorzata Ewa Rogalska
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Claudia Vivori
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- The Francis Crick Institute, London, UK
| | - Juan Valcárcel
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
49
|
Ullah F, Jabeen S, Salton M, Reddy ASN, Ben-Hur A. Evidence for the role of transcription factors in the co-transcriptional regulation of intron retention. Genome Biol 2023; 24:53. [PMID: 36949544 PMCID: PMC10031921 DOI: 10.1186/s13059-023-02885-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/03/2022] [Accepted: 02/16/2023] [Indexed: 03/24/2023] Open
Abstract
BACKGROUND: Alternative splicing is a widespread regulatory phenomenon that enables a single gene to produce multiple transcripts. Among the different types of alternative splicing, intron retention is one of the least explored despite its high prevalence in both plants and animals. The recent discovery that the majority of splicing is co-transcriptional has led to the finding that chromatin state affects alternative splicing. Therefore, it is plausible that transcription factors can regulate splicing outcomes. RESULTS: We provide evidence for the hypothesis that transcription factors are involved in the regulation of intron retention by studying regions of open chromatin in retained and excised introns. Using deep learning models designed to distinguish between regions of open chromatin in retained introns and non-retained introns, we identified motifs enriched in IR events with significant hits to known human transcription factors. Our model predicts that the majority of transcription factors that affect intron retention come from the zinc finger family. We demonstrate the validity of these predictions using ChIP-seq data for multiple zinc finger transcription factors and find strong over-representation for their peaks in intron retention events. CONCLUSIONS: This work opens up opportunities for further studies that elucidate the mechanisms by which transcription factors affect intron retention and other forms of splicing. AVAILABILITY: Source code available at https://github.com/fahadahaf/chromir.
Affiliation(s)
- Fahad Ullah
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | - Saira Jabeen
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | - Maayan Salton
- Biochemistry and Molecular Biology Department, The Hebrew University Faculty of Medicine, Jerusalem, Israel
| | - Anireddy S N Reddy
- Department of Biology, Colorado State University, Fort Collins, CO, USA
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| |
|
50
|
A deep intronic TCTN2 variant activating a cryptic exon predicted by SpliceRover in a patient with Joubert syndrome. J Hum Genet 2023:10.1038/s10038-023-01143-3. [PMID: 36894704 DOI: 10.1038/s10038-023-01143-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/14/2022] [Revised: 01/26/2023] [Accepted: 02/27/2023] [Indexed: 03/11/2023]
Abstract
The recent introduction of genome sequencing in genetic analysis has led to the identification of pathogenic variants located in deep introns. Recently, several new tools have emerged to predict the impact of variants on splicing. Here, we present a Japanese boy with Joubert syndrome and biallelic TCTN2 variants. Exome sequencing identified only a heterozygous maternal nonsense TCTN2 variant (NM_024809.5:c.916C>T, p.(Gln306Ter)). Subsequent genome sequencing identified a deep intronic variant (c.1033+423G>A) inherited from his father. The machine learning algorithms SpliceAI, Squirls, and Pangolin were unable to predict alterations in splicing by the c.1033+423G>A variant. SpliceRover, a tool for splice site prediction using FASTA sequence, was able to detect a cryptic exon that was 85 bp away from the variant and within the inverted Alu sequence, while SpliceRover scores for these splice sites showed a slight increase (donor) or decrease (acceptor) between the reference and mutant sequences. RNA sequencing and RT-PCR using urinary cells confirmed inclusion of the cryptic exon. The patient showed major symptoms of TCTN2-related disorders such as developmental delay, dysmorphic facial features and polydactyly. He also showed uncommon features such as retinal dystrophy, exotropia, abnormal pattern of respiration, and periventricular heterotopia, confirming these as features of TCTN2-related disorders. Our study highlights the usefulness of genome sequencing and RNA sequencing using urinary cells for molecular diagnosis of genetic disorders and suggests that a database of cryptic splice sites predicted in introns by SpliceRover using the reference sequences can be helpful in extracting candidate variants from large numbers of intronic variants in genome sequencing.
|