101
Martin-Geary AC, Blakes AJM, Dawes R, Findlay SD, Lord J, Walker S, Talbot-Martin J, Wieder N, D’Souza EN, Fernandes M, Hilton S, Lahiri N, Campbell C, Jenkinson S, DeGoede CGEL, Anderson ER, Burge CB, Sanders SJ, Ellingford J, Baralle D, Banka S, Whiffin N. Systematic identification of disease-causing promoter and untranslated region variants in 8,040 undiagnosed individuals with rare disease. medRxiv 2023:2023.09.12.23295416. [PMID: 37745552 PMCID: PMC10516070 DOI: 10.1101/2023.09.12.23295416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Background: Both promoters and untranslated regions (UTRs) have critical regulatory roles, yet variants in these regions are largely excluded from clinical genetic testing due to difficulty in interpreting pathogenicity. The extent to which these regions may harbour diagnoses for individuals with rare disease is currently unknown. Methods: We present a framework for the identification and annotation of potentially deleterious proximal promoter and UTR variants in known dominant disease genes. We use this framework to annotate de novo variants (DNVs) in 8,040 undiagnosed individuals in the Genomics England 100,000 genomes project, which were subject to strict region-based filtering, clinical review, and validation studies where possible. In addition, we performed region and variant annotation-based burden testing in 7,862 unrelated probands against matched unaffected controls. Results: We prioritised eleven DNVs and identified an additional variant overlapping one of the eleven. Ten of these twelve variants (82%) are in genes that are a strong match to the individual's phenotype and six had not previously been identified. Through burden testing, we did not observe a significant enrichment of potentially deleterious promoter and/or UTR variants in individuals with rare disease collectively across any of our region or variant annotations. Conclusions: Overall, we demonstrate the value of screening promoters and UTRs to uncover additional diagnoses for previously undiagnosed individuals with rare disease and provide a framework for doing so without dramatically increasing interpretation burden.
Affiliation(s)
- Alexandra C Martin-Geary
  - Big Data Institute, University of Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, UK
- Alexander J M Blakes
  - Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Ruebena Dawes
  - Big Data Institute, University of Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, UK
- Scott D Findlay
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, USA
- Nechama Wieder
  - Big Data Institute, University of Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, UK
- Elston N D’Souza
  - Big Data Institute, University of Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, UK
- Maria Fernandes
  - Big Data Institute, University of Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, UK
- Sarah Hilton
  - Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
- Nayana Lahiri
  - St George’s, University of London & St George’s University Hospitals NHS Foundation Trust, Institute of Molecular and Clinical Sciences, London, SW17 0QT, UK
- Christopher Campbell
  - Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
- Sarah Jenkinson
  - Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
- Christian G E L DeGoede
  - Department of Paediatric Neurology, Clinical Research Facility, Lancashire Teaching Hospitals NHS Trust
  - Manchester Metropolitan University
- Emily R Anderson
  - Liverpool Centre for Genomic Medicine, Liverpool Women’s Hospital, Liverpool, UK
- Stephan J Sanders
  - Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
  - Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - New York Genome Center, New York, NY, USA
- Jamie Ellingford
  - Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  - Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
- Diana Baralle
  - School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Siddharth Banka
  - Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
- Nicola Whiffin
  - Big Data Institute, University of Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, UK
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
102
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. medRxiv 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Sam Barkal
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Tian Yu
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Yunzhuo Zhou
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Matthew Francoeur
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Quang Vinh Phan
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Zhijian Li
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Manuel Tognon
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Computer Science Department, University of Verona, Verona, Italy
- Lara Brown
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Michael I. Love
  - Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Guillaume Lettre
  - Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
  - Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
- David B. Ascher
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Christopher A. Cassa
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Richard I. Sherwood
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Luca Pinello
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
103
Bohn E, Lau TTY, Wagih O, Masud T, Merico D. A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction. Front Mol Biosci 2023; 10:1257550. [PMID: 37745687 PMCID: PMC10517338 DOI: 10.3389/fmolb.2023.1257550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Introduction: Variants in 5' and 3' untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects. Methods: 3' and 5' UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. Results: 295 3' and 188 5' UTR variants were obtained from ClinVar, of which 26 3' and 68 5' UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing model-matched P/LP variants to both putative benign variants and model-mismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP compared to putative benign variants for both the 3' and 5' UTR. Discussion: In conclusion, we present a high-confidence set of P/LP 3' and 5' UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.
Affiliation(s)
- Emma Bohn
  - Deep Genomics Inc., Toronto, ON, Canada
- Daniele Merico
  - Deep Genomics Inc., Toronto, ON, Canada
  - The Centre for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada
104
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. PLoS Genet 2023; 19:e1010932. [PMID: 37721944 PMCID: PMC10538656 DOI: 10.1371/journal.pgen.1010932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 09/28/2023] [Accepted: 08/22/2023] [Indexed: 09/20/2023] Open
Abstract
The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.
Affiliation(s)
- Nurlan Kerimov
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Ralf Tambets
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- James D. Hayhurst
  - Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Ida Rahu
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Peep Kolberg
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Uku Raudvere
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Ivan Kuzmin
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Anshika Chowdhary
  - Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Andreas Vija
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Hans J. Teras
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Masahiro Kanai
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jacob Ulirsch
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Mina Ryten
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- John Hardy
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Sebastian Guelfi
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Daniah Trabzuni
  - Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Sarah Kim-Hellmuth
  - Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
  - Department of Pediatrics, Dr. von Hauner Children’s Hospital, University Hospital LMU Munich, Munich, Germany
- William Rayner
  - Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Hilary Finucane
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Hedi Peterson
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Abayomi Mosaku
  - Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Helen Parkinson
  - Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Kaur Alasoo
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
105
Hayesmoore JB, Bhuiyan ZA, Coviello DA, du Sart D, Edwards M, Iascone M, Morris-Rosendahl DJ, Sheils K, van Slegtenhorst M, Thomson KL. EMQN: Recommendations for genetic testing in inherited cardiomyopathies and arrhythmias. Eur J Hum Genet 2023; 31:1003-1009. [PMID: 37443332 PMCID: PMC10474043 DOI: 10.1038/s41431-023-01421-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/15/2023] Open
Abstract
Inherited cardiomyopathies and arrhythmias (ICAs) are a prevalent and clinically heterogeneous group of genetic disorders that are associated with increased risk of sudden cardiac death and heart failure. Making a genetic diagnosis can inform the management of patients and their at-risk relatives and, as such, molecular genetic testing is now considered an integral component of the clinical care pathway. However, ICAs are characterised by high genetic and allelic heterogeneity, incomplete / age-related penetrance, and variable expressivity. Therefore, despite our improved understanding of the genetic basis of these conditions, and significant technological advances over the past two decades, identifying and recognising the causative genotype remains challenging. As clinical genetic testing for ICAs becomes more widely available, it is increasingly important for clinical laboratories to consolidate existing knowledge and experience to inform and improve future practice. These recommendations have been compiled to help clinical laboratories navigate the challenges of ICAs and thereby facilitate best practice and consistency in genetic test provision for this group of disorders. General recommendations on internal and external quality control, referral, analysis, result interpretation, and reporting are described. Also included are appendices that provide specific information pertinent to genetic testing for hypertrophic, dilated, and arrhythmogenic right ventricular cardiomyopathies, long QT syndrome, Brugada syndrome, and catecholaminergic polymorphic ventricular tachycardia.
Affiliation(s)
- Jesse B Hayesmoore
  - Oxford Regional Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Zahurul A Bhuiyan
  - Division of Genetic Medicine, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- Desirée du Sart
  - Biological Sciences and Genomics, Monash University, Melbourne, VIC, Australia
- Matthew Edwards
  - Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
- Maria Iascone
  - Laboratorio di Genetica Medica, ASST Papa Giovanni XXIII, Bergamo, Italy
- Deborah J Morris-Rosendahl
  - Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
- Kate L Thomson
  - Oxford Regional Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
106
Lee H, Greer SU, Pavlichin DS, Zhou B, Urban AE, Weissman T, Ji HP. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell Rep Methods 2023; 3:100543. [PMID: 37671027 PMCID: PMC10475782 DOI: 10.1016/j.crmeth.2023.100543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 04/14/2023] [Accepted: 07/06/2023] [Indexed: 09/07/2023]
Abstract
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
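As a rough illustration of the k-mer indexing idea described in this abstract, the sketch below collects the k-mers of each assembly and keeps those present in every one. It is a toy under stated assumptions, not the authors' pipeline: the function names `kmer_set` and `conserved_kmers`, the toy sequences, and the choice of k are all hypothetical, and reverse complements and copy number are ignored.

```python
from typing import Dict, Set


def kmer_set(sequence: str, k: int) -> Set[str]:
    """Return the set of k-mers in a sequence, skipping any that contain N."""
    sequence = sequence.upper()
    return {
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if "N" not in sequence[i:i + k]
    }


def conserved_kmers(assemblies: Dict[str, str], k: int) -> Set[str]:
    """Return k-mers found in every assembly (a hypothetical stand-in for PSTs)."""
    if not assemblies:
        return set()
    per_assembly = [kmer_set(seq, k) for seq in assemblies.values()]
    return set.intersection(*per_assembly)


# Toy sequences, invented for illustration only.
toy_assemblies = {
    "assembly_a": "ACGTACGGA",
    "assembly_b": "TTACGTAC",
    "assembly_c": "ACGTAGG",
}

shared = conserved_kmers(toy_assemblies, k=4)
print(sorted(shared))  # ['ACGT', 'CGTA']
```

At genome scale, a set-based approach like this would need a memory-efficient index; the intervals between consecutive shared k-mers are what the abstract describes examining for polymorphism.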
Affiliation(s)
- HoJoon Lee
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stephanie U. Greer
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Dmitri S. Pavlichin
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Bo Zhou
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Alexander E. Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Tsachy Weissman
  - Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
- Hanlee P. Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
107
Korbecki J, Bosiacki M, Chlubek D, Baranowska-Bosiacka I. Bioinformatic Analysis of the CXCR2 Ligands in Cancer Processes. Int J Mol Sci 2023; 24:13287. [PMID: 37686093 PMCID: PMC10487711 DOI: 10.3390/ijms241713287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
Human CXCR2 has seven ligands, i.e., CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL7, and CXCL8/IL-8, which are chemokines with nearly identical properties. However, no available study has compared the contribution of all CXCR2 ligands to cancer progression. Therefore, in this study, we conducted a bioinformatic analysis using the GEPIA, UALCAN, and TIMER2.0 databases to investigate the role of CXCR2 ligands in 31 different types of cancer, including glioblastoma, melanoma, and colon, esophageal, gastric, kidney, liver, lung, ovarian, pancreatic, and prostate cancer. We focused on the differences in the regulation of expression (using the Tfsitescan and miRDB databases) and analyzed mutation types in CXCR2 ligand genes in cancers (using the cBioPortal). The data showed that the effect of CXCR2 ligands on prognosis depends on the type of cancer. CXCR2 ligands were associated with EMT, angiogenesis, recruiting neutrophils to the tumor microenvironment, and the count of M1 macrophages. The regulation of the expression of each CXCR2 ligand was different and, thus, each analyzed chemokine may have a different function in cancer processes. Our findings suggest that each type of cancer has a unique pattern of CXCR2 ligand involvement in cancer progression, with each ligand having a unique regulation of expression.
Affiliation(s)
- Jan Korbecki
  - Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
  - Department of Anatomy and Histology, Collegium Medicum, University of Zielona Góra, Zyty 28 St., 65-046 Zielona Góra, Poland
- Mateusz Bosiacki
  - Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
  - Department of Functional Diagnostics and Physical Medicine, Faculty of Health Sciences, Pomeranian Medical University in Szczecin, Żołnierska Str. 54, 71-210 Szczecin, Poland
- Dariusz Chlubek
  - Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
- Irena Baranowska-Bosiacka
  - Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
108
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert P Sebra
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
109
Foreman J, Perrett D, Mazaika E, Hunt SE, Ware JS, Firth HV. DECIPHER: Improving Genetic Diagnosis Through Dynamic Integration of Genomic and Clinical Data. Annu Rev Genomics Hum Genet 2023; 24:151-176. [PMID: 37285546 PMCID: PMC7615097 DOI: 10.1146/annurev-genom-102822-100509] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.
Affiliation(s)
- Julia Foreman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Daniel Perrett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Erica Mazaika
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - James S Ware
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Royal Brompton and Harefield Hospitals, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, United Kingdom
- East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
|
110
|
Brovkina MV, Chapman MA, Holding ML, Clowney EJ. Emergence and influence of sequence bias in evolutionarily malleable, mammalian tandem arrays. BMC Biol 2023; 21:179. [PMID: 37612705 PMCID: PMC10463633 DOI: 10.1186/s12915-023-01673-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND The radiation of mammals at the extinction of the dinosaurs produced a plethora of new forms, as diverse as bats, dolphins, and elephants, in only 10-20 million years. Behind the scenes, adaptation to new niches is accompanied by extensive innovation in large families of genes that allow animals to contact the environment, including chemosensors, xenobiotic enzymes, and immune and barrier proteins. Genes in these "outward-looking" families are allelically diverse among humans and exhibit tissue-specific and sometimes stochastic expression. RESULTS Here, we show that these tandem arrays of outward-looking genes occupy AT-biased isochores and comprise the "tissue-specific" gene class that lacks CpG islands in their promoters. Models of mammalian genome evolution have not incorporated the sharply different functions and transcriptional patterns of genes in AT- versus GC-biased regions. To examine the relationship between gene family expansion, sequence content, and allelic diversity, we use population genetic data and comparative analysis. First, we find that AT bias can emerge during evolutionary expansion of gene families in cis. Second, human genes in AT-biased isochores or with GC-poor promoters experience relatively low rates of de novo point mutation today but are enriched for non-synonymous variants. Finally, we find that isochores containing gene clusters exhibit low rates of recombination. CONCLUSIONS Our analyses suggest that tolerance of non-synonymous variation and low recombination are two forces that have produced the depletion of GC bases in outward-facing gene arrays. In turn, high AT content exerts a profound effect on their chromatin organization and transcriptional regulation.
Affiliation(s)
- Margarita V Brovkina
- Graduate Program in Cellular and Molecular Biology, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Margaret A Chapman
- Neurosciences Graduate Program, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - E Josephine Clowney
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA.
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA.
|
111
|
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Affiliation(s)
- Roderic Guigó
- Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
|
112
|
Wang L. Reference-guided search for open reading frames. NATURE COMPUTATIONAL SCIENCE 2023; 3:667-668. [PMID: 38177317 DOI: 10.1038/s43588-023-00497-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Affiliation(s)
- Liguo Wang
- Division of Computational Biology, Mayo Clinic College of Medicine and Science, Rochester, MN, USA.
|
113
|
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom.
|
114
|
Varabyou A, Erdogdu B, Salzberg SL, Pertea M. Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage. NATURE COMPUTATIONAL SCIENCE 2023; 3:700-708. [PMID: 38098813 PMCID: PMC10718564 DOI: 10.1038/s43588-023-00496-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/23/2023] [Accepted: 07/05/2023] [Indexed: 12/17/2023]
Abstract
ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Steven L. Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
|
115
|
Chao KH, Mao A, Salzberg SL, Pertea M. Splam: a deep-learning-based splice site predictor that improves spliced alignments. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.27.550754. [PMID: 37546880 PMCID: PMC10402160 DOI: 10.1101/2023.07.27.550754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant Arabidopsis thaliana. Finally, we demonstrate the use of Splam on a novel application: processing the spliced alignments of RNA-seq data to identify and eliminate errors. We show that when used in this manner, Splam yields substantial improvements in the accuracy of downstream transcriptome analysis of both poly(A) and ribo-depleted RNA-seq libraries. Overall, Splam offers a faster and more accurate approach to detecting splice junctions, while also providing a reliable and efficient solution for cleaning up erroneous spliced alignments.
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
|
116
|
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez JM, Hunt T, Lagarde J, Liang CE, Li H, Jerryd Meade M, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Hasan Çelik M, Chen Y, Du MR, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Liang Li J, Lienhard M, Mikheenko A, Mulligan D, Ming Nip K, Pertea M, Ritchie ME, Sim AD, Tang AD, Kei Wan Y, Wang C, Wong BY, Yang C, Barnes I, Berry A, Capella S, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Goetz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Laird Smith M, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Fai Au K, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/08/2023]
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Affiliation(s)
- Francisco J. Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- These authors contributed equally to this work
| | - Dingjie Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- These authors contributed equally to this work
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- These authors contributed equally to this work
| | - Jane E. Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Maite De María
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Matthew S. Adams
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Amit K. Behera
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Jose M. Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Flomics Biotech, Dr Aiguader 88, Barcelona 08003, Spain
- These authors contributed equally to this work
| | - Cindy E. Liang
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Haoran Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Marcus Jerryd Meade
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- These authors contributed equally to this work
| | - David A. Moraga Amador
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Andrey D. Prjibelski
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- These authors contributed equally to this work
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Hamed Bostan
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Ashley M. Brooks
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Muhammed Hasan Çelik
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Mei R. M. Du
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Colette Felton
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
| | - Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Ralf Herwig
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Hideya Kawaji
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Joseph Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Jian Liang Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Matthias Lienhard
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Alla Mikheenko
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Dennis Mulligan
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Matthew E. Ritchie
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Andre D. Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Alison D. Tang
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Changqing Wang
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Brandon Y. Wong
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Namrita Dhillon
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | | | - Luis Ferrández-Peral
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Natàlia Garcia-Reyero
- Environmental Laboratory, US Army Engineer Research & Development Center, Vicksburg, USA
| | | | | | | | | | | | | | - Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nedka G. Panayotova
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
| | - Alejandro Paniagua
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | | | - Eric Rouchka
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Brandon Saint-John
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Enrique Sapena
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Leon Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
| | - Melissa Laird Smith
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Hazuki Takahashi
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
| | | | - Piero Carninci
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
- Human Technopole, Milano, Italy
| | - Nancy D. Denslow
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Margaret E. Hunter
- U.S. Geological Survey, Wetland and Aquatic Research Center, Gainesville, USA
| | - Hagen U. Tilgner
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, USA
| | - Barbara J. Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- Center for Public Health Genomics
- UVA Cancer Center, University of Virginia, Charlottesville, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, USA
| | - Angela N. Brooks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
|
117
|
Sansbury SE, Serebrenik YV, Lapidot T, Burslem GM, Shalem O. Pooled tagging and hydrophobic targeting of endogenous proteins for unbiased mapping of unfolded protein responses. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.13.548611. [PMID: 37503003 PMCID: PMC10370017 DOI: 10.1101/2023.07.13.548611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
System-level understanding of proteome organization and function requires methods for direct visualization and manipulation of proteins at scale. We developed an approach enabled by high-throughput gene tagging for the generation and analysis of complex cell pools with endogenously tagged proteins. Proteins are tagged with HaloTag to enable visualization or direct perturbation. Fluorescent labeling followed by in situ sequencing and deep learning-based image analysis identifies the localization pattern of each tag, providing a bird's-eye-view of cellular organization. Next, we use a hydrophobic HaloTag ligand to misfold tagged proteins, inducing spatially restricted proteotoxic stress that is read out by single cell RNA sequencing. By integrating optical and perturbation data, we map compartment-specific responses to protein misfolding, revealing inter-compartment organization and direct crosstalk, and assigning proteostasis functions to uncharacterized genes. Altogether, we present a powerful and efficient method for large-scale studies of proteome dynamics, function, and homeostasis.
|
118
|
Sweatt AJ, Griffiths CD, Paudel BB, Janes KA. Proteome-wide copy-number estimation from transcriptomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.10.548432. [PMID: 37503057 PMCID: PMC10369941 DOI: 10.1101/2023.07.10.548432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Protein copy numbers constrain systems-level properties of regulatory networks, but absolute proteomic data remain scarce compared to transcriptomics obtained by RNA sequencing. We addressed this persistent gap by relating mRNA to protein statistically using best-available data from quantitative proteomics-transcriptomics for 4366 genes in 369 cell lines. The approach starts with a central estimate of protein copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model that links mRNAs to protein. For dozens of independent cell lines and primary prostate samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, and empirical protein-to-mRNA ratios. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein interaction complexes, suggesting mechanistic relationships are embedded. We use the method to estimate viral-receptor abundances of CD55-CXADR from human heart transcriptomes and build 1489 systems-biology models of coxsackievirus B3 infection susceptibility. When applied to 796 RNA sequencing profiles of breast cancer from The Cancer Genome Atlas, inferred copy-number estimates collectively reclassify 26% of Luminal A and 29% of Luminal B tumors. Protein-based reassignments strongly involve a pharmacologic target for luminal breast cancer (CDK4) and an α-catenin that is often undetectable at the mRNA level (CTNNA2). Thus, by adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility limits of contemporary proteomics. The collection of gene-specific models is assembled as a web tool for users seeking mRNA-guided predictions of absolute protein abundance (http://janeslab.shinyapps.io/Pinferna).
Affiliation(s)
- Andrew J. Sweatt
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
- Cameron D. Griffiths
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
- B. Bishal Paudel
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
- Kevin A. Janes
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
  - Department of Biochemistry & Molecular Genetics, University of Virginia, Charlottesville, VA, 22908
119
Walker LC, Hoya MDL, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A, Zimmermann H, Byrne AB, Pesaran T, Karam R, Harrison SM, Spurdle AB. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet 2023; 110:1046-1067. [PMID: 37352859 PMCID: PMC10357475 DOI: 10.1016/j.ajhg.2023.06.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 02/20/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/25/2023] Open
Abstract
The American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1, PS3, PP3, BS3, BP4, and BP7. However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. We utilized empirically derived splicing evidence to (1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, (2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and (3) exemplify methodology to calibrate splice prediction tools. We propose repurposing the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely, BP7 may be used to capture RNA results demonstrating no splicing impact for intronic and synonymous variants. We propose that the PS3/BS3 codes are applied only for well-established assays that measure functional impact not directly captured by RNA-splicing assays. We recommend the application of PS1 based on similarity of predicted RNA-splicing effects for a variant under assessment in comparison with a known pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA-assay evidence described aim to help standardize variant pathogenicity classification processes when interpreting splicing-based evidence.
Affiliation(s)
- Logan C Walker
  - Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Miguel de la Hoya
  - Molecular Oncology Laboratory, CIBERONC, Hospital Clinico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
- George A R Wiggins
  - Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Michael T Parsons
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Daffodil M Canson
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Alicia B Byrne
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven M Harrison
  - Ambry Genetics, Aliso Viejo, CA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Amanda B Spurdle
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia; Faculty of Medicine, The University of Queensland, Brisbane, QLD, Australia
120
Hamza A, El-Sissy C, Yousfi N, Martins PV, Rafat C, Masliah-Planchon J, Frémeaux-Bacchi V, Mesnard L. The absence of CFHR3 and CFHR1 genes from the T2T-CHM13 assembly can limit the molecular diagnosis of complement-related diseases. Eur J Hum Genet 2023; 31:730-732. [PMID: 37032353 PMCID: PMC10325998 DOI: 10.1038/s41431-023-01350-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Revised: 03/14/2023] [Accepted: 03/20/2023] [Indexed: 04/11/2023] Open
Affiliation(s)
- Abderaouf Hamza
  - Department of Genetics, Institut Curie, PSL Research University, Paris, France
- Carine El-Sissy
  - Department of Biological Immunology, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
- Nadhir Yousfi
  - Unité Mixte de Recherche S1155, Institut National de la Santé et de la Recherche Médicale (INSERM), Paris, France
- Paula Vieira Martins
  - Department of Biological Immunology, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
- Cédric Rafat
  - Service de Soins Intensifs Néphrologiques et Rein Aigu (SINRA), French Intensive Renal Network, Hôpital Tenon, Assistance Publique-Hôpitaux de Paris, Paris, France
  - Faculté de Médecine, Sorbonne Université, Paris, France
- Véronique Frémeaux-Bacchi
  - Department of Biological Immunology, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
  - Unité Mixte de Recherche S1138, Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Recherche des Cordeliers, Paris, France
- Laurent Mesnard
  - Unité Mixte de Recherche S1155, Institut National de la Santé et de la Recherche Médicale (INSERM), Paris, France.
  - Service de Soins Intensifs Néphrologiques et Rein Aigu (SINRA), French Intensive Renal Network, Hôpital Tenon, Assistance Publique-Hôpitaux de Paris, Paris, France.
  - Faculté de Médecine, Sorbonne Université, Paris, France.
  - Institut des Sciences du Calcul et des Données, Sorbonne Université, Paris, France.
121
Bucalo A, Conti G, Valentini V, Capalbo C, Bruselles A, Tartaglia M, Bonanni B, Calistri D, Coppa A, Cortesi L, Giannini G, Gismondi V, Manoukian S, Manzella L, Montagna M, Peterlongo P, Radice P, Russo A, Tibiletti MG, Turchetti D, Viel A, Zanna I, Palli D, Silvestri V, Ottini L. Male breast cancer risk associated with pathogenic variants in genes other than BRCA1/2: an Italian case-control study. Eur J Cancer 2023; 188:183-191. [PMID: 37262986 DOI: 10.1016/j.ejca.2023.04.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/06/2023] [Revised: 04/24/2023] [Accepted: 04/26/2023] [Indexed: 06/03/2023]
Abstract
BACKGROUND Germline pathogenic variants (PVs) in BRCA1/2 genes are associated with breast cancer (BC) risk in both women and men. Multigene panel testing is being increasingly used for BC risk assessment, allowing the identification of PVs in genes other than BRCA1/2. While data on actionable PVs in other cancer susceptibility genes are now available in female BC, reliable data are still lacking in male BC (MBC). This study aimed to provide the patterns, prevalence and risk estimates associated with PVs in non-BRCA1/2 genes for MBC in order to improve BC prevention for male patients. METHODS We performed a large case-control study in the Italian population, including 767 BRCA1/2-negative MBCs and 1349 male controls, all screened using a custom 50 cancer gene panel. RESULTS PVs in genes other than BRCA1/2 were significantly more frequent in MBCs compared with controls (4.8% vs 1.8%, respectively) and associated with a threefold increased MBC risk (OR: 3.48, 95% CI: 1.88-6.44; p < 0.0001). PV carriers were more likely to have personal (p = 0.03) and family (p = 0.02) history of cancers, not limited to BC. PALB2 PVs were associated with a sevenfold increased MBC risk (OR: 7.28, 95% CI: 1.17-45.52; p = 0.034), and ATM PVs with a fivefold increased MBC risk (OR: 4.79, 95% CI: 1.12-20.56; p = 0.035). CONCLUSIONS This study highlights the role of PALB2 and ATM PVs in MBC susceptibility and provides risk estimates at population level. These data may help in the implementation of multigene panel testing in MBC patients and inform gender-specific BC risk management and decision making for patients and their families.
Affiliation(s)
- Agostino Bucalo
  - Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
- Giulia Conti
  - Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
- Virginia Valentini
  - Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
- Carlo Capalbo
  - Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
- Alessandro Bruselles
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
- Marco Tartaglia
  - Molecular Genetics and Functional Genomics Research Unit, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome, Italy
- Bernardo Bonanni
  - Division of Cancer Prevention and Genetics, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Daniele Calistri
  - Istituto Romagnolo per lo Studio dei Tumori "Dino Amadori"-IRST IRCCS, Meldola, Italy
- Anna Coppa
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Laura Cortesi
  - Department of Oncology and Haematology, University of Modena and Reggio Emilia, Modena, Italy
- Giuseppe Giannini
  - Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy; Istituto Pasteur-Fondazione Cenci Bolognetti, Rome, Italy
- Viviana Gismondi
  - Hereditary Cancer Unit, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Siranoush Manoukian
  - Unità di Genetica Medica, Dipartimento di Oncologia Medica ed Ematologia, Fondazione IRCCS Istituto Nazionale dei Tumori (INT), Milan, Italy
- Livia Manzella
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Marco Montagna
  - Immunology and Molecular Oncology Unit, Veneto Institute of Oncology IOV - IRCCS, Padua, Italy
- Paolo Peterlongo
  - Genome Diagnostics Program, IFOM ETS - The AIRC Institute of Molecular Oncology, Milan, Italy
- Paolo Radice
  - Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Research, Fondazione IRCCS Istituto Nazionale Dei Tumori (INT), Milan, Italy
- Antonio Russo
  - Section of Medical Oncology, Department of Surgical and Oncological Sciences, University of Palermo, Palermo, Italy
- Maria Grazia Tibiletti
  - Dipartimento di Patologia, ASST Settelaghi and Centro di Ricerca per lo studio dei tumori eredo-familiari, Università dell'Insubria, Varese, Italy
- Daniela Turchetti
  - Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Alessandra Viel
  - Unità di Oncogenetica e Oncogenomica Funzionale, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano, Italy
- Ines Zanna
  - Cancer Risk Factors and Lifestyle Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network (ISPRO), Florence, Italy
- Domenico Palli
  - Cancer Risk Factors and Lifestyle Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network (ISPRO), Florence, Italy
- Laura Ottini
  - Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy.
122
Kovačević M, Milićević O, Branković M, Janković M, Novaković I, Sokić D, Ristić A, Shamsani J, Vojvodić N. Novel variants in established epilepsy genes in focal epilepsy. Seizure 2023; 110:146-152. [PMID: 37390664 DOI: 10.1016/j.seizure.2023.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/30/2023] [Accepted: 06/06/2023] [Indexed: 07/02/2023] Open
Abstract
INTRODUCTION Next generation sequencing (NGS) has greatly expanded our understanding of genetic contributors in multiple epilepsy syndromes, including focal epilepsy. Describing the genetic architecture of common syndromes promises to facilitate the diagnostic process as well as aid in the identification of patients who stand to benefit from genetic testing, but most studies to date have been limited to examining children or adults with intellectual disability. Our aim was to determine the yield of targeted sequencing of 5 established epilepsy genes (DEPDC5, LGI1, SCN1A, GRIN2A, and PCDH19) in an extensively phenotyped cohort of focal epilepsy patients with normal intellectual function or mild intellectual disability, as well as describe novel variants and determine the characteristics of variant carriers. PATIENTS AND METHODS Targeted panel sequencing was performed on 96 patients with a strong clinical suspicion of genetic focal epilepsy. Patients had previously gone through a comprehensive diagnostic epilepsy evaluation in The Neurology Clinic, University Clinical Center of Serbia. Variants of interest (VOI) were classified using the American College of Medical Genetics and the Association for Molecular Pathology criteria. RESULTS Six VOI in eight (8/96, 8.3%) patients were found in our cohort. Four likely pathogenic VOI were determined in six (6/96, 6.2%) patients: two DEPDC5 variants in two patients, one SCN1A variant in two patients and one PCDH19 variant in two patients. One variant of unknown significance (VUS) was found in GRIN2A in one (1/96, 1.0%) patient. Only one VOI in GRIN2A was classified as likely benign. No VOI were detected in LGI1. CONCLUSION Sequencing of only five known epilepsy genes yielded a diagnostic result in 6.2% of our cohort and revealed multiple novel variants. Further research is necessary for a better understanding of the genetic basis in common epilepsy syndromes in patients with normal intellectual function or mild intellectual disability.
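The yields reported in this abstract follow directly from the stated patient counts over the cohort of 96. A minimal sketch of that arithmetic is given below; the function name `percent` and the dictionary labels are illustrative assumptions, not taken from the paper. Note that 6/96 is exactly 6.25%, which the abstract reports as 6.2%.

```python
# Recompute the percentages quoted in the abstract from the raw counts.
# `percent` and the labels below are hypothetical names used for illustration.

def percent(carriers: int, cohort: int) -> float:
    """Return carriers as a percentage of the cohort."""
    return 100 * carriers / cohort

COHORT = 96  # patients sequenced, as stated in the abstract

# Patient counts as stated in the abstract.
counts = {
    "any variant of interest": 8,
    "likely pathogenic variant": 6,
    "VUS in GRIN2A": 1,
}

for label, n in counts.items():
    print(f"{label}: {n}/{COHORT} = {percent(n, COHORT):.2f}%")
```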
Affiliation(s)
- Maša Kovačević
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia.
- Milena Janković
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Ivana Novaković
  - Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Dragoslav Sokić
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Aleksandar Ristić
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Nikola Vojvodić
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia
123
Florian K, Benet-Pagès A, Berner D, Teubert A, Eck S, Arnold N, Bauer P, Begemann M, Sturm M, Kleinle S, Haack TB, Eggermann T. Quality assurance within the context of genome diagnostics (a German perspective). MED GENET-BERLIN 2023; 35:91-104. [PMID: 38840862 PMCID: PMC10842579 DOI: 10.1515/medgen-2023-2028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
The rapid and dynamic implementation of Next-Generation Sequencing (NGS)-based assays has revolutionized genetic testing, and in the near future, nearly all molecular alterations of the human genome will be diagnosable via massive parallel sequencing. While this progress will further corroborate the central role of human genetics in the multidisciplinary management of patients with genetic disorders, it must be accompanied by quality assurance measures in order to allow the safe and optimal use of knowledge ascertained from genome diagnostics. To achieve this, several valuable tools and guidelines have been developed to support the quality of genome diagnostics. In this paper, authors with experience in diverse aspects of genomic analysis summarize the current status of quality assurance in genome diagnostics, with the aim of facilitating further standardization and quality improvement in one of the core competencies of the field.
Affiliation(s)
- Kraft Florian
  - Medizinische Fakultät der RWTH Aachen, Institut für Humangenetik und Genommedizin, Aachen, Deutschland
- Anna Benet-Pagès
  - Institut für Neurogenomik, Helmholtz Zentrum München, Neuherberg, Deutschland
- Norbert Arnold
  - Universitätsklinikum Schleswig-Holstein, Zentrum für familiären Brust- und Eierstockkrebs; Klinik für Gynäkologie und Geburtshilfe, Kiel, Deutschland
- Matthias Begemann
  - Medizinische Fakultät der RWTH Aachen, Institut für Humangenetik und Genommedizin, Aachen, Deutschland
- Marc Sturm
  - Universität Tübingen, Institut für Medizinische Genetik und Angewandte Genomik, Tübingen, Deutschland
- Tobias B. Haack
  - Universität Tübingen, Institut für Medizinische Genetik und Angewandte Genomik, Tübingen, Deutschland
- Thomas Eggermann
  - Medizinische Fakultät der RWTH Aachen, Institut für Humangenetik und Genommedizin, Pauwelsstr. 30, 52074 Aachen, Deutschland
124
Ameratunga R, Edwards ESJ, Lehnert K, Leung E, Woon ST, Lea E, Allan C, Chan L, Steele R, Longhurst H, Bryant VL. The Rapidly Expanding Genetic Spectrum of Common Variable Immunodeficiency-Like Disorders. THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY. IN PRACTICE 2023; 11:1646-1664. [PMID: 36796510 DOI: 10.1016/j.jaip.2023.01.048] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/21/2022] [Revised: 01/21/2023] [Accepted: 01/27/2023] [Indexed: 02/16/2023]
Abstract
The understanding of common variable immunodeficiency disorders (CVID) is in evolution. CVID was previously a diagnosis of exclusion. New diagnostic criteria have allowed the disorder to be identified with greater precision. With the advent of next-generation sequencing (NGS), it has become apparent that an increasing number of patients with a CVID phenotype have a causative genetic variant. If a pathogenic variant is identified, these patients are removed from the overarching diagnosis of CVID and are deemed to have a CVID-like disorder. In populations where consanguinity is more prevalent, the majority of patients with severe primary hypogammaglobulinemia will have an underlying inborn error of immunity, usually an early-onset autosomal recessive disorder. In nonconsanguineous societies, pathogenic variants are identified in approximately 20% to 30% of patients. These are often autosomal dominant mutations with variable penetrance and expressivity. To add to the complexity of CVID and CVID-like disorders, some genetic variants such as those in TNFRSF13B (transmembrane activator calcium modulator cyclophilin ligand interactor) predispose to, or enhance, disease severity. These variants are not causative but can have epistatic (synergistic) interactions with more deleterious mutations to worsen disease severity. This review is a description of the current understanding of genes associated with CVID and CVID-like disorders. This information will assist clinicians in interpreting NGS reports when investigating the genetic basis of disease in patients with a CVID phenotype.
Affiliation(s)
- Rohan Ameratunga
  - Department of Clinical immunology, Auckland Hospital, Auckland, New Zealand; Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand; Department of Molecular Medicine and Pathology, School of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand.
- Emily S J Edwards
  - The Jeffrey Modell Diagnostic and Research Centre for Primary Immunodeficiencies, and Allergy and Clinical Immunology Laboratory, Department of Immunology, Monash University, Melbourne, VIC, Australia
- Klaus Lehnert
  - Applied Translational Genetics Group, School of Biological Sciences, University of Auckland, Auckland, New Zealand; Maurice Wilkins Centre, School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Euphemia Leung
  - Auckland Cancer Society Research Centre, School of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- See-Tarn Woon
  - Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand
- Edward Lea
  - Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand
- Caroline Allan
  - Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand
- Lydia Chan
  - Department of Clinical immunology, Auckland Hospital, Auckland, New Zealand
- Richard Steele
  - Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand; Department of Respiratory Medicine, Wellington Hospital, Wellington, New Zealand
- Hilary Longhurst
  - Department of Medicine, School of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Vanessa L Bryant
  - Department of Immunology, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia; Department of Clinical Immunology and Allergy, Royal Melbourne Hospital, Parkville, VIC, Australia
125
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y, Borsari B, Morabito S, Liang HY, McGill CJ, Rahmanian S, Sakr J, Jiang S, Zeng W, Carvalho K, Weimer AK, Dionne LA, McShane A, Bedi K, Elhajjajy SI, Upchurch S, Jou J, Youngworth I, Gabdank I, Sud P, Jolanki O, Strattan JS, Kagda MS, Snyder MP, Hitz BC, Moore JE, Weng Z, Bennett D, Reinholdt L, Ljungman M, Beer MA, Gerstein MB, Pachter L, Guigó R, Wold BJ, Mortazavi A. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.15.540865. [PMID: 37292896 PMCID: PMC10245583 DOI: 10.1101/2023.05.15.540865] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 06/10/2023]
Abstract
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
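The triplet idea in this abstract can be illustrated with a short sketch. Each transcript is recorded as a (start site, exon junction chain, end site) triplet, and each gene is then placed on a simplex by comparing how many distinct values it has for each of the three features. The class name `Transcript`, the function `simplex_coordinates`, the identifiers in the example, and the normalization (raw distinct counts divided by their sum) are all assumptions made for illustration; the preprint may define its simplex coordinates differently.

```python
from collections import defaultdict
from typing import NamedTuple

class Transcript(NamedTuple):
    """One transcript as a triplet, plus its gene (hypothetical schema)."""
    gene: str
    tss: str               # transcript start site label
    junction_chain: tuple  # ordered (donor, acceptor) pairs
    tes: str               # transcript end site label

def simplex_coordinates(transcripts):
    """Map each gene to (TSS, splicing, TES) proportions that sum to 1.

    Assumption: the proportions are the counts of distinct start sites,
    junction chains and end sites, divided by their total.
    """
    seen = defaultdict(lambda: (set(), set(), set()))
    for t in transcripts:
        starts, chains, ends = seen[t.gene]
        starts.add(t.tss)
        chains.add(t.junction_chain)
        ends.add(t.tes)
    coords = {}
    for gene, features in seen.items():
        counts = [len(f) for f in features]
        total = sum(counts)
        coords[gene] = tuple(c / total for c in counts)
    return coords

# Made-up example: two transcripts differing only in their start site,
# so the gene is pulled toward the TSS corner of the simplex.
example = [
    Transcript("GENE_A", "tss_1", ((100, 200),), "tes_1"),
    Transcript("GENE_A", "tss_2", ((100, 200),), "tes_1"),
]
coords = simplex_coordinates(example)
print(coords["GENE_A"])
```

A gene whose largest coordinate is the first one would, under this simplified reading, be biased toward promoter selection as its main source of transcript diversity.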
Affiliation(s)
- Fairlie Reese
  - Developmental and Cell Biology, University of California, Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Brian Williams
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Gabriela Balderrama-Gutierrez
  - Developmental and Cell Biology, University of California, Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Dana Wyman
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Muhammed Hasan Çelik
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Elisabeth Rebboah
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Narges Rezaie
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Diane Trout
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Milad Razavi-Mohseni
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
- Yunzhe Jiang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Beatrice Borsari
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
  - Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Samuel Morabito
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Heidi Yahan Liang
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Cassandra J McGill
  - Developmental and Cell Biology, University of California, Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Sorena Rahmanian
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Jasmine Sakr
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
- Shan Jiang
  - Developmental and Cell Biology, University of California, Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Weihua Zeng
  - Developmental and Cell Biology, University of California, Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Klebea Carvalho
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Annika K Weimer
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Louise A Dionne
  - The Jackson Laboratory, The Jackson Laboratory, Bar Harbor, USA
- Ariel McShane
  - Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, USA
  - Department of Radiation Oncology, University of Michigan, Ann Arbor, USA
- Karan Bedi
  - Department of Biostatistics, University of Michigan, Ann Arbor, USA
  - Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
- Shaimae I Elhajjajy
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
- Sean Upchurch
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Jennifer Jou
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Ingrid Youngworth
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Idan Gabdank
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Paul Sud
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Otto Jolanki
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- J Seth Strattan
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Meenakshi S Kagda
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Michael P Snyder
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Ben C Hitz
  - Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Jill E Moore
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
- Zhiping Weng
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
- David Bennett
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, USA
  - Department of Neurological Sciences, Rush University Medical Center, Chicago, USA
- Laura Reinholdt
  - The Jackson Laboratory, The Jackson Laboratory, Bar Harbor, USA
- Mats Ljungman
  - Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
  - Departments of Radiation Oncology and Environmental Health Sciences, University of Michigan, Ann Arbor, USA
- Michael A Beer
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
- Mark B Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
  - Section on Biomedical Informatics and Data Science, Yale University, New Haven, USA
  - Department of Statistics and Data Science, Yale University, New Haven, USA
  - Department of Computer Science, Yale University, New Haven, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
- Roderic Guigó
  - Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Barbara J Wold
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Ali Mortazavi
  - Developmental and Cell Biology, University of California, Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
126
Toomata Z, Leask M, Krishnan M, Cadzow M, Dalbeth N, Stamp LK, de Zoysa J, Merriman T, Wilcox P, Dewes O, Murphy R. Genetic testing for misclassified monogenic diabetes in Māori and Pacific peoples in Aōtearoa New Zealand with early-onset type 2 diabetes. Front Endocrinol (Lausanne) 2023; 14:1174699. [PMID: 37234800 PMCID: PMC10206310 DOI: 10.3389/fendo.2023.1174699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 04/20/2023] [Indexed: 05/28/2023] Open
Abstract
Aims Monogenic diabetes accounts for 1-2% of diabetes cases yet is often misdiagnosed as type 2 diabetes. The aim of this study was to examine, in Māori and Pacific adults clinically diagnosed with type 2 diabetes by 40 years of age, (a) the prevalence of monogenic diabetes, (b) the prevalence of beta-cell autoantibodies, and (c) the pre-test probability of monogenic diabetes. Methods Targeted sequencing data for 38 known monogenic diabetes genes were analyzed in 199 Māori and Pacific people with a BMI of 37.9 ± 8.6 kg/m² who had been diagnosed with type 2 diabetes between 3 and 40 years of age. A triple-screen combined autoantibody assay was used to test for GAD, IA-2, and ZnT8. A MODY probability calculator score was generated for those with sufficient clinical information (55/199). Results No genetic variants curated as likely pathogenic or pathogenic were found. One individual (1/199) tested positive for GAD/IA-2/ZnT8 antibodies. The pre-test probability of monogenic diabetes was calculated in 55 individuals, with 17/55 (31%) scoring above the 20% threshold considered for diagnostic testing referral. Discussion Our findings suggest that monogenic diabetes is rare in Māori and Pacific people with clinically diagnosed early-onset type 2 diabetes, and that the MODY probability calculator likely overestimates the likelihood of a monogenic cause for diabetes in this population.
Affiliation(s)
- Zanetta Toomata
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
| | - Megan Leask
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Mohanraj Krishnan
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
| | - Murray Cadzow
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| | - Nicola Dalbeth
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
| | - Lisa K. Stamp
- Department of Medicine, University of Otago, Christchurch, Christchurch, New Zealand
| | - Janak de Zoysa
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
| | - Tony Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Phillip Wilcox
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
| | - Ofa Dewes
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Langimalie Research Centre, Auckland, New Zealand
- Centre of Methods and Policy Application in the Social Sciences, The University of Auckland, Auckland, New Zealand
| | - Rinki Murphy
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
|
127
|
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
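As a rough illustration of the motif criterion mentioned in the abstract above (insertions and deletions that represent expansions or contractions of a 2 to 50 bp motif), the Python sketch below tests whether an inserted or deleted sequence is a pure repeat of such a motif. The function name `smallest_repeat_motif` and the simplification of ignoring the flanking reference sequence are assumptions made here for illustration; this is not the authors' pipeline.

```python
from typing import Optional, Tuple


def smallest_repeat_motif(
    seq: str, min_motif: int = 2, max_motif: int = 50
) -> Optional[Tuple[str, int]]:
    """Return (motif, copies) if `seq` consists of two or more exact copies
    of its shortest repeating unit and that unit is min_motif..max_motif bp
    long; otherwise return None.

    Hypothetical helper: a real analysis would also compare the indel with
    the adjacent reference sequence, which this sketch does not do.
    """
    n = len(seq)
    # Try candidate unit sizes from shortest to longest, so the first match
    # is the smallest period of the sequence.
    for size in range(1, n // 2 + 1):
        if n % size == 0 and seq == seq[:size] * (n // size):
            if min_motif <= size <= max_motif:
                return seq[:size], n // size
            # Smallest period lies outside the motif range
            # (e.g. a homopolymer run has period 1).
            return None
    return None
```

Under these assumptions, `smallest_repeat_motif("CAGCAGCAG")` reports the motif `CAG` with three copies, while a homopolymer such as `AAAA` is rejected because its shortest repeating unit is a single base.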
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
|
128
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/21/2023]
Abstract
Background Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation. We suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion SpliceAI and Pangolin showed the best overall performance among the predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
129
|
Hofman DA, Ruiz-Orera J, Yannuzzi I, Murugesan R, Brown A, Clauser KR, Condurat AL, van Dinter JT, Engels SA, Goodale A, van der Lugt J, Abid T, Wang L, Zhou KN, Vogelzang J, Ligon KL, Phoenix TN, Roth JA, Root DE, Hubner N, Golub TR, Bandopadhayay P, van Heesch S, Prensner JR. Translation of non-canonical open reading frames as a cancer cell survival mechanism in childhood medulloblastoma. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539399. [PMID: 37205492 PMCID: PMC10187264 DOI: 10.1101/2023.05.04.539399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
A hallmark of high-risk childhood medulloblastoma is the dysregulation of RNA translation. Currently, it is unknown whether medulloblastoma dysregulates the translation of putatively oncogenic non-canonical open reading frames. To address this question, we performed ribosome profiling of 32 medulloblastoma tissues and cell lines and observed widespread non-canonical ORF translation. We then developed a step-wise approach to employ multiple CRISPR-Cas9 screens to elucidate functional non-canonical ORFs implicated in medulloblastoma cell survival. We determined that multiple lncRNA-ORFs and upstream open reading frames (uORFs) exhibited selective functionality independent of the main coding sequence. One of these, ASNSD1-uORF or ASDURF, was upregulated, associated with the MYC family oncogenes, and was required for medulloblastoma cell survival through engagement with the prefoldin-like chaperone complex. Our findings underscore the fundamental importance of non-canonical ORF translation in medulloblastoma and provide a rationale to include these ORFs in future cancer genomics studies seeking to define new cancer targets.
Affiliation(s)
- Damon A. Hofman
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
- These authors contributed equally
| | - Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- These authors contributed equally
| | - Ian Yannuzzi
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | | | - Adam Brown
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Current address: Arbor Biotechnologies, Cambridge, MA, 02140, USA
| | - Karl R. Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Alexandra L. Condurat
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Jip T. van Dinter
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Sem A.G. Engels
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Amy Goodale
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Jasper van der Lugt
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Tanaz Abid
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Li Wang
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Kevin N. Zhou
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Current address: Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, 91101, USA
| | - Jayne Vogelzang
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, 02215, USA
| | - Keith L. Ligon
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, 02215, USA
- Department of Pathology, Boston Children’s Hospital, Boston, MA, 02115, USA
| | - Timothy N. Phoenix
- Division of Pharmaceutical Sciences, James L. Winkle College of Pharmacy, University of Cincinnati, Cincinnati, OH, 45229, USA
| | | | - David E. Root
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Norbert Hubner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Charité-Universitätsmedizin, 10117 Berlin, Germany
- German Centre for Cardiovascular Research, Partner Site Berlin, 13347 Berlin, Germany
| | - Todd R. Golub
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115, USA
| | - Pratiti Bandopadhayay
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115, USA
| | - Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - John R. Prensner
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115, USA
- Current address: Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
130
|
Pagni S, Custodio HM, Frankish A, Mudge JM, Mills JD, Sisodiya SM. SCN1A: bioinformatically informed revised boundaries for promoter and enhancer regions. Hum Mol Genet 2023; 32:1753-1763. [PMID: 36715146 PMCID: PMC10162429 DOI: 10.1093/hmg/ddad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 01/06/2023] [Accepted: 01/24/2023] [Indexed: 01/31/2023]
Abstract
Pathogenic variations in the sodium voltage-gated channel alpha subunit 1 (SCN1A) gene are responsible for multiple epilepsy phenotypes, including Dravet syndrome, febrile seizures (FS) and genetic epilepsy with FS plus. Phenotypic heterogeneity is a hallmark of SCN1A-related epilepsies, the causes of which are yet to be clarified. Genetic variation in the non-coding regulatory regions of SCN1A could be one potential causal factor. However, a comprehensive understanding of the SCN1A regulatory landscape is currently lacking. Here, we summarized the current state of knowledge of SCN1A regulation, providing details on its promoter and enhancer regions. We then integrated currently available data on SCN1A promoters by extracting information related to the SCN1A locus from genome-wide repositories and clearly defined the promoter and enhancer regions of SCN1A. Further, we explored the cellular specificity of differential SCN1A promoter usage. We also reviewed and integrated the available human brain-derived enhancer databases and mouse-derived data to provide a comprehensive computationally developed summary of SCN1A brain-active enhancers. By querying genome-wide data repositories, extracting SCN1A-specific data and integrating the different types of independent evidence, we created a comprehensive catalogue that better defines the regulatory landscape of SCN1A, which could be used to explore the role of SCN1A regulatory regions in disease.
Affiliation(s)
- Susanna Pagni
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
| | - Helena Martins Custodio
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - James D Mills
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
- Amsterdam UMC, Department of (Neuro) Pathology, Amsterdam Neuroscience, University of Amsterdam, Amsterdam, 1105 AZ The Netherlands
| | - Sanjay M Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
|
131
|
Zhang Q, Shao M. Transcript Assembly and Annotations: Bias and Adjustment. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.20.537700. [PMID: 37131680 PMCID: PMC10153229 DOI: 10.1101/2023.04.20.537700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Motivation Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations, as assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. Results We investigate the impact of annotations on transcript assembly. We observe that conflicting conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance on selecting appropriate assembly tools for different application scenarios.
Affiliation(s)
- Qimin Zhang
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
| | - Mingfu Shao
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
- Huck Institutes of the Life Sciences, The Pennsylvania State University
|
132
|
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. Systematic visualisation of molecular QTLs reveals variant mechanisms at GWAS loci. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535816. [PMID: 37066341 PMCID: PMC10104061 DOI: 10.1101/2023.04.06.535816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Splicing quantitative trait loci (QTLs) have been implicated as a common mechanism underlying complex trait associations. However, utilising splicing QTLs in target discovery and prioritisation has been challenging due to extensive data normalisation which often renders the direction of the genetic effect as well as its magnitude difficult to interpret. This is further complicated by the fact that strong expression QTLs often manifest as weak splicing QTLs and vice versa, making it difficult to uniquely identify the underlying molecular mechanism at each locus. We find that these ambiguities can be mitigated by visualising the association between the genotype and average RNA sequencing read coverage in the region. Here, we generate these QTL coverage plots for 1.7 million molecular QTL associations in the eQTL Catalogue identified with five quantification methods. We illustrate the utility of these QTL coverage plots by performing colocalisation between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. We find that while visually confirmed splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases. All our association summary statistics and QTL coverage plots are freely available at https://www.ebi.ac.uk/eqtl/.
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ralf Tambets
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - James D Hayhurst
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ida Rahu
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Peep Kolberg
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Uku Raudvere
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Anshika Chowdhary
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Andreas Vija
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Hans J Teras
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jacob Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - John Hardy
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Sebastian Guelfi
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Daniah Trabzuni
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Sarah Kim-Hellmuth
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital LMU Munich, Munich, Germany
| | - Will Rayner
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Hilary Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Abayomi Mosaku
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Helen Parkinson
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
133
|
Omenn GS, Lane L, Overall CM, Pineau C, Packer NH, Cristea IM, Lindskog C, Weintraub ST, Orchard S, Roehrl MH, Nice E, Liu S, Bandeira N, Chen YJ, Guo T, Aebersold R, Moritz RL, Deutsch EW. The 2022 Report on the Human Proteome from the HUPO Human Proteome Project. J Proteome Res 2023; 22:1024-1042. [PMID: 36318223 PMCID: PMC10081950 DOI: 10.1021/acs.jproteome.2c00498] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Indexed: 11/07/2022]
Abstract
The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".
Affiliation(s)
- Gilbert S. Omenn
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
| | | | - Charles Pineau
- French Institute of Health and Medical Research, 35042 RENNES Cedex, France
| | - Nicolle H. Packer
- Macquarie University, Sydney, NSW 2109, Australia
- Griffith University’s Institute for Glycomics, Sydney, NSW 2109, Australia
| | | | | | - Susan T. Weintraub
- University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
| | - Sandra Orchard
- EMBL-EBI, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
| | - Michael H.A. Roehrl
- Memorial Sloan Kettering Cancer Center, New York, New York, 10065, United States
| | | | - Siqi Liu
- BGI Group, Shenzhen 518083, China
| | - Nuno Bandeira
- University of California, San Diego, La Jolla, California 92093, United States
| | - Yu-Ju Chen
- National Taiwan University, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Tiannan Guo
- Westlake University Guomics Laboratory of Big Proteomic Data, Hangzhou 310024, Zhejiang Province, China
| | - Ruedi Aebersold
- Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
|
134
|
Seaby EG, Leggatt G, Cheng G, Thomas NS, Ashton JJ, Stafford I, Baralle D, Rehm HL, O'Donnell-Luria A, Ennis S. A gene pathogenicity tool 'GenePy' identifies missed biallelic diagnoses in the 100,000 Genomes Project. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.03.21.23287545. [PMID: 37034701 PMCID: PMC10081430 DOI: 10.1101/2023.03.21.23287545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
The 100,000 Genomes Project (100KGP) diagnosed a quarter of recruited affected participants, but 26% of diagnoses were in genes not on the chosen gene panel(s); with many being de novo variants of high impact. However, assessing biallelic variants without a gene panel is challenging, due to the number of variants requiring scrutiny. We sought to identify potential missed biallelic diagnoses independent of the gene panel applied using GenePy - a whole gene pathogenicity metric. GenePy scores all variants called in a given individual, incorporating allele frequency, zygosity, and a user-defined deleterious metric (CADD v1.6 applied herein). GenePy then combines all variant scores for individual genes, generating an aggregate score per gene, per participant. We calculated GenePy scores for 2862 recessive disease genes in 78,216 individuals in 100KGP. For each gene, we ranked participant GenePy scores for that gene, and scrutinised affected individuals without a diagnosis whose scores ranked amongst the top-5 for each gene. We assessed these participants' phenotypes for overlap with the disease gene associated phenotype for which they were highly ranked. Where phenotypes overlapped, we extracted rare variants in the gene of interest and applied phase, ClinVar and ACMG classification looking for putative causal biallelic variants. 3184 affected individuals without a molecular diagnosis had a top-5 ranked GenePy gene score and 682/3184 (21%) had phenotypes overlapping with one of the top-ranking genes. After removing 13 withdrawn participants, in 122/669 (18%) of the phenotype-matched cases, we identified a putative missed diagnosis in a top-ranked gene supported by phasing, ClinVar and ACMG classification. A further 334/669 (50%) of cases have a possible missed diagnosis but require functional validation. Applying GenePy at scale has identified potential diagnoses for 456/3183 (14%) of undiagnosed participants who had a top-5 ranked GenePy score in a recessive disease gene, whilst adding only 1.2 additional variants (per individual) for assessment.
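The per-gene ranking step described in this abstract can be illustrated with a minimal Python sketch. It assumes the aggregate per-gene, per-participant scores have already been computed. The function name `top_ranked_participants`, the input layout, and the tie-breaking rule are hypothetical choices for illustration and are not taken from the GenePy software.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_ranked_participants(
    scores: Dict[Tuple[str, str], float], k: int = 5
) -> Dict[str, List[str]]:
    """Return, for each gene, the k participants with the highest scores.

    `scores` maps (gene, participant_id) to an aggregate gene-level score.
    Ties are broken by participant ID (an assumption made for determinism).
    """
    by_gene: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for (gene, participant), score in scores.items():
        by_gene[gene].append((score, participant))

    ranked: Dict[str, List[str]] = {}
    for gene, entries in by_gene.items():
        # Sort by descending score, then ascending participant ID.
        entries.sort(key=lambda entry: (-entry[0], entry[1]))
        ranked[gene] = [participant for _, participant in entries[:k]]
    return ranked
```

In the workflow the abstract describes, the undiagnosed affected participants among each gene's top-ranked set would then go forward to phenotype matching and variant review.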
Affiliation(s)
- Eleanor G Seaby
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
  - Paediatric Infectious Diseases, Imperial College London, London, W2 1NY, UK
- Gary Leggatt
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Guo Cheng
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- N Simon Thomas
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
  - Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
- James J Ashton
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Diana Baralle
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
- Sarah Ennis
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
|
135
|
García-Ruiz S, Zhang D, Gustavsson EK, Rocamora-Perez G, Grant-Peters M, Fairbrother-Browne A, Reynolds RH, Brenton JW, Gil-Martínez AL, Chen Z, Rio DC, Botia JA, Guelfi S, Collado-Torres L, Ryten M. Splicing accuracy varies across human introns, tissues and age. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.29.534370. [PMID: 37034741 PMCID: PMC10081249 DOI: 10.1101/2023.03.29.534370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these splicing inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its regulators. Using publicly available data on short-hairpin RNA-knockdowns of numerous spliceosomal components and related regulators, we found support for the importance of RNA-binding proteins in mis-splicing. We also demonstrated that age is positively correlated with mis-splicing, and it affects genes implicated in neurodegenerative diseases. This in-depth characterisation of mis-splicing can have important implications for our understanding of the role of splicing inaccuracies in human disease and the interpretation of long-read RNA-sequencing data.
Affiliation(s)
- S García-Ruiz
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- D Zhang
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- E K Gustavsson
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- G Rocamora-Perez
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- M Grant-Peters
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- A Fairbrother-Browne
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- R H Reynolds
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- J W Brenton
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- A L Gil-Martínez
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- Z Chen
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- D C Rio
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, CA 94720, USA
- J A Botia
  - Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- S Guelfi
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Verge Genomics, South San Francisco, CA, 94080, USA
- L Collado-Torres
  - Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- M Ryten
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
|
136
|
Lyulcheva-Bennett E, Genomics England Research Consortium, Bennett D. A retrospective analysis of phosphatase catalytic subunit gene variants in patients with rare disorders identifies novel candidate neurodevelopmental disease genes. Front Cell Dev Biol 2023; 11:1107930. [PMID: 37056996 PMCID: PMC10086149 DOI: 10.3389/fcell.2023.1107930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 03/03/2023] [Indexed: 03/30/2023]
Abstract
Rare genetic disorders represent some of the most severe and life-limiting conditions that constitute a considerable burden on global healthcare systems and societies. Most individuals affected by rare disorders remain undiagnosed, highlighting the unmet need for improved disease gene discovery and novel variant interpretation. Aberrant (de)phosphorylation can have profound pathological consequences underpinning many disease processes. Numerous phosphatases and associated proteins have been identified as disease genes, with many more likely to have gone undiscovered thus far. To begin to address these issues, we have performed a systematic survey of de novo variants amongst 189 genes encoding phosphatase catalytic subunits found in rare disease patients recruited to the 100,000 Genomes Project (100 kGP), the largest national sequencing project of its kind in the United Kingdom. We found that 49% of phosphatases carried de novo mutation(s) in this cohort. Only 25% of these phosphatases have been previously linked to genetic disorders. A gene-to-patient approach matching variants to phenotypic data identified 9 novel candidate rare-disease genes: PTPRD, PTPRG, PTPRT, PTPRU, PTPRZ1, MTMR3, GAK, TPTE2, PTPN18. As the number of patients undergoing whole genome sequencing increases and information sharing improves, we anticipate that reiterative analysis of genomic and phenotypic data will continue to identify candidate phosphatase disease genes for functional validation. This is the first step towards delineating the aetiology of rare genetic disorders associated with altered phosphatase function, leading to new biological insights and improved clinical outcomes for the affected individuals and their families.
Affiliation(s)
- Daimark Bennett
  - Division of Developmental Biology and Medicine, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
|
137
|
Varabyou A, Erdogdu B, Salzberg SL, Pertea M. Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.23.533704. [PMID: 36993373 PMCID: PMC10055401 DOI: 10.1101/2023.03.23.533704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
Affiliation(s)
- Ales Varabyou
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Beril Erdogdu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Steven L Salzberg
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
|
138
|
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. ARXIV 2023:arXiv:2303.13996v1. [PMID: 36994150 PMCID: PMC10055485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.
Affiliation(s)
- Paulo Amaral
  - INSPER Institute of Education and Research, São Paulo, SP, Brasil
- Silvia Carbonell-Sala
  - Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Francisco M. De La Vega
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA; Tempus Labs, Inc., Chicago, IL
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Gingeras
  - Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Roderic Guigo
  - Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Jennifer L Harrow
  - Centre for Genomics Research, Discovery Sciences, AstraZeneca, Da Vinci Building. Melbourn Science Park, Royston UK SG8 6HB
- Artemis G. Hatzigeorgiou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
  - School of Biology and Environmental Science, University College Dublin, D04 V1W8 Dublin, Ireland; Conway Institute of Biomedical and Biomolecular Research, University College Dublin, D04 V1W8 Dublin, Ireland; Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland; Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland
- Terence D. Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Kim D. Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Shashikant Pujar
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Hazuki Takahashi
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama Kanagawa 230-0045 Japan
- Igor Ulitsky
  - Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Ales Varabyou
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A. Wells
  - Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville 3010 Vic Australia
- Mark Yandell
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Human Technopole, via Rita Levi Montalcini 1, Milan 20157 Italy
- Steven L. Salzberg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
|
139
|
Greene D, Pirri D, Frudd K, Sackey E, Al-Owain M, Giese APJ, Ramzan K, Riaz S, Yamanaka I, Boeckx N, Thys C, Gelb BD, Brennan P, Hartill V, Harvengt J, Kosho T, Mansour S, Masuno M, Ohata T, Stewart H, Taibah K, Turner CLS, Imtiaz F, Riazuddin S, Morisaki T, Ostergaard P, Loeys BL, Morisaki H, Ahmed ZM, Birdsey GM, Freson K, Mumford A, Turro E. Genetic association analysis of 77,539 genomes reveals rare disease etiologies. Nat Med 2023; 29:679-688. [PMID: 36928819 PMCID: PMC10033407 DOI: 10.1038/s41591-023-02211-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 10/21/2022] [Accepted: 01/06/2023] [Indexed: 03/18/2023]
Abstract
The genetic etiologies of more than half of rare diseases remain unknown. Standardized genome sequencing and phenotyping of large patient cohorts provide an opportunity for discovering the unknown etiologies, but this depends on efficient and powerful analytical methods. We built a compact database, the 'Rareservoir', containing the rare variant genotypes and phenotypes of 77,539 participants sequenced by the 100,000 Genomes Project. We then used the Bayesian genetic association method BeviMed to infer associations between genes and each of 269 rare disease classes assigned by clinicians to the participants. We identified 241 known and 19 previously unidentified associations. We validated associations with ERG, PMEPA1 and GPR156 by searching for pedigrees in other cohorts and using bioinformatic and experimental approaches. We provide evidence that (1) loss-of-function variants in the Erythroblast Transformation Specific (ETS)-family transcription factor encoding gene ERG lead to primary lymphoedema, (2) truncating variants in the last exon of transforming growth factor-β regulator PMEPA1 result in Loeys-Dietz syndrome and (3) loss-of-function variants in GPR156 give rise to recessive congenital hearing impairment. The Rareservoir provides a lightweight, flexible and portable system for synthesizing the genetic and phenotypic data required to study rare disease cohorts with tens of thousands of participants.
Affiliation(s)
- Daniel Greene
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniela Pirri
  - National Heart and Lung Institute, Imperial College London, London, UK
- Karen Frudd
  - National Heart and Lung Institute, Imperial College London, London, UK
  - University College London Institute of Ophthalmology, University College London, London, UK
- Ege Sackey
  - Molecular and Clinical Sciences Institute, St. George's University of London, London, UK
- Mohammed Al-Owain
  - Department of Medical Genomics, Centre for Genomic Medicine, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Arnaud P J Giese
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
- Khushnooda Ramzan
  - Department of Clinical Genomics, Centre for Genomic Medicine, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Sehar Riaz
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
  - Department of Biochemistry and Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD, USA
- Itaru Yamanaka
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center, Osaka, Japan
- Nele Boeckx
  - Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, Antwerp, Belgium
- Chantal Thys
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Bruce D Gelb
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Paul Brennan
  - Northern Genetics Service, Newcastle upon Tyne Hospitals National Health Service Trust International Centre for Life, Newcastle upon Tyne, UK
- Verity Hartill
  - Department of Clinical Genetics, Chapel Allerton Hospital, Leeds Teaching Hospitals National Health Service Trust, Leeds, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Julie Harvengt
  - Centre for Medical Genetics, Centre Hospitalier Universitaire de Liège, Liège, Belgium
- Tomoki Kosho
  - Department of Medical Genetics, Shinshu University School of Medicine, Nagano, Japan
  - Center for Medical Genetics, Shinshu University Hospital, Nagano, Japan
- Sahar Mansour
  - Molecular and Clinical Sciences Institute, St. George's University of London, London, UK
  - South West Thames Regional Genetics Service, St. George's University Hospitals National Health Service Foundation Trust, London, UK
- Mitsuo Masuno
  - Department of Medical Genetics, Kawasaki Medical School Hospital, Okayama, Japan
- Helen Stewart
  - Oxford University Hospitals National Health Service Foundation Trust, Oxford, UK
- Khalid Taibah
  - Ear Nose and Throat Medical Centre, Riyadh, Saudi Arabia
- Claire L S Turner
  - Peninsula Clinical Genetics Service, Royal Devon & Exeter Hospital, Exeter, UK
- Faiqa Imtiaz
  - Department of Clinical Genomics, Centre for Genomic Medicine, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Saima Riazuddin
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
  - Department of Biochemistry and Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD, USA
- Takayuki Morisaki
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center, Osaka, Japan
  - Division of Molecular Pathology and Department of Internal Medicine, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Pia Ostergaard
  - Molecular and Clinical Sciences Institute, St. George's University of London, London, UK
- Bart L Loeys
  - Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, Antwerp, Belgium
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Hiroko Morisaki
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center, Osaka, Japan
  - Department of Medical Genetics, Sakakibara Heart Institute, Tokyo, Japan
- Zubair M Ahmed
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
  - Department of Biochemistry and Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD, USA
- Graeme M Birdsey
  - National Heart and Lung Institute, Imperial College London, London, UK
- Kathleen Freson
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Andrew Mumford
  - School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK
  - South West National Health Service Genomic Medicine Service Alliance, Bristol, UK
- Ernest Turro
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
140
|
Seaby EG, Thomas NS, Webb A, Brittain H, Taylor Tavares AL, Baralle D, Rehm HL, O'Donnell-Luria A, Ennis S. Targeting de novo loss-of-function variants in constrained disease genes improves diagnostic rates in the 100,000 Genomes Project. Hum Genet 2023; 142:351-362. [PMID: 36477409 PMCID: PMC9950176 DOI: 10.1007/s00439-022-02509-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/03/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Genome sequencing was first offered clinically in the UK through the 100,000 Genomes Project (100KGP). Analysis was restricted to predefined gene panels associated with the patient's phenotype. However, panels rely on clearly characterised phenotypes and risk missing diagnoses outside of the panel(s) applied. We propose a complementary method to rapidly identify pathogenic variants, including those missed by 100KGP methods. METHODS The Loss-of-function Observed/Expected Upper-bound Fraction (LOEUF) score quantifies gene constraint, with low scores correlated with haploinsufficiency. We applied DeNovoLOEUF, a filtering strategy, to sequencing data from 13,949 rare disease trios in the 100KGP, filtering for rare, de novo, loss-of-function variants in disease genes with a LOEUF score < 0.2. We compared our findings with the corresponding patient's diagnostic reports. RESULTS 324/332 (98%) of the variants identified using DeNovoLOEUF were diagnostic or partially diagnostic (whereby the variant was responsible for some of the phenotype). We identified 39 diagnoses that were "missed" by 100KGP standard analyses, which are now being returned to patients. CONCLUSION We have demonstrated a highly specific and rapid method with a 98% positive predictive value that has good concordance with standard analysis, low false-positive rate, and can identify additional diagnoses. Globally, as more patients are being offered genome sequencing, we anticipate that DeNovoLOEUF will rapidly identify new diagnoses and facilitate iterative analyses when new disease genes are discovered.
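The filtering logic summarised in this abstract (rare, de novo, loss-of-function variants in disease genes with LOEUF < 0.2) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' pipeline: the `Variant` fields, the function names, and in particular the `max_allele_frequency` default are hypothetical, since the abstract does not state the rarity threshold used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal record; a real pipeline would carry far more fields.
    gene: str
    is_de_novo: bool
    is_loss_of_function: bool
    allele_frequency: float


def denovo_loeuf_filter(
    variants: Iterable[Variant],
    loeuf_by_gene: Dict[str, float],
    disease_genes: Set[str],
    loeuf_cutoff: float = 0.2,           # cutoff stated in the abstract
    max_allele_frequency: float = 1e-4,  # ASSUMPTION: not given in the abstract
) -> List[Variant]:
    """Keep rare, de novo, loss-of-function variants in constrained disease genes."""
    kept = []
    for v in variants:
        loeuf = loeuf_by_gene.get(v.gene)  # None when no constraint score exists
        if (
            v.is_de_novo
            and v.is_loss_of_function
            and v.allele_frequency <= max_allele_frequency
            and v.gene in disease_genes
            and loeuf is not None
            and loeuf < loeuf_cutoff
        ):
            kept.append(v)
    return kept


def positive_predictive_value(confirmed: int, flagged: int) -> float:
    """Fraction of flagged variants that were confirmed as diagnostic."""
    return confirmed / flagged
```

With the counts reported in the abstract, `positive_predictive_value(324, 332)` is about 0.976, which rounds to the stated 98%.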
Affiliation(s)
- Eleanor G Seaby
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, 02115, USA
  - Paediatric Infectious Diseases, Imperial College London, London, W2 1NY, UK
- N Simon Thomas
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
  - Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
- Amy Webb
  - Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
- Helen Brittain
  - Genomics England, Charterhouse Square, London, EC1M 6BQ, UK
- Ana Lisa Taylor Tavares
  - Genomics England, Charterhouse Square, London, EC1M 6BQ, UK
  - East Anglian Medical Genetics Service, Cambridge University Hospital, Hills Road, Cambridge, CB2 0QQ, UK
- Diana Baralle
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, 02115, USA
- Sarah Ennis
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
|
141
|
Walker LC, de la Hoya M, Wiggins GA, Lindy A, Vincent LM, Parsons M, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A, Zimmermann H, Byrne AB, Pesaran T, Karam R, Harrison SM, Spurdle AB. APPLICATION OF THE ACMG/AMP FRAMEWORK TO CAPTURE EVIDENCE RELEVANT TO PREDICTED AND OBSERVED IMPACT ON SPLICING: RECOMMENDATIONS FROM THE CLINGEN SVI SPLICING SUBGROUP. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.24.23286431. [PMID: 36865205 PMCID: PMC9980257 DOI: 10.1101/2023.02.24.23286431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1 (null variant in a gene where loss-of-function is the mechanism of disease), PS3 (functional assays show damaging effect on splicing), PP3 (computational evidence supports a splicing effect), BS3 (functional assays show no damaging effect on splicing), BP4 (computational evidence suggests no splicing impact), and BP7 (silent change with no predicted impact on splicing). However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation (SVI) Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. Our study utilised empirically derived splicing evidence to: 1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, 2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and 3) exemplify methodology to calibrate bioinformatic splice prediction tools. We propose repurposing of the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely, BP7 may be used to capture RNA results demonstrating no impact on splicing for both intronic and synonymous variants, and for missense variants if protein functional impact has been excluded. Furthermore, we propose that the PS3 and BS3 codes are applied only for well-established assays that measure functional impact that is not directly captured by RNA splicing assays. We recommend the application of PS1 based on similarity of predicted RNA splicing effects for a variant under assessment in comparison to a known Pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA assay evidence described aim to help standardise variant pathogenicity classification processes and result in greater consistency when interpreting splicing-based evidence.
|
142
|
Kovačević M, Janković M, Branković M, Milićević O, Novaković I, Sokić D, Ristić A, Shamsani J, Vojvodić N. Novel GATOR1 variants in focal epilepsy. Epilepsy Behav 2023; 141:109139. [PMID: 36848747 DOI: 10.1016/j.yebeh.2023.109139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/03/2023] [Accepted: 02/05/2023] [Indexed: 02/27/2023]
Abstract
INTRODUCTION Variants in GATOR1 genes are well established in focal epilepsy syndromes. A strong association of GATOR1 variants with drug-resistant epilepsy as well as an increased risk of sudden unexplained death in epilepsy warrants developing strategies to facilitate the identification of patients who could potentially benefit from genetic testing and precision medicine. We aimed to determine the yield of GATOR1 gene sequencing in patients with focal epilepsy typically referred for genetic testing, establish novel GATOR1 variants and determine clinical, electroencephalographic, and radiological characteristics of variant carriers. PATIENTS AND METHODS Ninety-six patients with clinical suspicion of genetic focal epilepsy with previous comprehensive diagnostic epilepsy evaluation in The Neurology Clinic, University Clinical Center of Serbia, were included in the study. Sequencing was performed using a custom gene panel encompassing DEPDC5, NPRL2, and NPRL3. Variants of interest (VOI) were classified according to criteria proposed by the American College of Medical Genetics and the Association for Molecular Pathology. RESULTS Four previously unreported VOI in 4/96 (4.2%) patients were found in our cohort. Three likely pathogenic variants were determined in 3/96 (3.1%) patients, one frameshift variant in DEPDC5 in a patient with nonlesional frontal lobe epilepsy, one splicogenic DEPDC5 variant in a patient with nonlesional posterior quadrant epilepsy, and one frameshift variant in NPRL2 in a patient with temporal lobe epilepsy associated with hippocampal sclerosis. Only one VOI, a missense variant in NPRL3, found in 1/96 (1.1%) patients, was classified as a variant of unknown significance. CONCLUSION GATOR1 gene sequencing was diagnostic in 3.1% of our cohort and revealed three novel likely pathogenic variants, including a previously unreported association of temporal lobe epilepsy with hippocampal sclerosis with an NPRL2 variant. 
Further research is essential for a better understanding of the clinical scope of GATOR1 gene-associated epilepsy.
Affiliation(s)
- Maša Kovačević
  - Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
- Milena Janković
  - Neurology Clinic, University Clinical Center of Serbia, Serbia
- Dragoslav Sokić
  - Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
- Aleksandar Ristić
  - Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
- Nikola Vojvodić
  - Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
|
143
|
de Sainte Agathe JM, Filser M, Isidor B, Besnard T, Gueguen P, Perrin A, Van Goethem C, Verebi C, Masingue M, Rendu J, Cossée M, Bergougnoux A, Frobert L, Buratti J, Lejeune É, Le Guern É, Pasquier F, Clot F, Kalatzis V, Roux AF, Cogné B, Baux D. SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation. Hum Genomics 2023; 17:7. [PMID: 36765386 PMCID: PMC9912651 DOI: 10.1186/s40246-023-00451-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 12/01/2022] [Accepted: 01/18/2023] [Indexed: 02/12/2023] Open
Abstract
SpliceAI is an open-source deep learning splicing prediction algorithm that has demonstrated in the past few years its high ability to predict splicing defects caused by DNA variations. However, its outputs present several drawbacks: (1) although the numerical values are very convenient for batch filtering, their precise interpretation can be difficult, (2) the outputs are delta scores which can sometimes mask a severe consequence, and (3) complex delins are most often not handled. We present here SpliceAI-visual, a free online tool based on the SpliceAI algorithm, and show how it complements the traditional SpliceAI analysis. First, SpliceAI-visual manipulates raw scores and not delta scores, as the latter can be misleading in certain circumstances. Second, the outcome of SpliceAI-visual is user-friendly thanks to the graphical presentation. Third, SpliceAI-visual is currently one of the only SpliceAI-derived implementations able to annotate complex variants (e.g., complex delins). We report here the benefits of using SpliceAI-visual and demonstrate its relevance in the assessment/modulation of the PVS1 classification criteria. We also show how SpliceAI-visual can elucidate several complex splicing defects taken from the literature but also from unpublished cases. SpliceAI-visual is available as a Google Colab notebook and has also been fully integrated in a free online variant interpretation tool, MobiDetails ( https://mobidetails.iurc.montp.inserm.fr/MD ).
Affiliation(s)
- Jean-Madeleine de Sainte Agathe
  - Département de Génétique Médicale, Groupe Hospitalier Universitaire de la Pitié Salpêtrière, AP-HP.Sorbonne Université, Laboratoire de Médecine Génomique Sorbonne Université, Paris, France
  - Laboratoire de Biologie Médicale Multi-Sites SeqOIA (laboratoire-seqoia.fr/), Paris, France
- Mathilde Filser
  - Département de Génétique Médicale, Groupe Hospitalier Universitaire de la Pitié Salpêtrière, AP-HP.Sorbonne Université, Laboratoire de Médecine Génomique Sorbonne Université, Paris, France
- Bertrand Isidor
  - Nantes Université, CHU Nantes, Service de Génétique Médicale, 44000, Nantes, France
- Thomas Besnard
  - Nantes Université, CHU Nantes, Service de Génétique Médicale, 44000, Nantes, France
- Paul Gueguen
  - Laboratoire de Biologie Médicale Multi-Sites SeqOIA (laboratoire-seqoia.fr/), Paris, France
  - Service de Génétique, Inserm U1253, CHRU de Tours, Tours, France
- Aurélien Perrin
  - Laboratoire de Génétique Moléculaire, CHU de Montpellier, Université de Montpellier, Montpellier, France
- Charles Van Goethem
  - Laboratoire de Génétique Moléculaire, CHU de Montpellier, Université de Montpellier, Montpellier, France
- Camille Verebi
  - Service de Médecine Génomique, Maladies de Système et d'Organe, Fédération de Génétique et de Médecine Génomique, DMU BioPhyGen, APHP Centre-Université Paris Cité, Hôpital Cochin, Paris, France
- Marion Masingue
  - Centre de référence des maladies neuromusculaires Nord/Est/Ile de France, Hôpital Pitié-Salpêtrière, APHP, Paris, France
- John Rendu
  - Inserm, U1216, CHU Grenoble Alpes, Grenoble Institut Neurosciences, Université Grenoble Alpes, Grenoble, France
- Mireille Cossée
  - Laboratoire de Génétique Moléculaire, CHU de Montpellier, Université de Montpellier, Montpellier, France
  - PhyMedExp, INSERM, CNRS, Université de Montpellier, Montpellier, France
- Anne Bergougnoux
  - Laboratoire de Génétique Moléculaire, CHU de Montpellier, Université de Montpellier, Montpellier, France
  - PhyMedExp, INSERM, CNRS, Université de Montpellier, Montpellier, France
- Laurent Frobert
  - Laboratoire de Biologie Médicale Multi-Sites SeqOIA (laboratoire-seqoia.fr/), Paris, France
- Julien Buratti
  - Département de Génétique Médicale, Groupe Hospitalier Universitaire de la Pitié Salpêtrière, AP-HP.Sorbonne Université, Laboratoire de Médecine Génomique Sorbonne Université, Paris, France
- Élodie Lejeune
  - Département de Génétique Médicale, Groupe Hospitalier Universitaire de la Pitié Salpêtrière, AP-HP.Sorbonne Université, Laboratoire de Médecine Génomique Sorbonne Université, Paris, France
- Éric Le Guern
  - Département de Génétique Médicale, Groupe Hospitalier Universitaire de la Pitié Salpêtrière, AP-HP.Sorbonne Université, Laboratoire de Médecine Génomique Sorbonne Université, Paris, France
  - Laboratoire de Biologie Médicale Multi-Sites SeqOIA (laboratoire-seqoia.fr/), Paris, France
- Florence Pasquier
  - Centre mémoire, Inserm U1172 DistALZ, Licend, Univ Lille, CHU Lille, 59000, Lille, France
- Fabienne Clot
  - Département de Génétique Médicale, Groupe Hospitalier Universitaire de la Pitié Salpêtrière, AP-HP.Sorbonne Université, Laboratoire de Médecine Génomique Sorbonne Université, Paris, France
- Anne-Françoise Roux
  - Laboratoire de Génétique Moléculaire, CHU de Montpellier, Université de Montpellier, Montpellier, France
  - INM, Univ Montpellier, INSERM, CHU Montpellier, Montpellier, France
- Benjamin Cogné
  - Laboratoire de Biologie Médicale Multi-Sites SeqOIA (laboratoire-seqoia.fr/), Paris, France
  - Nantes Université, CHU Nantes, Service de Génétique Médicale, 44000, Nantes, France
- David Baux
  - Laboratoire de Génétique Moléculaire, CHU de Montpellier, Université de Montpellier, Montpellier, France
  - INM, Univ Montpellier, INSERM, CHU Montpellier, Montpellier, France
|
144
|
Wright CF, FitzPatrick DR, Ware JS, Rehm HL, Firth HV. Importance of adopting standardized MANE transcripts in clinical reporting. Genet Med 2023; 25:100331. [PMID: 36441169 DOI: 10.1016/j.gim.2022.10.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 11/29/2022] Open
Affiliation(s)
- Caroline F Wright
  - Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter, United Kingdom
- David R FitzPatrick
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
- James S Ware
  - National Heart and Lung Institute and MRC London Institute of Medical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
- Heidi L Rehm
  - Center for Genomic Medicine, Massachusetts General Hospital and Broad Institute of MIT and Harvard, Boston, MA
- Helen V Firth
  - East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
|
145
|
aRgus: Multilevel visualization of non-synonymous single nucleotide variants & advanced pathogenicity score modeling for genetic vulnerability assessment. Comput Struct Biotechnol J 2023; 21:1077-1083. [PMID: 36789265 PMCID: PMC9900257 DOI: 10.1016/j.csbj.2023.01.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/14/2022] [Revised: 01/18/2023] [Accepted: 01/18/2023] [Indexed: 01/26/2023] Open
Abstract
The widespread use of high-throughput sequencing techniques is leading to a rapidly increasing number of disease-associated variants of unknown significance and candidate genes. Integration of knowledge concerning their genetic, protein as well as functional and conservational aspects is necessary for an exhaustive assessment of their relevance and for prioritization of further clinical and functional studies investigating their role in human disease. To collect the necessary information, a multitude of different databases has to be accessed, and data extraction from the original sources is commonly not user-friendly and requires advanced bioinformatics skills. This leads to decreased data accessibility for a relevant number of potential users such as clinicians, geneticists, and clinical researchers. Here, we present aRgus (https://argus.urz.uni-heidelberg.de/), a standalone webtool for simple extraction and intuitive visualization of multi-layered gene, protein, variant, and variant effect prediction data. aRgus provides interactive exploitation of these data within seconds for any known gene of the human genome. In contrast to existing online platforms for compilation of variant data, aRgus complements visualization of chromosomal exon-intron structure and protein domain annotation with ClinVar and gnomAD variant distributions as well as position-specific variant effect prediction score modeling. aRgus thereby enables timely assessment of protein regions vulnerable to variation with single amino acid resolution and provides numerous applications in variant and protein domain interpretation as well as in the design of in vitro experiments.
|
146
|
Dvorak P, Hanicinec V, Soucek P. The position of the longest intron is related to biological functions in some human genes. Front Genet 2023; 13:1085139. [PMID: 36712854 PMCID: PMC9875286 DOI: 10.3389/fgene.2022.1085139] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023] Open
Abstract
The evidence that introns can influence different levels of transfer of genetic information between DNA and the final product is increasing. Longer first introns were found to be a general property of eukaryotic gene structure and shown to contain a higher fraction of conserved sequence and different functional elements. Our work brings more precise information about the position of the longest introns in human protein-coding genes and possible connection with biological function and gene expression. According to our results, the position of the longest intron can be localized to the first third of introns in 64%, the second third in 19%, and the third in 17%, with notable peaks at the middle and last introns of approximately 5% and 6%, respectively. The median lengths of the longest introns decrease with increasing distance from the start of the gene from approximately 15,000 to 5,000 bp. We have shown that the position of the longest intron is in some cases linked to the biological function of the given gene. For example, DNA repair genes have the longest intron more often in the second or third. In the distribution of gene expression according to the position of the longest intron, tissue-specific profiles can be traced with the highest expression usually at the absolute positions of intron 1 and 2. In this work, we present arguments supporting the hypothesis that the position of the longest intron in a gene is another biological factor modulating the transmission of genetic information. The position of the longest intron is related to biological functions in some human genes.
Affiliation(s)
- Pavel Dvorak
  - Department of Biology, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia
  - Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia
  - Institute of Medical Genetics, University Hospital Pilsen, Pilsen, Czechia
  - Correspondence: Pavel Dvorak
- Vojtech Hanicinec
  - Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia
- Pavel Soucek
  - Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia
  - Toxicogenomics Unit, National Institute of Public Health, Prague, Czechia
|
147
|
Sayers EW, Bolton EE, Brister J, Canese K, Chan J, Comeau D, Farrell C, Feldgarden M, Fine AM, Funk K, Hatcher E, Kannan S, Kelly C, Kim S, Klimke W, Landrum M, Lathrop S, Lu Z, Madden T, Malheiro A, Marchler-Bauer A, Murphy T, Phan L, Pujar S, Rangwala S, Schneider V, Tse T, Wang J, Ye J, Trawick B, Pruitt K, Sherry S. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res 2023; 51:D29-D38. [PMID: 36370100 PMCID: PMC9825438 DOI: 10.1093/nar/gkac1032] [Citation(s) in RCA: 124] [Impact Index Per Article: 124.0] [Received: 09/17/2022] [Revised: 10/11/2022] [Accepted: 11/09/2022] [Indexed: 11/15/2022] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
Affiliation(s)
- Eric W Sayers
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Evan E Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- J Rodney Brister
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathi Canese
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jessica Chan
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Donald C Comeau
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Catherine M Farrell
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Michael Feldgarden
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Anna M Fine
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathryn Funk
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Eneida Hatcher
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sivakumar Kannan
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Christopher Kelly
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sunghwan Kim
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- William Klimke
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Melissa J Landrum
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stacy Lathrop
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Thomas L Madden
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Adriana Malheiro
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Aron Marchler-Bauer
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Lon Phan
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Shashikant Pujar
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sanjida H Rangwala
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Valerie A Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Tony Tse
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jiyao Wang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jian Ye
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Barton W Trawick
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stephen T Sherry
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
148
|
García-Ruiz S, Gustavsson EK, Zhang D, Reynolds RH, Chen Z, Fairbrother-Browne A, Gil-Martínez AL, Botia JA, Collado-Torres L, Ryten M. IntroVerse: a comprehensive database of introns across human tissues. Nucleic Acids Res 2023; 51:D167-D178. [PMID: 36399497 PMCID: PMC9825543 DOI: 10.1093/nar/gkac1056] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2022] [Revised: 10/21/2022] [Accepted: 10/30/2022] [Indexed: 11/19/2022] Open
Abstract
Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues has shown that this process can be inaccurate, resulting in the presence of novel introns detected at low frequency across samples and within an individual. To enable the full spectrum of intron use to be explored, we have developed IntroVerse, which offers an extensive catalogue on the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset has been generated through the analysis of 17,510 human control RNA samples from 54 tissues provided by the Genotype-Tissue Expression Consortium. IntroVerse has two unique features: (i) it provides a complete catalogue of novel junctions and (ii) each novel junction has been assigned to a specific annotated intron. This unique, hierarchical structure offers multiple uses, including the identification of novel transcripts from known genes and their tissue-specific usage, and the assessment of background splicing noise for introns thought to be mis-spliced in disease states. IntroVerse provides a user-friendly web interface and is freely available at https://rytenlab.com/browser/app/introverse.
Affiliation(s)
- Sonia García-Ruiz
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
- Emil K Gustavsson
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
- David Zhang
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
- Regina H Reynolds
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
- Zhongbo Chen
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
- Aine Fairbrother-Browne
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
  - Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, WC2R 2LS, UK
- Ana Luisa Gil-Martínez
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
  - Department of Information and Communications Engineering Faculty of Informatics, Espinardo Campus, University of Murcia, Murcia, 30100, Spain
- Juan A Botia
  - Department of Information and Communications Engineering Faculty of Informatics, Espinardo Campus, University of Murcia, Murcia, 30100, Spain
- Mina Ryten
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, WC1N 1EH, UK
|
149
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, Craig PA, Crichlow GV, Dalenberg K, Duarte JM, Dutta S, Fayazi M, Feng Z, Flatt JW, Ganesan S, Ghosh S, Goodsell DS, Green RK, Guranovic V, Henry J, Hudson BP, Khokhriakov I, Lawson CL, Liang Y, Lowe R, Peisach E, Persikova I, Piehl DW, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Vallat B, Voigt M, Webb B, Westbrook JD, Whetstone S, Young JY, Zalevsky A, Zardecki C. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res 2023; 51:D488-D508. [PMID: 36420884 PMCID: PMC9825554 DOI: 10.1093/nar/gkac1077] [Citation(s) in RCA: 200] [Impact Index Per Article: 200.0] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 11/02/2022] [Indexed: 11/27/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
Affiliation(s)
- Stephen K Burley
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Charmi Bhikadiya
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Chunxiao Bi
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Sebastian Bittrich
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Henry Chao
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Li Chen
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Paul A Craig
  - School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY 14623, USA
- Gregg V Crichlow
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Kenneth Dalenberg
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Shuchismita Dutta
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Maryam Fayazi
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Justin W Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sai Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ben Webb
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jasmine Y Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Arthur Zalevsky
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| |
|
150
|
Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland J, Mudge J, Sisu C, Wright J, Arnan C, Barnes I, Banerjee A, Bennett R, Berry A, Bignell A, Boix C, Calvet F, Cerdán-Vélez D, Cunningham F, Davidson C, Donaldson S, Dursun C, Fatima R, Giorgetti S, Giron C, Gonzalez J, Hardy M, Harrison P, Hourlier T, Hollis Z, Hunt T, James B, Jiang Y, Johnson R, Kay M, Lagarde J, Martin F, Gómez L, Nair S, Ni P, Pozo F, Ramalingam V, Ruffier M, Schmitt B, Schreiber J, Steed E, Suner MM, Sumathipala D, Sycheva I, Uszczynska-Ratajczak B, Wass E, Yang Y, Yates A, Zafrulla Z, Choudhary J, Gerstein M, Guigo R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress M, Flicek P. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 2023; 51:D942-D949. [PMID: 36420896 PMCID: PMC9825462 DOI: 10.1093/nar/gkac1071] [Citation(s) in RCA: 78] [Impact Index Per Article: 78.0] [Received: 09/22/2022] [Revised: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022] Open
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sílvia Carbonell-Sala
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
| | - James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Carme Arnan
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Abhimanyu Banerjee
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carles Boix
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Ferriol Calvet
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cagatay Dursun
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos García Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin James
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julien Lagarde
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Surag Nair
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Pengyu Ni
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Vivek Ramalingam
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jacob M Schreiber
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Barbara Uszczynska-Ratajczak
- Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zahoor Zafrulla
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Roderic Guigo
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|