1
|
Santucci K, Cheng Y, Xu SM, Janitz M. Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches. Brief Funct Genomics 2024; 23:683-694. [PMID: 39158328 DOI: 10.1093/bfgp/elae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Revised: 07/29/2024] [Accepted: 07/31/2024] [Indexed: 08/20/2024] Open
Abstract
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
Collapse
Affiliation(s)
- Kristina Santucci
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
| | - Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
| | - Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
| | - Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
| |
Collapse
|
2
|
Margasyuk S, Kuznetsova A, Zavileyskiy L, Vlasenok M, Skvortsov D, Pervouchine D. Human introns contain conserved tissue-specific cryptic poison exons. NAR Genom Bioinform 2024; 6:lqae163. [PMID: 39664813 PMCID: PMC11632617 DOI: 10.1093/nargab/lqae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2024] [Revised: 10/10/2024] [Accepted: 11/10/2024] [Indexed: 12/13/2024] Open
Abstract
Eukaryotic cells express a large number of transcripts from a single gene due to alternative splicing. Despite hundreds of thousands of splice isoforms being annotated in databases, it has been reported that the current exon catalogs remain incomplete. At the same time, introns of human protein-coding (PC) genes contain a large number of evolutionarily conserved elements with unknown function. Here, we explore the possibility that some of them represent cryptic exons that are expressed in rare conditions. We identified a group of cryptic exons that are similar to the annotated exons in terms of evolutionary conservation and RNA-seq read coverage in the Genotype-Tissue Expression dataset. Most of them were poison, i.e. generated an nonsense-mediated decay (NMD) isoform upon inclusion, and many showed signs of tissue-specific and cancer-specific expression and regulation. We performed RNA-seq in A549 cell line treated with cycloheximide to inactivate NMD and confirmed using quantitative polymerase chain reaction that seven of eight exons tested are, indeed, expressed. This study shows that introns of human PC genes contain cryptic poison exons, which reside in conserved intronic regions and remain not fully annotated due to insufficient representation in RNA-seq libraries.
Collapse
Affiliation(s)
- Sergey Margasyuk
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Antonina Kuznetsova
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Lev Zavileyskiy
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Maria Vlasenok
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Dmitry Skvortsov
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
- Faculty of Chemistry, Moscow State University, Ul Kolmogorova, 1, 119991, Moscow, Russia
| | - Dmitri D Pervouchine
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| |
Collapse
|
3
|
Anczukow O, Allain FHT, Angarola BL, Black DL, Brooks AN, Cheng C, Conesa A, Crosse EI, Eyras E, Guccione E, Lu SX, Neugebauer KM, Sehgal P, Song X, Tothova Z, Valcárcel J, Weeks KM, Yeo GW, Thomas-Tikhonenko A. Steering research on mRNA splicing in cancer towards clinical translation. Nat Rev Cancer 2024; 24:887-905. [PMID: 39384951 DOI: 10.1038/s41568-024-00750-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 08/27/2024] [Indexed: 10/11/2024]
Abstract
Splicing factors are affected by recurrent somatic mutations and copy number variations in several types of haematologic and solid malignancies, which is often seen as prima facie evidence that splicing aberrations can drive cancer initiation and progression. However, numerous spliceosome components also 'moonlight' in DNA repair and other cellular processes, making their precise role in cancer difficult to pinpoint. Still, few would deny that dysregulated mRNA splicing is a pervasive feature of most cancers. Correctly interpreting these molecular fingerprints can reveal novel tumour vulnerabilities and untapped therapeutic opportunities. Yet multiple technological challenges, lingering misconceptions, and outstanding questions hinder clinical translation. To start with, the general landscape of splicing aberrations in cancer is not well defined, due to limitations of short-read RNA sequencing not adept at resolving complete mRNA isoforms, as well as the shallow read depth inherent in long-read RNA-sequencing, especially at single-cell level. Although individual cancer-associated isoforms are known to contribute to cancer progression, widespread splicing alterations could be an equally important and, perhaps, more readily actionable feature of human cancers. This is to say that in addition to 'repairing' mis-spliced transcripts, possible therapeutic avenues include exacerbating splicing aberration with small-molecule spliceosome inhibitors, targeting recurrent splicing aberrations with synthetic lethal approaches, and training the immune system to recognize splicing-derived neoantigens.
Collapse
Affiliation(s)
- Olga Anczukow
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
| | - Frédéric H-T Allain
- Department of Biology, Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland
| | | | - Douglas L Black
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
| | - Angela N Brooks
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Chonghui Cheng
- Department of Molecular and Human Genetics, Lester & Sue Breast Center, Baylor College of Medicine, Houston, TX, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Spain
| | - Edie I Crosse
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Eduardo Eyras
- Shine-Dalgarno Centre for RNA Innovation, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Ernesto Guccione
- Department of Oncological Sciences, Mount Sinai School of Medicine, New York, NY, USA
| | - Sydney X Lu
- Department of Medicine, Stanford Medical School, Palo Alto, CA, USA
| | - Karla M Neugebauer
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
| | - Priyanka Sehgal
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Xiao Song
- Department of Neurology, Northwestern University, Chicago, IL, USA
| | - Zuzana Tothova
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Juan Valcárcel
- Centre for Genomic Regulation, Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
| | - Kevin M Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA
| | - Gene W Yeo
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
| | - Andrei Thomas-Tikhonenko
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
| |
Collapse
|
4
|
Karlebach G, Hansen P, Köhler K, Robinson P. IsopretGO-analysing and visualizing the functional consequences of differential splicing. NAR Genom Bioinform 2024; 6:lqae165. [PMID: 39660256 PMCID: PMC11630322 DOI: 10.1093/nargab/lqae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2024] [Revised: 09/27/2024] [Accepted: 11/11/2024] [Indexed: 12/12/2024] Open
Abstract
Gene Ontology overrepresentation analysis (GO-ORA) is a standard approach towards characterizing salient functional characteristics of sets of differentially expressed genes (DGE) in RNA sequencing (RNA-seq) experiments. GO-ORA compares the distribution of GO annotations of the DGE to that of all genes or all expressed genes. This approach has not been available to characterize differential alternative splicing (DAS). Here, we introduce a desktop application called isopretGO for visualizing the functional implications of DGE and DAS that leverages our previously published machine-learning predictions of GO annotations for individual isoforms. We show based on an analysis of 100 RNA-seq datasets that DAS and DGE frequently have starkly different functional profiles. We present an example that shows how isopretGO can be used to identify functional shifts in RNA-seq data that can be attributed to differential splicing.
Collapse
Affiliation(s)
- Guy Karlebach
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Peter Hansen
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Kristin Köhler
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
| |
Collapse
|
5
|
Tress ML. The rapid degradation of translated upstream regions points to an inefficient translation initiation process. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.25.625198. [PMID: 39651291 PMCID: PMC11623489 DOI: 10.1101/2024.11.25.625198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2024]
Abstract
Large-scale experimental analyses find ever more abundant evidence of translation from start codons upstream of the canonical start site. This translation either generates entirely new proteins (from novel upstream open reading frames) or produces isoforms with extended N-terminals when the novel start codon is in frame Most extended N-terminals are likely to just add a disordered region to the canonical protein isoform, but some may also block the recognition of the signal peptide causing the isoform to accumulate in the incorrect cellular compartment. This analysis finds evidence that upstream translations that would interfere with signal peptides are detected in expected quantities in ribosome profiling experiments, but that the equivalent N-terminally extended protein isoforms are significantly reduced in multiple proteomics experiments. This suggests that these isoforms are likely to be degraded shortly after translation by the ubiquitination pathway, thus preventing the build up of potentially harmful proteins with hydrophobic regions in the cytoplasm. In addition, this is further evidence that most of the transcripts translated from upstream start sites are the result of an inefficient translation initiation process. This has implications for the annotation of proteins given the huge numbers of upstream translations that are being detected in large-scale experiments.
Collapse
|
6
|
Mudge JM, Carbonell-Sala S, Diekhans M, Martinez JG, Hunt T, Jungreis I, Loveland JE, Arnan C, Barnes I, Bennett R, Berry A, Bignell A, Cerdán-Vélez D, Cochran K, Cortés LT, Davidson C, Donaldson S, Dursun C, Fatima R, Hardy M, Hebbar P, Hollis Z, James BT, Jiang Y, Johnson R, Kaur G, Kay M, Mangan RJ, Maquedano M, Gómez LM, Mathlouthi N, Merritt R, Ni P, Palumbo E, Perteghella T, Pozo F, Raj S, Sisu C, Steed E, Sumathipala D, Suner MM, Uszczynska-Ratajczak B, Wass E, Yang YT, Zhang D, Finn RD, Gerstein M, Guigó R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress ML, Birney E, Martin FJ, Frankish A. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res 2024:gkae1078. [PMID: 39565199 DOI: 10.1093/nar/gkae1078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2024] [Revised: 10/12/2024] [Accepted: 10/23/2024] [Indexed: 11/21/2024] Open
Abstract
GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result. Meanwhile, we are incorporating data from state-of-the-art proteomics and Ribo-seq experiments to fine-tune our annotation of translated sequences, while further insights into function can be gained from multi-genome alignments that grow richer as more species' genomes are sequenced. Such methodologies are combined into a fully integrated annotation workflow. However, the increasing complexity of our resources can present usability challenges, and we are resolving these with the creation of filtered genesets such as MANE Select and GENCODE Primary. The next challenge is to propagate annotations throughout multiple human and mouse genomes, as we enter the pangenome era. Our resources are freely available at our web portal www.gencodegenes.org, and via the Ensembl and UCSC genome browsers.
Collapse
Affiliation(s)
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irwin Jungreis
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carme Arnan
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Kelly Cochran
- Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA, USA
| | - Lucas T Cortés
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cagatay Dursun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Prajna Hebbar
- UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin T James
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
- School of Biology and Environmental Science, University College Dublin,, Belfield, Dublin 4 D04 V1W8, Ireland
| | - Gazaldeep Kaur
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Riley J Mangan
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Genetics Training Program, Harvard Medical School, Boston, MA 02115, USA
| | - Miguel Maquedano
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Nourhen Mathlouthi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ryan Merritt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Emilio Palumbo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain
| | - Tamara Perteghella
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Carrer de la Mercè, 12, Ciutat Vella 08002 Barcelona, Spain
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Life Sciences, Brunel University London, Kingston Lane, Uxbridge, London UB8 3PH, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Barbara Uszczynska-Ratajczak
- Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego12/14, 61-704 Poznan, Poland
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, 220 Handan Road, Shanghai 200433, China
| | - Dingyao Zhang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003 Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Carrer de la Mercè, 12, Ciutat Vella 08002 Barcelona, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
7
|
Rodriguez JM, Maquedano M, Cerdan-Velez D, Calvo E, Vazquez J, Tress ML. A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.14.623419. [PMID: 39605392 PMCID: PMC11601488 DOI: 10.1101/2024.11.14.623419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/29/2024]
Abstract
The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNASeq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.
Collapse
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
| | - Miguel Maquedano
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| | - Daniel Cerdan-Velez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| | - Enrique Calvo
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
| | - Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| |
Collapse
|
8
|
Ibeh N, Kusuma P, Crenna Darusallam C, Malik SG, Sudoyo H, McCarthy DJ, Gallego Romero I. Profiling genetically driven alternative splicing across the Indonesian archipelago. Am J Hum Genet 2024; 111:2458-2477. [PMID: 39383868 PMCID: PMC11568790 DOI: 10.1016/j.ajhg.2024.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2024] [Revised: 09/11/2024] [Accepted: 09/12/2024] [Indexed: 10/11/2024] Open
Abstract
One of the regulatory mechanisms influencing the functional capacity of genes is alternative splicing (AS). Previous studies exploring the splicing landscape of human tissues have shown that AS has contributed to human biology, especially in disease progression and the immune response. Nonetheless, this phenomenon remains poorly characterized across human populations, and it is unclear how genetic and environmental variation contribute to AS. Here, we examine a set of 115 Indonesian samples from three traditional island populations spanning the genetic ancestry cline that characterizes Island Southeast Asia. We conduct a global AS analysis between islands to ascertain the degree of functionally significant AS events and their consequences. Using an event-based statistical model, we detected over 1,500 significant differential AS events across all comparisons. Additionally, we identify over 6,000 genetic variants associated with changes in splicing (splicing quantitative trait loci [sQTLs]), some of which are driven by Papuan-like genetic ancestry, and only show partial overlap with other publicly available sQTL datasets derived from other populations. Computational predictions of RNA binding activity reveal that a fraction of these sQTLs directly modulate the binding propensity of proteins involved in the splicing regulation of immune genes. Overall, these results contribute toward elucidating the role of genetic variation in shaping gene regulation in one of the most diverse regions in the world.
Collapse
Affiliation(s)
- Neke Ibeh
- School of BioSciences, University of Melbourne, Parkville, VIC 3010, Australia; Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC 3010, Australia; Bioinformatics and Cellular Genomics, St Vincents Institute of Medical Research, Fitzroy, VIC 3065, Australia; Human Genomics and Evolution, St Vincent's Institute of Medical Research, Fitzroy, VIC 3065, Australia
| | - Pradiptajati Kusuma
- Genome Diversity and Disease Laboratory, Mochtar Riady Institute of Nanotechnology, Tangerang 15811, Indonesia
| | - Chelzie Crenna Darusallam
- Genome Diversity and Disease Laboratory, Mochtar Riady Institute of Nanotechnology, Tangerang 15811, Indonesia
| | - Safarina G Malik
- Genome Diversity and Disease Laboratory, Mochtar Riady Institute of Nanotechnology, Tangerang 15811, Indonesia
| | - Herawati Sudoyo
- Genome Diversity and Disease Laboratory, Mochtar Riady Institute of Nanotechnology, Tangerang 15811, Indonesia
| | - Davis J McCarthy
- Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC 3010, Australia; Bioinformatics and Cellular Genomics, St Vincents Institute of Medical Research, Fitzroy, VIC 3065, Australia; School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Parkville, VIC 3010, Australia
| | - Irene Gallego Romero
- School of BioSciences, University of Melbourne, Parkville, VIC 3010, Australia; Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC 3010, Australia; Human Genomics and Evolution, St Vincent's Institute of Medical Research, Fitzroy, VIC 3065, Australia; Centre for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia.
| |
Collapse
|
9
|
Zhang Z, Wayment-Steele HK, Brixi G, Wang H, Kern D, Ovchinnikov S. Protein language models learn evolutionary statistics of interacting sequence motifs. Proc Natl Acad Sci U S A 2024; 121:e2406285121. [PMID: 39467119 PMCID: PMC11551344 DOI: 10.1073/pnas.2406285121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Accepted: 09/03/2024] [Indexed: 10/30/2024] Open
Abstract
Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, and the degree to which these models fundamentally understand the inherent biophysics of protein structure stands as an open question. Motivated by a finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a "categorical Jacobian" calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models. We further investigated how ESM-2 "stores" information needed to predict contacts by comparing sequence masking strategies, and found that providing local windows of sequence information allowed ESM-2 to best recover predicted contacts. This suggests that pLMs predict contacts by storing motifs of pairwise contacts. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models.
Collapse
Affiliation(s)
- Zhidian Zhang
- Harvard University, Cambridge, MA02138
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA02139
- Institute of Bioengineering, School of Life Sciences, Ecole polytechnique fédérale de Lausanne, LausanneVD 1015, Switzerland
| | - Hannah K. Wayment-Steele
- HHMI, Brandeis University, Waltham, MA02453
- Department of Biochemistry, Brandeis University, Waltham, MA02453
| | - Garyk Brixi
- Harvard College, Harvard University, Cambridge, MA02138
| | | | - Dorothee Kern
- HHMI, Brandeis University, Waltham, MA02453
- Department of Biochemistry, Brandeis University, Waltham, MA02453
| | - Sergey Ovchinnikov
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA02139
- John Harvard Distinguished Science Fellowship, Harvard University, Cambridge, MA02138
| |
Collapse
|
10
|
Abdelmohsen K, Mazan‐Mamczarz K, Munk R, Tsitsipatis D, Meng Q, Rossi M, Pal A, Shin CH, Martindale JL, Piao Y, Fan J, Yanai H, De S, Beerman I, Gorospe M. Identification of senescent cell subpopulations by CITE-seq analysis. Aging Cell 2024; 23:e14297. [PMID: 39143693 PMCID: PMC11561699 DOI: 10.1111/acel.14297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 06/30/2024] [Accepted: 07/23/2024] [Indexed: 08/16/2024] Open
Abstract
Cellular senescence, a state of persistent growth arrest, is closely associated with aging and age-related diseases. Deciphering the heterogeneity within senescent cell populations and identifying therapeutic targets are paramount for mitigating senescence-associated pathologies. In this study, proteins on the surface of cells rendered senescent by replicative exhaustion and by exposure to ionizing radiation (IR) were identified using mass spectrometry analysis, and a subset of them was further studied using single-cell CITE-seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing) analysis. Based on the presence of proteins on the cell surface, we identified two distinct IR-induced senescent cell populations: one characterized by high levels of CD109 and CD112 (cluster 3), the other characterized by high levels of CD112, CD26, CD73, HLA-ABC, CD54, CD49A, and CD44 (cluster 0). We further found that cluster 0 represented proliferating and senescent cells in the G1 phase of the division cycle, and CITE-seq detection of cell surface proteins selectively discerned those in the senescence group. Our study highlights the heterogeneity of senescent cells and underscores the value of cell surface proteins as tools for distinguishing senescent cell programs and subclasses, paving the way for targeted therapeutic strategies in disorders exacerbated by senescence.
Collapse
Affiliation(s)
- Kotb Abdelmohsen
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | | | - Rachel Munk
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Dimitrios Tsitsipatis
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Qiong Meng
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Martina Rossi
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Apala Pal
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Chang Hoon Shin
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Jennifer L. Martindale
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Yulan Piao
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Jinshui Fan
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Hagai Yanai
- Translational Gerontology BranchNational Institute on Aging (NIA) Intramural Research Program (IRP), National Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Supriyo De
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Isabel Beerman
- Translational Gerontology BranchNational Institute on Aging (NIA) Intramural Research Program (IRP), National Institutes of Health (NIH)BaltimoreMarylandUSA
| | - Myriam Gorospe
- Laboratory of Genetics and GenomicsNational Institutes of Health (NIH)BaltimoreMarylandUSA
| |
Collapse
|
11
|
Kiseleva OI, Arzumanian VA, Kurbatov IY, Poverennaya EV. In silico and in cellulo approaches for functional annotation of human protein splice variants. BIOMEDITSINSKAIA KHIMIIA 2024; 70:315-328. [PMID: 39324196 DOI: 10.18097/pbmc20247005315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/27/2024]
Abstract
The elegance of pre-mRNA splicing mechanisms continues to interest scientists even after over a half century, since the discovery of the fact that coding regions in genes are interrupted by non-coding sequences. The vast majority of human genes have several mRNA variants, coding structurally and functionally different protein isoforms in a tissue-specific manner and with a linkage to specific developmental stages of the organism. Alteration of splicing patterns shifts the balance of functionally distinct proteins in living systems, distorts normal molecular pathways, and may trigger the onset and progression of various pathologies. Over the past two decades, numerous studies have been conducted in various life sciences disciplines to deepen our understanding of splicing mechanisms and the extent of their impact on the functioning of living systems. This review aims to summarize experimental and computational approaches used to elucidate the functions of splice variants of a single gene based on our experience accumulated in the laboratory of interactomics of proteoforms at the Institute of Biomedical Chemistry (IBMC) and best global practices.
Collapse
Affiliation(s)
- O I Kiseleva
- Institute of Biomedical Chemistry, Moscow, Russia
| | | | | | | |
Collapse
|
12
|
Pyatnitskiy MA, Poverennaya EV. Transcript-Level Biomarkers of Early Lung Carcinogenesis in Bronchial Lesions. Cancers (Basel) 2024; 16:2260. [PMID: 38927965 PMCID: PMC11202239 DOI: 10.3390/cancers16122260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2024] [Revised: 06/09/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
Premalignant lesions within the bronchial epithelium signify the initial phases of squamous cell lung carcinoma, posing challenges for detection via conventional methods. Instead of focusing solely on gene expression, in this study, we explore transcriptomic alterations linked to lesion progression, with an emphasis on protein-coding transcripts. We reanalyzed a publicly available RNA-Seq dataset on airway epithelial cells from 82 smokers with and without premalignant lesions. Transcript and gene abundance were quantified using kallisto, while differential expression and transcript usage analysis was performed utilizing sleuth and RATs packages. Functional characterization involved overrepresentation analysis via clusterProfiler, weighted coexpression network analysis (WGCNA), and network analysis via Enrichr-KG. We detected 5906 differentially expressed transcripts and 4626 genes, exhibiting significant enrichment within pathways associated with oxidative phosphorylation and mitochondrial function. Remarkably, transcript-level WGCNA revealed a single module correlated with dysplasia status, notably enriched in cilium-related biological processes. Notable hub transcripts included RABL2B (ENST00000395590), DNAH1 (ENST00000420323), EFHC1 (ENST00000635996), and VWA3A (ENST00000563389) along with transcription factors such as FOXJ1 and ZNF474 as potential regulators. Our findings underscore the value of transcript-level analysis in uncovering novel insights into premalignant bronchial lesion biology, including identification of potential biomarkers associated with early lung carcinogenesis.
Collapse
Affiliation(s)
- Mikhail A. Pyatnitskiy
- Institute of Biomedical Chemistry, Moscow 119121, Russia;
- National Research University Higher School of Economics, Moscow 101000, Russia
| | | |
Collapse
|
13
|
Hazell Pickering S, Abdelhalim M, Collas P, Briand N. Alternative isoform expression of key thermogenic genes in human beige adipocytes. Front Endocrinol (Lausanne) 2024; 15:1395750. [PMID: 38859907 PMCID: PMC11163967 DOI: 10.3389/fendo.2024.1395750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Accepted: 05/07/2024] [Indexed: 06/12/2024] Open
Abstract
Background The beneficial effect of thermogenic adipocytes in maintaining body weight and protecting against metabolic disorders has raised interest in understanding the regulatory mechanisms defining white and beige adipocyte identity. Although alternative splicing has been shown to propagate adipose browning signals in mice, this has yet to be thoroughly investigated in human adipocytes. Methods We performed parallel white and beige adipogenic differentiation using primary adipose stem cells from 6 unrelated healthy subjects and assessed differential gene and isoform expression in mature adipocytes by RNA sequencing. Results We find 777 exon junctions with robust differential usage between white and beige adipocytes in all 6 subjects, mapping to 562 genes. Importantly, only 10% of these differentially spliced genes are also differentially expressed, indicating that alternative splicing constitutes an additional layer of gene expression regulation during beige adipocyte differentiation. Functional classification of alternative isoforms points to a gain of function for key thermogenic transcription factors such as PPARG and CITED1, and enzymes such as PEMT, or LPIN1. We find that a large majority of the splice variants arise from differential TSS usage, with beige-specific TSSs being enriched for PPARγ and MED1 binding compared to white-specific TSSs. Finally, we validate beige specific isoform expression at the protein level for two thermogenic regulators, PPARγ and PEMT. Discussion These results suggest that differential isoform expression through alternative TSS usage is an important regulatory mechanism for human adipocyte thermogenic specification.
Collapse
Affiliation(s)
- Sarah Hazell Pickering
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Mohamed Abdelhalim
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Philippe Collas
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
- Department of Immunology and Transfusion Medicine, Oslo University Hospital, Oslo, Norway
| | - Nolwenn Briand
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
| |
Collapse
|
14
|
Qin Z, Yang J, Zhang K, Gao X, Ran Q, Xu Y, Wang Z, Lou D, Huang C, Zellmer L, Meng G, Chen N, Ma H, Wang Z, Liao DJ. Updating mRNA variants of the human RSK4 gene and their expression in different stressed situations. Heliyon 2024; 10:e27475. [PMID: 38560189 PMCID: PMC10980951 DOI: 10.1016/j.heliyon.2024.e27475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2024] [Revised: 02/11/2024] [Accepted: 02/29/2024] [Indexed: 04/04/2024] Open
Abstract
We determined RNA spectrum of the human RSK4 (hRSK4) gene (also called RPS6KA6) and identified 29 novel mRNA variants derived from alternative splicing, which, plus the NCBI-documented ones and the five we reported previously, totaled 50 hRSK4 RNAs that, by our bioinformatics analyses, encode 35 hRSK4 protein isoforms of 35-762 amino acids. Many of the mRNAs are bicistronic or tricistronic for hRSK4. The NCBI-normalized NM_014496.5 and the protein it encodes are designated herein as the Wt-1 mRNA and protein, respectively, whereas the NM_001330512.1 and the long protein it encodes are designated as the Wt-2 mRNA and protein, respectively. Many of the mRNA variants responded differently to different situations of stress, including serum starvation, a febrile temperature, treatment with ethanol or ethanol-extracted clove buds (an herbal medicine), whereas the same stressed situation often caused quite different alterations among different mRNA variants in different cell lines. Mosifloxacin, an antibiotics and also a functional inhibitor of hRSK4, could inhibit the expression of certain hRSK4 mRNA variants. The hRSK4 gene likely uses alternative splicing as a handy tool to adapt to different stressed situations, and the mRNA and protein multiplicities may partly explain the incongruous literature on its expression and comports.
Collapse
Affiliation(s)
- Zhenwei Qin
- Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
| | - Jianglin Yang
- Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Rd, Guiyang, 550004, Guizhou Province, China
- Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang, 550004, Guizhou Province, China
| | - Keyin Zhang
- Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
| | - Xia Gao
- Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
| | - Qianchuan Ran
- Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
| | - Yuanhong Xu
- Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
| | - Zhi Wang
- Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
| | - Didong Lou
- Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
| | - Chunhua Huang
- Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
| | - Lucas Zellmer
- Department of Medicine, Hennepin County Medical Center, 730 South 8th St., Minneapolis, MN, 55415, USA
| | - Guangxue Meng
- Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
| | - Na Chen
- Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
| | - Hong Ma
- Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
| | - Zhe Wang
- State Key Laboratory of Cancer Biology, Department of Pathology, Xijing Hospital, Air Force Medical University, 169 Changle West Road, Xi'an, 710032, China
| | - Dezhong Joshua Liao
- Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Rd, Guiyang, 550004, Guizhou Province, China
- Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang, 550004, Guizhou Province, China
| |
Collapse
|
15
|
Harrison PW, Amode MR, Austine-Orimoloye O, Azov A, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju S, Campbell L, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva N, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi T, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya G, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh D, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva J, Salam A, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Keatley J, Loveland J, Moore B, Mudge J, Naamati G, Tate J, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn R, Martin F, Yates A. Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023] Open
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Collapse
Affiliation(s)
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lahcen I Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manuel Carbajo Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
| | - Alexander Cockburn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nishadi H De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dionysios Grigoriadis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gurpreet S Ghattaoraya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tatiana A Gurbich
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diego Marques-Coelho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Daniel Poppleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Natalie L Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jon Keatley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| |
Collapse
|
16
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION SpliceAI and Pangolin show the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Collapse
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
| |
Collapse
|
17
|
Carrion SA, Michal JJ, Jiang Z. Alternative Transcripts Diversify Genome Function for Phenome Relevance to Health and Diseases. Genes (Basel) 2023; 14:2051. [PMID: 38002994 PMCID: PMC10671453 DOI: 10.3390/genes14112051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open
Abstract
Manipulation using alternative exon splicing (AES), alternative transcription start (ATS), and alternative polyadenylation (APA) sites are key to transcript diversity underlying health and disease. All three are pervasive in organisms, present in at least 50% of human protein-coding genes. In fact, ATS and APA site use has the highest impact on protein identity, with their ability to alter which first and last exons are utilized as well as impacting stability and translation efficiency. These RNA variants have been shown to be highly specific, both in tissue type and stage, with demonstrated importance to cell proliferation, differentiation and the transition from fetal to adult cells. While alternative exon splicing has a limited effect on protein identity, its ubiquity highlights the importance of these minor alterations, which can alter other features such as localization. The three processes are also highly interwoven, with overlapping, complementary, and competing factors, RNA polymerase II and its CTD (C-terminal domain) chief among them. Their role in development means dysregulation leads to a wide variety of disorders and cancers, with some forms of disease disproportionately affected by specific mechanisms (AES, ATS, or APA). Challenges associated with the genome-wide profiling of RNA variants and their potential solutions are also discussed in this review.
Collapse
Affiliation(s)
| | | | - Zhihua Jiang
- Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA 99164-7620, USA; (S.A.C.); (J.J.M.)
| |
Collapse
|
18
|
Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland J, Mudge J, Sisu C, Wright J, Arnan C, Barnes I, Banerjee A, Bennett R, Berry A, Bignell A, Boix C, Calvet F, Cerdán-Vélez D, Cunningham F, Davidson C, Donaldson S, Dursun C, Fatima R, Giorgetti S, Giron C, Gonzalez J, Hardy M, Harrison P, Hourlier T, Hollis Z, Hunt T, James B, Jiang Y, Johnson R, Kay M, Lagarde J, Martin F, Gómez L, Nair S, Ni P, Pozo F, Ramalingam V, Ruffier M, Schmitt B, Schreiber J, Steed E, Suner MM, Sumathipala D, Sycheva I, Uszczynska-Ratajczak B, Wass E, Yang Y, Yates A, Zafrulla Z, Choudhary J, Gerstein M, Guigo R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress M, Flicek P. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 2023; 51:D942-D949. [PMID: 36420896 PMCID: PMC9825462 DOI: 10.1093/nar/gkac1071] [Citation(s) in RCA: 125] [Impact Index Per Article: 125.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Revised: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022] Open
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Collapse
Affiliation(s)
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sílvia Carbonell-Sala
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science andTechnology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139,USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
| | - James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Carme Arnan
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science andTechnology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Abhimanyu Banerjee
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carles Boix
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139,USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Ferriol Calvet
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science andTechnology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cagatay Dursun
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos Garcıa Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin James
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139,USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julien Lagarde
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science andTechnology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Surag Nair
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Pengyu Ni
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Vivek Ramalingam
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jacob M Schreiber
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Barbara Uszczynska-Ratajczak
- Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zahoor Zafrulla
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Roderic Guigo
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science andTechnology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139,USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
19
|
Clinical variant interpretation and biologically relevant reference transcripts. NPJ Genom Med 2022; 7:59. [PMID: 36257961 PMCID: PMC9579139 DOI: 10.1038/s41525-022-00329-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2022] [Accepted: 09/29/2022] [Indexed: 12/03/2022] Open
Abstract
Clinical variant interpretation is highly dependent on the choice of reference transcript. Although the longest transcript has traditionally been chosen as the reference, APPRIS principal and MANE Select transcripts, biologically supported reference sequences, are now available. In this study, we show that MANE Select and APPRIS principal transcripts are the best reference transcripts for clinical variation. APPRIS principal and MANE Select transcripts capture almost all ClinVar pathogenic variants, and they are particularly powerful over the 94% of coding genes in which they agree. We find that a vanishingly small number of ClinVar pathogenic variants affect alternative protein products. Alternative isoforms that are likely to be clinically relevant can be predicted using TRIFID scores, the highest scoring alternative transcripts are almost 700 times more likely to house pathogenic variants. We believe that APPRIS, MANE and TRIFID are essential tools for clinical variant interpretation.
Collapse
|
20
|
Pozo F, Rodriguez JM, Martínez Gómez L, Vázquez J, Tress ML. APPRIS principal isoforms and MANE Select transcripts define reference splice variants. Bioinformatics 2022; 38:ii89-ii94. [PMID: 36124785 PMCID: PMC9486585 DOI: 10.1093/bioinformatics/btac473] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biological important reference splice variants for large-scale analyses. RESULTS Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes. AVAILABILITY AND IMPLEMENTATION APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https://appris.bioinfo.cnio.es), GENCODE genes (https://www.gencodegenes.org/) and the Ensembl website (https://www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https://www.ncbi.nlm.nih.gov/refseq/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| | - José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
| | - Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| | - Jesús Vázquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain,CIBER de Investigaciones Cardiovasculares (CIBERCV), 28029 Madrid, Spain
| | | |
Collapse
|
21
|
Wright CJ, Smith CWJ, Jiggins CD. Alternative splicing as a source of phenotypic diversity. Nat Rev Genet 2022; 23:697-710. [PMID: 35821097 DOI: 10.1038/s41576-022-00514-4] [Citation(s) in RCA: 148] [Impact Index Per Article: 74.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/13/2022] [Indexed: 12/27/2022]
Abstract
A major goal of evolutionary genetics is to understand the genetic processes that give rise to phenotypic diversity in multicellular organisms. Alternative splicing generates multiple transcripts from a single gene, enriching the diversity of proteins and phenotypic traits. It is well established that alternative splicing contributes to key innovations over long evolutionary timescales, such as brain development in bilaterians. However, recent developments in long-read sequencing and the generation of high-quality genome assemblies for diverse organisms has facilitated comparisons of splicing profiles between closely related species, providing insights into how alternative splicing evolves over shorter timescales. Although most splicing variants are probably non-functional, alternative splicing is nonetheless emerging as a dynamic, evolutionarily labile process that can facilitate adaptation and contribute to species divergence.
Collapse
Affiliation(s)
- Charlotte J Wright
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK. .,Department of Zoology, University of Cambridge, Cambridge, UK.
| | | | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK.
| |
Collapse
|
22
|
Rodriguez JM, Pozo F, Cerdán-Vélez D, Di Domenico T, Vázquez J, Tress M. APPRIS: selecting functionally important isoforms. Nucleic Acids Res 2022; 50:D54-D59. [PMID: 34755885 PMCID: PMC8728124 DOI: 10.1093/nar/gkab1058] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Revised: 10/14/2021] [Accepted: 10/20/2021] [Indexed: 12/20/2022] Open
Abstract
APPRIS (https://appris.bioinfo.cnio.es) is a well-established database housing annotations for protein isoforms for a range of species. APPRIS selects principal isoforms based on protein structure and function features and on cross-species conservation. Most coding genes produce a single main protein isoform and the principal isoforms chosen by the APPRIS database best represent this main cellular isoform. Human genetic data, experimental protein evidence and the distribution of clinical variants all support the relevance of APPRIS principal isoforms. APPRIS annotations and principal isoforms have now been expanded to 10 model organisms. In this paper we highlight the most recent updates to the database. APPRIS annotations have been generated for two new species, cow and chicken, the protein structural information has been augmented with reliable models from the EMBL-EBI AlphaFold database, and we have substantially expanded the confirmatory proteomics evidence available for the human genome. The most significant change in APPRIS has been the implementation of TRIFID functional isoform scores. TRIFID functional scores are assigned to all splice isoforms, and APPRIS uses the TRIFID functional scores and proteomics evidence to determine principal isoforms when core methods cannot.
Collapse
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
| | - Fernando Pozo
- Bioinformatics Institute, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Daniel Cerdán-Vélez
- Bioinformatics Institute, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Tomás Di Domenico
- Bioinformatics Institute, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Jesús Vázquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
| | - Michael L Tress
- Bioinformatics Institute, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| |
Collapse
|
23
|
Martinez Gomez L, Pozo F, Walsh TA, Abascal F, Tress ML. The clinical importance of tandem exon duplication-derived substitutions. Nucleic Acids Res 2021; 49:8232-8246. [PMID: 34302486 PMCID: PMC8373072 DOI: 10.1093/nar/gkab623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2021] [Accepted: 07/21/2021] [Indexed: 01/04/2023] Open
Abstract
Most coding genes in the human genome are annotated with multiple alternative transcripts. However, clear evidence for the functional relevance of the protein isoforms produced by these alternative transcripts is often hard to find. Alternative isoforms generated from tandem exon duplication-derived substitutions are an exception. These splice events are rare, but have important functional consequences. Here, we have catalogued the 236 tandem exon duplication-derived substitutions annotated in the GENCODE human reference set. We find that more than 90% of the events have a last common ancestor in teleost fish, so are at least 425 million years old, and twenty-one can be traced back to the Bilateria clade. Alternative isoforms generated from tandem exon duplication-derived substitutions also have significantly more clinical impact than other alternative isoforms. Tandem exon duplication-derived substitutions have >25 times as many pathogenic and likely pathogenic mutations as other alternative events. Tandem exon duplication-derived substitutions appear to have vital functional roles in the cell and may have played a prominent part in metazoan evolution.
Collapse
Affiliation(s)
- Laura Martinez Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain.,Eukaryotic Annotation Team, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA. UK
| | - Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| |
Collapse
|