1
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. Am J Hum Genet 2024:S0002-9297(24)00168-X. [PMID: 38834072 DOI: 10.1016/j.ajhg.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 05/04/2024] [Accepted: 05/06/2024] [Indexed: 06/06/2024]
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and for disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomic analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network and Genomics Research to Elucidate the Genetics of Rare Disease Consortium. Across six routinely collected biospecimens, 61% of quantified genes were not influenced by genome build. However, we identified 1,492 genes with build-dependent quantification, 3,377 genes with build-exclusive expression, and 9,077 genes with annotation-specific expression, including 566 clinically relevant and 512 known OMIM genes. Further, we demonstrate that between builds for a given gene, a larger difference in quantification is well correlated with a larger change in expression outlier calling. Combined, we provide a database of genes impacted by build choice and recommend that transcriptomics-guided analyses and diagnoses be cross-referenced with these data for robustness.
Affiliation(s)
- Rachel A Ungar
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Pagé C Goddard
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Tanner D Jensen
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Kevin S Smith
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Christopher A Jin
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Devon E Bonner
  - Department of Pediatrics, School of Medicine, Stanford University, Stanford, CA, USA; Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
- Jonathan A Bernstein
  - Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
- Matthew T Wheeler
  - Department of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Stephen B Montgomery
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
2
Sunila BG, Dhanushkumar T, Dasegowda KR, Vasudevan K, Rambabu M. Unraveling the molecular landscape of Ataxia Telangiectasia: Insights into Neuroinflammation, immune dysfunction, and potential therapeutic target. Neurosci Lett 2024; 828:137764. [PMID: 38582325 DOI: 10.1016/j.neulet.2024.137764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Revised: 03/23/2024] [Accepted: 04/03/2024] [Indexed: 04/08/2024]
Abstract
BACKGROUND: Ataxia Telangiectasia (AT) is a genetic disorder characterized by compromised DNA repair, cerebellar degeneration, and immune dysfunction. Understanding the molecular mechanisms driving AT pathology is crucial for developing targeted therapies.
METHODS: In this study, we conducted a comprehensive analysis to elucidate the molecular mechanisms underlying AT pathology. Using publicly available RNA-seq datasets comparing control and AT samples, we employed in silico transcriptomics to identify potential genes and pathways. We performed differential gene expression analysis with DESeq2 to reveal dysregulated genes associated with AT. Additionally, we constructed a Protein-Protein Interaction (PPI) network to explore the interactions between proteins implicated in AT.
RESULTS: The network analysis identified hub genes, including TYROBP and PCP2, crucial in immune regulation and cerebellar function, respectively. Furthermore, pathway enrichment analysis unveiled dysregulated pathways linked to AT pathology, providing insights into disease progression.
CONCLUSION: Our integrated approach offers a holistic understanding of the complex molecular landscape of AT and identifies potential targets for therapeutic intervention. By combining transcriptomic analysis with network-based methods, we provide valuable insights into the underlying mechanisms of AT pathogenesis.
Affiliation(s)
- B G Sunila
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- T Dhanushkumar
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- K R Dasegowda
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- Karthick Vasudevan
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- Majji Rambabu
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
3
Nanni A, Titus-McQuillan J, Bankole KS, Pardo-Palacios F, Signor S, Vlaho S, Moskalenko O, Morse A, Rogers RL, Conesa A, McIntyre LM. Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD. Nucleic Acids Res 2024; 52:e28. [PMID: 38340337 PMCID: PMC10954468 DOI: 10.1093/nar/gkae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/29/2023] [Accepted: 01/18/2024] [Indexed: 02/12/2024]
Abstract
Advances in affordable transcriptome sequencing combined with better exon and gene prediction have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes, and alternative 5'/3' UTRs, are compared and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq vs Ensembl); and between closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
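As a rough illustration of what a nucleotide-level distance between two transcript models can look like, the sketch below counts the exonic nucleotides that are shared by, or unique to, two models on the same chromosome and strand. The function names, the interval representation and the particular ratio are assumptions made for illustration only; this is not TranD's API and not its published metric definitions.

```python
def exonic_positions(exons):
    """Return the set of genomic positions covered by a transcript's exons.

    `exons` is a list of (start, end) pairs, 1-based and inclusive (GTF-style).
    """
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_distance(exons_a, exons_b):
    """Proportion of exonic nucleotides not shared by the two transcript models.

    0.0 means identical exonic footprints; 1.0 means no overlap at all.
    """
    a = exonic_positions(exons_a)
    b = exonic_positions(exons_b)
    union = a | b
    if not union:
        return 0.0
    # Symmetric difference = nucleotides present in exactly one model.
    return len(a ^ b) / len(union)


# Hypothetical transcript models: the second retains the intron 201-299.
spliced = [(100, 200), (300, 400)]
retained = [(100, 400)]
print(nucleotide_distance(spliced, spliced))
print(round(nucleotide_distance(spliced, retained), 3))
```

Storing positions in sets keeps the sketch short; for real annotations an interval-based computation would be far more memory-efficient.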
Affiliation(s)
- Adalena Nanni
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- James Titus-McQuillan
  - University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC, USA
- Kinfeosioluwa S Bankole
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Sarah Signor
  - Department of Biological Sciences, North Dakota State University, Fargo, ND, USA
- Srna Vlaho
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Oleksandr Moskalenko
  - University of Florida Research Computing, University of Florida, Gainesville, FL 32611, USA
- Alison M Morse
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Rebekah L Rogers
  - University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC, USA
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Spain
- Lauren M McIntyre
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
4
Napoli M, Deshpande AA, Chakravarti D, Rajapakshe K, Gunaratne PH, Coarfa C, Flores ER. Genome-wide p63-Target Gene Analyses Reveal TAp63/NRF2-Dependent Oxidative Stress Responses. Cancer Research Communications 2024; 4:264-278. [PMID: 38165157 PMCID: PMC10832605 DOI: 10.1158/2767-9764.crc-23-0358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 11/14/2023] [Accepted: 12/27/2023] [Indexed: 01/03/2024]
Abstract
The p53 family member TP63 encodes two sets of N-terminal isoforms, TAp63 and ΔNp63 isoforms. They each regulate diverse biological functions in epidermal morphogenesis and in cancer. In the skin, where their activities have been extensively characterized, TAp63 prevents premature aging by regulating the quiescence and genomic stability of stem cells required for wound healing and hair regeneration, while ΔNp63 controls maintenance and terminal differentiation of epidermal basal cells. This functional diversity is surprising given that these isoforms share a high degree of similarity, including an identical sequence for a DNA-binding domain. To understand the mechanisms of the transcriptional programs regulated by each p63 isoform and leading to diverse biological functions, we performed genome-wide analyses using p63 isoform-specific chromatin immunoprecipitation, RNA sequencing, and metabolomics of TAp63-/- and ΔNp63-/- mouse epidermal cells. Our data indicate that TAp63 and ΔNp63 physically and functionally interact with distinct transcription factors for the downstream regulation of their target genes, thus ultimately leading to the regulation of unique transcriptional programs and biological processes. Our findings unveil novel transcriptomes regulated by the p63 isoforms to control diverse biological functions, including the cooperation between TAp63 and NRF2 in the modulation of metabolic pathways and response to oxidative stress, providing a mechanistic explanation for the TAp63 knockout phenotypes.
SIGNIFICANCE: The p63 isoforms, TAp63 and ΔNp63, control epithelial morphogenesis and tumorigenesis through the interaction with distinct transcription factors and the subsequent regulation of unique transcriptional programs.
Affiliation(s)
- Marco Napoli
  - Department of Molecular Oncology, Division of Basic Science, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
  - Cancer Biology and Evolution Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
- Avani A. Deshpande
  - Department of Molecular Oncology, Division of Basic Science, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
  - Cancer Biology and Evolution Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
- Kimal Rajapakshe
  - Sheikh Ahmed Center for Pancreatic Cancer Research, The University of Texas M.D. Anderson Cancer Center, Houston, Texas
- Cristian Coarfa
  - Department of Molecular and Cellular Biology, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas
- Elsa R. Flores
  - Department of Molecular Oncology, Division of Basic Science, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
  - Cancer Biology and Evolution Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
5
Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA, Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Impact of genome build on RNA-seq interpretation and diagnostics. medRxiv: the preprint server for health sciences 2024:2024.01.11.24301165. [PMID: 38260490 PMCID: PMC10802764 DOI: 10.1101/2024.01.11.24301165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and for disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomic analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes that only have detectable expression in a subset of genome builds. Finally, we characterized how genome build impacts the detection of outlier transcriptomic events. Combined, we provide a database of genes impacted by build choice, and recommend that transcriptomics-guided analyses and diagnoses be cross-referenced with these data for robustness.
Affiliation(s)
- Rachel A. Ungar
  - Department of Genetics, School of Medicine, Stanford University
  - Department of Pathology, School of Medicine, Stanford University
- Pagé C. Goddard
  - Department of Genetics, School of Medicine, Stanford University
  - Department of Pathology, School of Medicine, Stanford University
- Tanner D. Jensen
  - Department of Genetics, School of Medicine, Stanford University
  - Department of Pathology, School of Medicine, Stanford University
- Kevin S. Smith
  - Department of Pathology, School of Medicine, Stanford University
- Devon E. Bonner
  - Department of Pediatrics, School of Medicine, Stanford University
  - Stanford Center for Undiagnosed Diseases, Stanford University
- Matthew T. Wheeler
  - Department of Cardiovascular Medicine, School of Medicine, Stanford University
- Stephen B. Montgomery
  - Department of Genetics, School of Medicine, Stanford University
  - Department of Pathology, School of Medicine, Stanford University
  - Department of Biomedical Data Science, Stanford University
6
Singleton M, Eisen M. Leveraging genomic redundancy to improve inference and alignment of orthologous proteins. G3 (Bethesda, Md.) 2023; 13:jkad222. [PMID: 37770067 PMCID: PMC10700111 DOI: 10.1093/g3journal/jkad222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/11/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023]
Abstract
Identifying protein sequences with common ancestry is a core task in bioinformatics and evolutionary biology. However, methods for inferring and aligning such sequences in annotated genomes have not kept pace with the increasing scale and complexity of the available data. Thus, in this work, we implemented several improvements to the traditional methodology that more fully leverage the redundancy of closely related genomes and the organization of their annotations. Two highlights include the application of the more flexible k-clique percolation algorithm for identifying clusters of orthologous proteins and the development of a novel technique for removing poorly supported regions of alignments with a phylogenetic hidden Markov model (phylo-HMM). In making the latter, we wrote a fully documented Python package Homomorph that implements standard HMM algorithms and created a set of tutorials to promote its use by a wide audience. We applied the resulting pipeline to a set of 33 annotated Drosophila genomes, generating 22,813 orthologous groups and 8,566 high-quality alignments.
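The k-clique percolation idea mentioned in this abstract can be sketched briefly: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of all cliques connected through a chain of such adjacencies. The brute-force version below is only an illustration on a toy graph; the graph, the node labels and the function name are hypothetical, and this is not the authors' pipeline code.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return k-clique percolation communities as a list of frozensets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by brute force (fine for toy graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one group.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


# Hypothetical similarity graph between proteins p1..p6.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p2", "p4"), ("p3", "p4"),
         ("p4", "p5"), ("p5", "p6"), ("p4", "p6")]
communities = k_clique_communities(edges, 3)
print(sorted(sorted(c) for c in communities))
```

In this toy graph p4 belongs to two communities, which illustrates one sense in which the method is more flexible than clustering that forces each protein into a single group.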
Affiliation(s)
- Marc Singleton
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Michael Eisen
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA 94720, USA
7
Lin MS, Varunjikar MS, Lie KK, Søfteland L, Dellafiora L, Ørnsrud R, Sanden M, Berntssen MHG, Dorne JLCM, Bafna V, Rasinger JD. Multi-tissue proteogenomic analysis for mechanistic toxicology studies in non-model species. Environment International 2023; 182:108309. [PMID: 37980879 DOI: 10.1016/j.envint.2023.108309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 08/15/2023] [Accepted: 11/04/2023] [Indexed: 11/21/2023]
Abstract
New approach methodologies (NAM), including omics and in vitro approaches, are contributing to the implementation of 3R (reduction, refinement and replacement) strategies in regulatory science and risk assessment. In this study, we present an integrative transcriptomics and proteomics analysis workflow for the validation and revision of complex fish genomes and demonstrate how proteogenomics expression matrices can be used to support multi-level omics data integration in non-model species in vivo and in vitro. Using Atlantic salmon as an example, we constructed proteogenomic databases from publicly available transcriptomic data and in-house generated RNA-Seq and LC-MS/MS data. Our analysis identified ∼80,000 peptides, providing direct evidence of translation for over 40,000 RefSeq structures. The data also highlighted 183 co-located peptide groups that supported a single transcript each, and in each case, either corrected a previous annotation, supported Ensembl annotations not present in RefSeq, or identified novel previously unannotated genes. Proteogenomics data-derived expression matrices revealed distinct profiles for the different tissue types analyzed. Focusing on proteins involved in defense against xenobiotics, we detected distinct expression patterns across different salmon tissues and observed homology in the expression of chemical defense proteins between in vivo and in vitro liver systems. Our study demonstrates the potential of proteogenomic analyses in extending our understanding of complex fish genomes and provides an advanced bioinformatic toolkit to support the further development of NAMs and their application in regulatory science and (eco)toxicological studies of non-model species.
Affiliation(s)
- M S Lin
  - Bioinformatics and Systems Biology Program, UC San Diego, San Diego, CA, United States
- K K Lie
  - Institute of Marine Research, Bergen, Norway
- L Søfteland
  - Institute of Marine Research, Bergen, Norway
- L Dellafiora
  - Department of Food and Drug, University of Parma, Parco Area delle Scienze 27/A, 43124 Parma, Italy
- R Ørnsrud
  - Institute of Marine Research, Bergen, Norway
- M Sanden
  - Institute of Marine Research, Bergen, Norway
- J L C M Dorne
  - European Food Safety Authority, Methodological and Scientific Support Unit, Via Carlo Magno 1A, 43121 Parma, Italy
- V Bafna
  - Computer Science & Engineering and HDSI, UC San Diego, San Diego, CA, United States
8
Zhang Q, Shao M. Transcript assembly and annotations: Bias and adjustment. PLoS Comput Biol 2023; 19:e1011734. [PMID: 38127855 PMCID: PMC10769104 DOI: 10.1371/journal.pcbi.1011734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 01/05/2024] [Accepted: 12/04/2023] [Indexed: 12/23/2023]
Abstract
Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations, as assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. We investigate the impact of annotations on transcript assembly. Surprisingly, we observe that opposite conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance to select appropriate assembling tools for different application scenarios.
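To make the notion of filtering intron retentions from an assembly concrete, here is a minimal sketch under one simple assumption: a transcript is treated as carrying an intron retention when one of its exons fully spans an intron of another transcript of the same gene. That criterion, the function names and the toy coordinates are illustrative assumptions; this is not the algorithm or interface of the irtool linked above.

```python
def introns(exons):
    """Introns implied by consecutive exons, as (start, end) inclusive pairs."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def has_intron_retention(candidate, reference):
    """True if any exon of `candidate` fully covers an intron of `reference`."""
    return any(
        exon_start <= intron_start and intron_end <= exon_end
        for exon_start, exon_end in candidate
        for intron_start, intron_end in introns(reference)
    )


def drop_intron_retentions(transcripts):
    """Keep only transcripts that retain no intron of any other transcript."""
    return {
        name: exons
        for name, exons in transcripts.items()
        if not any(
            has_intron_retention(exons, other)
            for other_name, other in transcripts.items()
            if other_name != name
        )
    }


# Hypothetical gene with three assembled transcripts.
assembly = {
    "tx1": [(100, 200), (300, 400), (500, 600)],
    "tx2": [(100, 400), (500, 600)],  # retains intron 201-299 of tx1
    "tx3": [(100, 200), (500, 600)],  # exon skipping, not intron retention
}
filtered = drop_intron_retentions(assembly)
print(sorted(filtered))
```

The exon-skipping transcript is kept because none of its exons spans an intron of another transcript, which separates the two event types in this toy case.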
Affiliation(s)
- Qimin Zhang
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Mingfu Shao
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
9
Singer-Berk M, Gudmundsson S, Baxter S, Seaby EG, England E, Wood JC, Son RG, Watts NA, Karczewski KJ, Harrison SM, MacArthur DG, Rehm HL, O'Donnell-Luria A. Advanced variant classification framework reduces the false positive rate of predicted loss-of-function variants in population sequencing data. Am J Hum Genet 2023; 110:1496-1508. [PMID: 37633279 PMCID: PMC10502856 DOI: 10.1016/j.ajhg.2023.08.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/10/2023] [Revised: 08/09/2023] [Accepted: 08/09/2023] [Indexed: 08/28/2023]
Abstract
Predicted loss of function (pLoF) variants are often highly deleterious and play an important role in disease biology, but many pLoF variants may not result in loss of function (LoF). Here we present a framework that advances interpretation of pLoF variants in research and clinical settings by considering three categories of LoF evasion: (1) predicted rescue by secondary sequence properties, (2) uncertain biological relevance, and (3) potential technical artifacts. We also provide recommendations on adjustments to ACMG/AMP guidelines' PVS1 criterion. Applying this framework to all high-confidence pLoF variants in 22 genes associated with autosomal-recessive disease from the Genome Aggregation Database (gnomAD v.2.1.1) revealed predicted LoF evasion or potential artifacts in 27.3% (304/1,113) of variants. The major reasons were location in the last exon, in a homopolymer repeat, in a low proportion expressed across transcripts (pext) scored region, or the presence of cryptic in-frame splice rescues. Variants predicted to evade LoF or to be potential artifacts were enriched for ClinVar benign variants. PVS1 was downgraded in 99.4% (162/163) of pLoF variants predicted as likely not LoF/not LoF, with 17.2% (28/163) downgraded as a result of our framework, adding to previous guidelines. Variant pathogenicity was affected (mostly from likely pathogenic to VUS) in 20 (71.4%) of these 28 variants. This framework guides assessment of pLoF variants beyond standard annotation pipelines and substantially reduces false positive rates, which is key to ensuring accurate LoF variant prediction in both research and clinical settings.
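The proportions quoted in this abstract can be checked directly from the counts given alongside them. The short script below recomputes each percentage; the counts and stated percentages are copied from the abstract, while the dictionary labels are paraphrases added here for readability.

```python
# Counts as reported in the abstract: (numerator, denominator, stated percent).
reported = {
    "variants with predicted LoF evasion or potential artifacts": (304, 1113, 27.3),
    "PVS1 downgraded among likely not LoF/not LoF variants": (162, 163, 99.4),
    "downgraded as a result of the framework": (28, 163, 17.2),
    "pathogenicity classification affected": (20, 28, 71.4),
}


def percent(numerator, denominator):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


for label, (num, den, stated) in reported.items():
    print(f"{label}: {num}/{den} = {percent(num, den)}% (stated {stated}%)")
```

All four recomputed values agree with the stated percentages at one decimal place.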
Affiliation(s)
- Moriel Singer-Berk
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Sanna Gudmundsson
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA; Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Samantha Baxter
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Eleanor G Seaby
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA; Genomic Informatics Group, University Hospital Southampton, Southampton, UK
- Eleina England
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Jordan C Wood
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Rachel G Son
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas A Watts
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Konrad J Karczewski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Steven M Harrison
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Ambry Genetics, Aliso Viejo, CA, USA
- Daniel G MacArthur
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, NSW, Australia; Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine & Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
10
Zhang J, Lin X, Chen Y, Li T, Lee AC, Chow EY, Cho WC, Chan T. LAFITE Reveals the Complexity of Transcript Isoforms in Subcellular Fractions. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2203480. [PMID: 36461702 PMCID: PMC9875686 DOI: 10.1002/advs.202203480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 10/28/2022] [Indexed: 06/17/2023]
Abstract
Characterization of the subcellular distribution of RNA is essential for understanding the molecular basis of biological processes. Here, the subcellular nanopore direct RNA-sequencing (DRS) of four lung cancer cell lines (A549, H1975, H358, and HCC4006) is performed, coupled with a computational pipeline, Low-abundance Aware Full-length Isoform clusTEr (LAFITE), to comprehensively analyze the full-length cytoplasmic and nuclear transcriptome. Using additional DRS and orthogonal data sets, it is shown that LAFITE outperforms current methods for detecting full-length transcripts, particularly for low-abundance isoforms that are usually overlooked due to poor read coverage. Experimental validation of six novel isoforms exclusively identified by LAFITE further confirms the reliability of this pipeline. By applying LAFITE to subcellular DRS data, the complexity of the nuclear transcriptome is revealed in terms of isoform diversity, 3'-UTR usage, m6A modification patterns, and intron retention. Overall, LAFITE provides enhanced full-length isoform identification and enables a high-resolution view of the RNA landscape at the isoform level.
Affiliation(s)
- Jizhou Zhang
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Xiao Lin
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Yuelong Chen
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Tsz‐Ho Li
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Alan Chun‐Kit Lee
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Ting‐Fung Chan
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
11
|
Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland J, Mudge J, Sisu C, Wright J, Arnan C, Barnes I, Banerjee A, Bennett R, Berry A, Bignell A, Boix C, Calvet F, Cerdán-Vélez D, Cunningham F, Davidson C, Donaldson S, Dursun C, Fatima R, Giorgetti S, Giron C, Gonzalez J, Hardy M, Harrison P, Hourlier T, Hollis Z, Hunt T, James B, Jiang Y, Johnson R, Kay M, Lagarde J, Martin F, Gómez L, Nair S, Ni P, Pozo F, Ramalingam V, Ruffier M, Schmitt B, Schreiber J, Steed E, Suner MM, Sumathipala D, Sycheva I, Uszczynska-Ratajczak B, Wass E, Yang Y, Yates A, Zafrulla Z, Choudhary J, Gerstein M, Guigo R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress M, Flicek P. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 2022; 51:D942-D949. [PMID: 36420896 PMCID: PMC9825462 DOI: 10.1093/nar/gkac1071] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 09/22/2022] [Revised: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022]
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
- To whom correspondence should be addressed. Tel: +44 1223 494388; Fax: +44 1223 484696
- Sílvia Carbonell-Sala
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
- James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Carme Arnan
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Abhimanyu Banerjee
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carles Boix
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Ferriol Calvet
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cagatay Dursun
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carlos García Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Benjamin James
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland; School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
- Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julien Lagarde
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Surag Nair
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Pengyu Ni
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Vivek Ramalingam
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jacob M Schreiber
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Barbara Uszczynska-Ratajczak
- Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zahoor Zafrulla
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Roderic Guigo
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
- Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
- Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
12
Tung KF, Lin WC. TEx-MST: tissue expression profiles of MANE select transcripts. Database (Oxford) 2022; 2022:6726258. [PMID: 36170113 PMCID: PMC9518666 DOI: 10.1093/database/baac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 12/05/2022]
Abstract
Recently, a new reference transcript dataset [Matched Annotation from the NCBI and EMBL-EBI (MANE) select] was released by NCBI and EMBL-EBI to make available a new unified representative transcript for human protein-coding genes. While the main purpose of the MANE project is to provide a harmonized gene and transcript information standard, there is no explicit tissue expression information about these MANE select transcripts. In this report, we provide expression profiles of MANE select transcripts in various normal human tissues to allow further interrogation of their molecular modulations and functional significance. We obtained the new V9 transcript expression dataset from the Genotype-Tissue Expression (GTEx) web portal. This new GTEx dataset, based on a long-read sequencing platform, affords better assessment of the expression of alternatively spliced transcripts. The tissue expression profiles of MANE select transcripts (TEx-MST) database provides not only the basic information of MANE select transcripts but also tissue expression profiles of alternative transcripts in protein-coding genes. Users can initiate the interrogation by gene symbol searches or by browsing the MANE genes with various criteria (such as genome locations or expression rankings). We further utilized the GENCODE biotype feature to identify the top-ranked protein-coding transcripts by choosing the most expressed protein-coding transcripts from GTEx datasets (both V8 and V9 datasets). In summary, there are 18 083 genes matched between MANE and GTEx. Among them, 13 245 MANE select transcripts matched the top-ranked protein-coding transcripts in the GTEx V9 dataset, which underlines the dominant expression of MANE select transcripts. The TEx-MST web database provides a visualized user interface for the normal tissue expression patterns of MANE select transcripts using the newly released GTEx dataset.
Database URL: TEx-MST is available at https://texmst.ibms.sinica.edu.tw/
Affiliation(s)
- Kuo-Feng Tung
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C.
- Wen-chang Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C.
- Institute of Biomedical Informatics, National Yang-Ming Chiao Tung University, Taipei 112, Taiwan, R.O.C.
13
Wu C, Lu X, Lu S, Wang H, Li D, Zhao J, Jin J, Sun Z, He QY, Chen Y, Zhang G. Efficient Detection of the Alternative Spliced Human Proteome Using Translatome Sequencing. Front Mol Biosci 2022; 9:895746. [PMID: 35720116 PMCID: PMC9201276 DOI: 10.3389/fmolb.2022.895746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/14/2022] [Accepted: 04/28/2022] [Indexed: 01/08/2023]
Abstract
Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. Here, we demonstrate a full-length translating mRNA (ribosome nascent-chain complex-bound mRNA, RNC-mRNA) sequencing (RNC-seq) strategy that sequences the entire translating mRNA using next-generation sequencing, including short-read and long-read technologies, to construct a protein database containing all translating AS isoforms. Taking advantage of read length, short-read RNC-seq identified up to 15,289 genes and 15,906 AS isoforms in a single human cell line, many more than Ribo-seq. Single-molecule long-read RNC-seq supplemented 4,429 annotated AS isoforms that were not identified by short-read datasets, and 4,525 novel AS isoforms that were not included in the public databases. Using this RNC-seq-guided database, we identified 6,766 annotated protein isoforms and 50 novel protein isoforms in mass spectrometry datasets. These results demonstrate the potential of full-length RNC-seq in investigating the proteome of AS isoforms.
Affiliation(s)
- Chun Wu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Xiaolong Lu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Shaohua Lu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- State Key Laboratory of Respiratory Disease, School of Basic Medical Sciences, Sino-French Hoffmann Institute, Guangzhou Medical University, Guangzhou, China
- Hongwei Wang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Dehua Li
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Jing Zhao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Jingjie Jin
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Zhenghua Sun
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Yang Chen
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- *Correspondence: Gong Zhang; Yang Chen
- Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- *Correspondence: Gong Zhang; Yang Chen
14
Corominas J, Smeekens SP, Nelen MR, Yntema HG, Kamsteeg EJ, Pfundt R, Gilissen C. Clinical exome sequencing - mistakes and caveats. Hum Mutat 2022; 43:1041-1055. [PMID: 35191116 PMCID: PMC9541396 DOI: 10.1002/humu.24360] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/10/2021] [Revised: 01/11/2022] [Accepted: 02/18/2022] [Indexed: 11/30/2022]
Abstract
Massively parallel sequencing technology has become the predominant technique for genetic diagnostics and research. Many genetic laboratories have wrestled with the challenges of setting up genetic testing workflows based on a completely new technology. The learning curve we went through as a laboratory was accompanied by growing pains while we gained new knowledge and expertise. Here we discuss some important mistakes that were made in our laboratory over 10 years of clinical exome sequencing but that gave us important new insights into how to adapt our working methods. We provide these examples and the lessons we learned to help other laboratories avoid making the same mistakes.
Affiliation(s)
- Jordi Corominas
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Sanne P Smeekens
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Marcel R Nelen
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Helger G Yntema
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Erik-Jan Kamsteeg
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Rolph Pfundt
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Christian Gilissen
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands; Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
15
Ruhrman-Shahar N, Assia Batzir N, Lidzbarsky GA, Bazak L, Magal N, Basel-Salmon L. A nonsense variant in the second exon of the canonical transcript of NSD1 does not cause Sotos syndrome. Am J Med Genet A 2021; 188:369-372. [PMID: 34559457 DOI: 10.1002/ajmg.a.62519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 08/26/2021] [Accepted: 09/04/2021] [Indexed: 11/08/2022]
Affiliation(s)
- Noa Ruhrman-Shahar
- Raphael Recanati Genetic Institute, Rabin Medical Center-Beilinson Hospital, Petach Tikva, Israel
- Nurit Assia Batzir
- Pediatric Genetics Clinic, Schneider Children's Medical Center of Israel, Petach Tikva, Israel
- Gabriel Arie Lidzbarsky
- Raphael Recanati Genetic Institute, Rabin Medical Center-Beilinson Hospital, Petach Tikva, Israel
- Lily Bazak
- Raphael Recanati Genetic Institute, Rabin Medical Center-Beilinson Hospital, Petach Tikva, Israel
- Nurit Magal
- Raphael Recanati Genetic Institute, Rabin Medical Center-Beilinson Hospital, Petach Tikva, Israel
- Lina Basel-Salmon
- Raphael Recanati Genetic Institute, Rabin Medical Center-Beilinson Hospital, Petach Tikva, Israel; Pediatric Genetics Clinic, Schneider Children's Medical Center of Israel, Petach Tikva, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Felsenstein Medical Research Center, Petach Tikva, Israel
16
Huang G, Zhang H, Qu Y, Huang K, Gong X, Wei J, Du H. ARMT: An automatic RNA-seq data mining tool based on comprehensive and integrative analysis in cancer research. Comput Struct Biotechnol J 2021; 19:4426-4434. [PMID: 34471489 PMCID: PMC8379379 DOI: 10.1016/j.csbj.2021.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 07/19/2021] [Accepted: 08/06/2021] [Indexed: 11/02/2022]
Abstract
The comprehensive and integrative analysis of RNA-seq data, in different molecular layers from diverse samples, holds promise to address the full-scale complexity of biological systems. Recent advances in gene set variation analysis (GSVA) are providing exciting opportunities for revealing the specific biological processes of cancer samples. However, a tool is still urgently needed that combines GSVA with analyses of different molecular characteristics and of the prognostic characteristics of cancer patients to reveal the biological processes of disease comprehensively. Here, we develop ARMT, an automatic tool for RNA-seq data analysis. ARMT is an efficient and integrative tool with a user-friendly interface for comprehensively analyzing the molecular characteristics of single genes and gene sets based on transcriptome and genomic data, which bridges genes and pathways to further accelerate scientific findings. ARMT can be installed easily from https://github.com/Dulab2020/ARMT.
Affiliation(s)
- Guanda Huang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Haibo Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yimo Qu
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Kaitang Huang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Xiaocheng Gong
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jinfen Wei
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Hongli Du
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
17
Li Y, Guo D. Genome-wide profiling of alternative splicing in glioblastoma and their clinical value. BMC Cancer 2021; 21:958. [PMID: 34445990 PMCID: PMC8393481 DOI: 10.1186/s12885-021-08681-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2019] [Accepted: 08/13/2021] [Indexed: 12/20/2022]
Abstract
Background: Alternative splicing (AS), one of the main post-transcriptional biological regulation mechanisms, plays a key role in the progression of glioblastoma (GBM). Systematic AS profiling in GBM is limited and urgently needed. Methods: TCGA SpliceSeq data and the corresponding clinical data were downloaded from the TCGA data portal. Survival-related AS events were identified through Kaplan–Meier survival analysis and univariate Cox analysis. Then, a splicing correlation network was constructed based on these AS events and associated splicing factors. LASSO regression followed by multivariate Cox analysis was performed to validate independent AS biomarkers and to construct a risk prediction model. Enrichment analysis was subsequently conducted to explore potential signaling pathways of these AS events. Results: A total of 132 TCGA GBM samples and 45,610 AS events were included in our study, among which 416 survival-related AS events were identified. An AS correlation network, including 54 AS events and 94 splicing factors, was constructed, and further functional enrichment was performed. Moreover, the novel risk prediction model we constructed displayed moderate performance (areas under the curve > 0.7) at one, two, and three years. Conclusions: Survival-related AS events may be vital factors of both biological function and prognosis. Our findings in this study can deepen the understanding of the complicated mechanisms of AS in GBM and provide novel insights for further study. Moreover, our risk prediction model is ready for preliminary clinical applications. Further verification is required. Supplementary Information: The online version contains supplementary material available at 10.1186/s12885-021-08681-z.
Affiliation(s)
- Youwei Li
- Department of Neurosurgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, People's Republic of China
- Dongsheng Guo
- Department of Neurosurgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, People's Republic of China
18
Li H, Dawood M, Khayat MM, Farek JR, Jhangiani SN, Khan ZM, Mitani T, Coban-Akdemir Z, Lupski JR, Venner E, Posey JE, Sabo A, Gibbs RA. Exome variant discrepancies due to reference-genome differences. Am J Hum Genet 2021; 108:1239-1250. [PMID: 34129815 PMCID: PMC8322936 DOI: 10.1016/j.ajhg.2021.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 01/29/2021] [Accepted: 05/19/2021] [Indexed: 12/15/2022]
Abstract
Despite release of the GRCh38 human reference genome more than seven years ago, GRCh37 remains more widely used by most research and clinical laboratories. To date, no study has quantified the impact of utilizing different reference assemblies for the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. By calling variants on both the GRCh37 and GRCh38 references, we identified single-nucleotide variants (SNVs) and insertion-deletions (indels) in 1,572 exomes from participants with Mendelian diseases and their family members. We found that a total of 1.5% of SNVs and 2.0% of indels were discordant when different references were used. Notably, 76.6% of the discordant variants were clustered within discrete discordant reference patches (DISCREPs) comprising only 0.9% of loci targeted by exome sequencing. These DISCREPs were enriched for genomic elements including segmental duplications, fix patch sequences, and loci known to contain alternate haplotypes. We identified 206 genes significantly enriched for discordant variants, most of which were in DISCREPs and caused by multi-mapped reads on the reference assembly that lacked the variant call. Among these 206 genes, eight are implicated in known Mendelian diseases and 53 are associated with common phenotypes from genome-wide association studies. In addition, variant interpretations could also be influenced by the reference after lifting-over variant loci to another assembly. Overall, we identified genes and genomic loci affected by reference assembly choice, including genes associated with Mendelian disorders and complex human diseases that require careful evaluation in both research and clinical applications.
Affiliation(s)
- He Li
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Moez Dawood
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
- Michael M Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jesse R Farek
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Ziad M Khan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Tadahiro Mitani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zeynep Coban-Akdemir
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- James R Lupski
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Texas Children's Hospital, Houston, TX 77030, USA
- Eric Venner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
19
Jain PN, Robertson M, Lasa JJ, Shekerdemian L, Guffey D, Zhang Y, Lingappan K, Checchia P, Coarfa C. Altered metabolic and inflammatory transcriptomics after cardiac surgery in neonates with congenital heart disease. Sci Rep 2021; 11:4965. [PMID: 33654130 PMCID: PMC7925649 DOI: 10.1038/s41598-021-83882-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/11/2020] [Accepted: 01/22/2021] [Indexed: 12/20/2022] Open
Abstract
The study examines the whole blood transcriptome profile before and after cardiopulmonary bypass (CPB) in neonates with hypoplastic left heart syndrome (HLHS), a severe form of congenital heart disease, who can develop low cardiac output syndrome (LCOS). Whole blood mRNA transcriptome profiles of 13 neonates with HLHS before and after their first palliative surgery were analyzed to determine differentially expressed genes and pathways. The median age and weight at surgery were 4 days and 3.2 kg, respectively. Of the 13 patients, 8 developed LCOS. There was no significant difference in CPB, aortic cross clamp, or deep hypothermic cardiac arrest times between patients that develop LCOS and those that do not. Upon comparing differential gene expression profiles between patients that develop LCOS and those that do not in pre-operative samples, 1 gene was up-regulated and 13 were down-regulated. In the post-operative samples, 4 genes were up-regulated and 4 genes were down-regulated when patients that develop LCOS were compared to those that do not. When comparing post-operative samples to pre-operative samples in the patients that do not develop LCOS, 1484 genes were up-regulated and 1388 genes were down-regulated, while patients that developed LCOS had 2423 up-regulated genes and 2414 down-regulated genes for the same pre- to post-operative comparison. Pathway analysis revealed differential regulation of inflammatory pathways (IL signaling, PDGF, NOTCH1, NGF, GPCR) and metabolic pathways (heme metabolism, oxidative phosphorylation, protein metabolism including amino acid and derivatives, fatty acid metabolism, TCA cycle and respiratory electron transport chain). By identifying altered transcriptome profiles related to inflammation and metabolism in neonates with HLHS who develop LCOS after CPB, this study opens for exploration novel pathways and potential therapeutic targets to improve outcomes in this high-risk population.
Affiliation(s)
- Parag N Jain
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
- Javier J Lasa
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
- Lara Shekerdemian
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
- Yuhao Zhang
- Baylor College of Medicine, Houston, TX, USA
- Krithika Lingappan
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
- Paul Checchia
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
20
Pan Q, Liu YJ, Bai XF, Han XL, Jiang Y, Ai B, Shi SS, Wang F, Xu MC, Wang YZ, Zhao J, Chen JX, Zhang J, Li XC, Zhu J, Zhang GR, Wang QY, Li CQ. VARAdb: a comprehensive variation annotation database for human. Nucleic Acids Res 2021; 49:D1431-D1444. [PMID: 33095866 PMCID: PMC7779011 DOI: 10.1093/nar/gkaa922] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/08/2020] [Revised: 09/28/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023] Open
Abstract
With the study of human diseases and biological processes increasing, a large number of non-coding variants have been identified. The rapid accumulation of genetic and epigenomic information has resulted in an urgent need to collect and process data to explore the regulation of non-coding variants. Here, we developed a comprehensive variation annotation database for human (VARAdb, http://www.licpathway.net/VARAdb/), which specifically considers non-coding variants. VARAdb provides annotation information for 577,283,813 variations and novel variants, prioritizes variations based on scores using nine annotation categories, and supports pathway downstream analysis. Importantly, VARAdb integrates a large amount of genetic and epigenomic data into five annotation sections, which include ‘Variation information’, ‘Regulatory information’, ‘Related genes’, ‘Chromatin accessibility’ and ‘Chromatin interaction’. The detailed annotation information consists of motif changes, risk SNPs, LD SNPs, eQTLs, clinical variant-drug-gene pairs, sequence conservation, somatic mutations, enhancers, super enhancers, promoters, transcription factors, chromatin states, histone modifications, chromatin accessibility regions and chromatin interactions. The database provides a user-friendly interface to query, browse and visualize variations and related annotation information. VARAdb is a useful resource for selecting potential functional variations and interpreting their effects on human diseases and biological processes.
Affiliation(s)
- Qi Pan
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yue-Juan Liu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xue-Feng Bai
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xiao-Le Han
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yong Jiang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Bo Ai
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Shan-Shan Shi
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Fan Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Ming-Cong Xu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yue-Zhu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jun Zhao
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jia-Xin Chen
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xue-Cang Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jiang Zhu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Guo-Rui Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qiu-Yu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chun-Quan Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
21
SoRelle JA, Wachsmann M, Cantarel BL. Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays. Arch Pathol Lab Med 2020; 144:1118-1130. [PMID: 32045276 DOI: 10.5858/arpa.2019-0476-ra] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 12/09/2019] [Indexed: 11/06/2022]
Abstract
CONTEXT.— Clinical next-generation sequencing (NGS) is being rapidly adopted, but analysis and interpretation of large data sets prompt new challenges for a clinical laboratory setting. Clinical NGS results rely heavily on the bioinformatics pipeline for identifying genetic variation in complex samples. The choice of bioinformatics algorithms, genome assembly, and genetic annotation databases is important for determining genetic alterations associated with disease. The analysis methods are often tuned to the assay to maximize accuracy. Once a pipeline has been developed, it must be validated to determine accuracy and reproducibility for samples similar to real-world cases. In silico proficiency testing or institutional data exchange will ensure consistency among clinical laboratories. OBJECTIVE.— To provide molecular pathologists a step-by-step guide to bioinformatics analysis and validation design in order to navigate the regulatory and validation standards of implementing a bioinformatic pipeline as a part of a new clinical NGS assay. DATA SOURCES.— This guide uses published studies on genomic analysis, bioinformatics methods, and methods comparison studies to inform the reader on what resources, including open source software tools and databases, are available for genetic variant detection and interpretation. CONCLUSIONS.— This review covers 4 key concepts: (1) bioinformatic analysis design for detecting genetic variation, (2) the resources for assessing genetic effects, (3) analysis validation assessment experiments and data sets, including a diverse set of samples to mimic real-world challenges that assess accuracy and reproducibility, and (4) whether concordance between clinical laboratories will be improved by proficiency testing designed to test bioinformatic pipelines.
Affiliation(s)
- Jeffrey A SoRelle
- Department of Pathology (SoRelle, Wachsmann), University of Texas Southwestern Medical Center, Dallas
- Megan Wachsmann
- Department of Pathology (SoRelle, Wachsmann), University of Texas Southwestern Medical Center, Dallas
- Brandi L Cantarel
- Bioinformatics Core Facility (Cantarel), University of Texas Southwestern Medical Center, Dallas; Department of Bioinformatics (Cantarel), University of Texas Southwestern Medical Center, Dallas; University of Texas Southwestern Medical Center, Dallas
22
Kalfakakou D, Konstantopoulou I, Yannoukakos D, Fostira F. Pitfalls in variant annotation for hereditary cancer diagnostics: The example of Illumina® VariantStudio®. Genomics 2020; 113:748-754. [PMID: 33053411 DOI: 10.1016/j.ygeno.2020.10.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2020] [Revised: 08/04/2020] [Accepted: 10/08/2020] [Indexed: 11/27/2022]
Abstract
Next Generation Sequencing (NGS), and specifically targeted panel sequencing is the state-of-the-art in clinical genetic diagnosis of Mendelian diseases. However, the bioinformatics analysis and interpretation of the generated data can be challenging. A spotlight on the default transcript selection of a user-friendly, commercially available software that is widely used by genetics professionals, i.e. Illumina® VariantStudio®, is presented. For the sake of comparison, we employed Ensembl VEP, an open-source command-line tool, as it provides flexibility regarding transcript selection. The analysis of NGS data deriving from sequencing of 857 germline DNA samples of cancer patients indicated a concordance of 82.82% between the two software programs. Significantly, using the default transcript configuration of VariantStudio®, we failed to annotate correctly 11.45% of the identified loss-of-function variants. Our results underline the importance of cautious software and transcript selection and the need for reliable, white-box data analysis, along with bioinformatics expertise in clinical diagnostics.
Affiliation(s)
- Despoina Kalfakakou
- Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research "Demokritos", Greece
- Irene Konstantopoulou
- Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research "Demokritos", Greece
- Drakoulis Yannoukakos
- Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research "Demokritos", Greece
- Florentia Fostira
- Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research "Demokritos", Greece
23
Biales AD, Bencic DC, Flick RW, Delacruz A, Gordon DA, Huang W. Global transcriptomic profiling of microcystin-LR or -RR treated hepatocytes (HepaRG). Toxicon X 2020; 8:100060. [PMID: 33235993 PMCID: PMC7670210 DOI: 10.1016/j.toxcx.2020.100060] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/10/2020] [Revised: 09/24/2020] [Accepted: 09/27/2020] [Indexed: 12/20/2022] Open
Abstract
The canonical mode of action (MOA) of microcystins (MC) is the inhibition of protein phosphatases, but complete characterization of toxicity pathways is lacking. The existence of over 200 MC congeners complicates risk estimates worldwide. This work employed RNA-seq to provide an unbiased and comprehensive characterization of cellular targets and impacted cellular processes of hepatocytes exposed to either MC-LR or MC-RR congeners. The human hepatocyte cell line, HepaRG, was treated with three concentrations of MC-LR or -RR for 2 h. Significant reduction in cell survival was observed in LR1000 and LR100 treatments whereas no acute toxicity was observed in any MC-RR treatment. RNA-seq was performed on all treatments of MC-LR and -RR. Differentially expressed genes and pathways associated with oxidative and endoplasmic reticulum (ER) stress, and the unfolded protein response (UPR) were highly enriched by both congeners as were inflammatory pathways. Genes associated with both apoptotic and inflammatory pathways were enriched in LR1000. We present a model of MC toxicity that immediately causes oxidative stress and leads to ER stress and the activation of the UPR. Differential activation of the three arms of the UPR and the kinetics of JNK activation ultimately determine whether cell survival or apoptosis is favored. Extracellular exosomes were enriched by both congeners, suggesting a previously unidentified mechanism for MC-dependent extracellular signaling. The complement system was enriched only in MC-RR treatments, suggesting congener-specific differences in cellular effects. This study provided an unbiased snapshot of the early systemic hepatocyte response to MC-LR and MC-RR congeners and may explain differences in toxicity among MC congeners.
Microcystin-LR and microcystin-RR have similar transcriptional responses. Genes associated with oxidative stress and the unfolded protein response were enriched by both congeners. Genes associated with extracellular exosomes were enriched, suggesting a potential new mechanism for cell signaling. Complement-associated genes were strongly enriched only by microcystin-RR. Identified a potential molecular mechanism underlying the cellular fate of hepatocytes.
Affiliation(s)
- Adam D Biales
- U.S. Environmental Protection Agency, Office of Research and Development, Cincinnati, OH, 45268, USA
- David C Bencic
- U.S. Environmental Protection Agency, Office of Research and Development, Cincinnati, OH, 45268, USA
- Robert W Flick
- U.S. Environmental Protection Agency, Office of Research and Development, Cincinnati, OH, 45268, USA
- Armah Delacruz
- U.S. Environmental Protection Agency, Office of Research and Development, Cincinnati, OH, 45268, USA
- Denise A Gordon
- U.S. Environmental Protection Agency, Office of Research and Development, Cincinnati, OH, 45268, USA
- Weichun Huang
- U.S. Environmental Protection Agency, Office of Research and Development, Research Triangle Park, NC, 27709, USA
24
Khan AH, Lin A, Wang RT, Bloom JS, Lange K, Smith DJ. Pooled analysis of radiation hybrids identifies loci for growth and drug action in mammalian cells. Genome Res 2020; 30:1458-1467. [PMID: 32878976 PMCID: PMC7605260 DOI: 10.1101/gr.262204.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/10/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022]
Abstract
Genetic screens in mammalian cells commonly focus on loss-of-function approaches. To evaluate the phenotypic consequences of extra gene copies, we used bulk segregant analysis (BSA) of radiation hybrid (RH) cells. We constructed six pools of RH cells, each consisting of ∼2500 independent clones, and placed the pools under selection in media with or without paclitaxel. Low pass sequencing identified 859 growth loci, 38 paclitaxel loci, 62 interaction loci, and three loci for mitochondrial abundance at genome-wide significance. Resolution was measured as ∼30 kb, close to single-gene. Divergent properties were displayed by the RH-BSA growth genes compared to those from loss-of-function screens, refuting the balance hypothesis. In addition, enhanced retention of human centromeres in the RH pools suggests a new approach to functional dissection of these chromosomal elements. Pooled analysis of RH cells showed high power and resolution and should be a useful addition to the mammalian genetic toolkit.
Affiliation(s)
- Arshad H Khan
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
- Andy Lin
- Office of Information Technology, UCLA, Los Angeles, California 90095-1557, USA
- Richard T Wang
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Joshua S Bloom
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA; Howard Hughes Medical Institute, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Desmond J Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
25
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022] Open
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA; Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
26
Liu Y, Fu L, Kaufmann K, Chen D, Chen M. A practical guide for DNase-seq data analysis: from data management to common applications. Brief Bioinform 2020; 20:1865-1877. [PMID: 30010713 DOI: 10.1093/bib/bby057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/08/2018] [Revised: 06/06/2018] [Accepted: 06/10/2018] [Indexed: 01/01/2023] Open
Abstract
Deoxyribonuclease I (DNase I)-hypersensitive site sequencing (DNase-seq) has been widely used to determine chromatin accessibility and its underlying regulatory lexicon. However, exploring DNase-seq data requires sophisticated downstream bioinformatics analyses. In this study, we first review computational methods for all of the major steps in DNase-seq data analysis, including experimental design, quality control, read alignment, peak calling, annotation of cis-regulatory elements, genomic footprinting and visualization. The challenges associated with each step are highlighted. Next, we provide a practical guideline and a computational pipeline for DNase-seq data analysis by integrating some of these tools. We also discuss the competing techniques and the potential applications of this pipeline for the analysis of analogous experimental data. Finally, we discuss the integration of DNase-seq with other functional genomics techniques.
Affiliation(s)
- Yongjing Liu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Liangyu Fu
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
- Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
- Dijun Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
27
Jin P, Tan Y, Zhang W, Li J, Wang K. Prognostic alternative mRNA splicing signatures and associated splicing factors in acute myeloid leukemia. Neoplasia 2020; 22:447-457. [PMID: 32653835 PMCID: PMC7356271 DOI: 10.1016/j.neo.2020.06.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/09/2020] [Revised: 06/05/2020] [Accepted: 06/08/2020] [Indexed: 12/22/2022] Open
Abstract
The dysregulation of alternative splicing (AS) has emerged as a mechanism of acute myeloid leukemia (AML). However, the prognostic impact of AS events remains under-explored in AML. Here we report the prognostic value of AS events and associated splicing factors based on three datasets of AML patients. We defined the landscape of AS events in AML and identified 7033 AS events associated with the survival of AML patients. Based on these events, we further developed a composite 15 AS event-based prognostic signature, which was independent of the cytogenetic risk stratification and patient age, and showed a better performance than known gene expression signatures. More importantly, our new signature markedly improved the European LeukemiaNet (ELN) risk classification, indicating a broad applicability in the clinical management of AML. Furthermore, the splicing-regulatory network established the correlations between prognostic AS events and associated splicing factors. The finding was validated by CRISPR-based data, which indicated that the increased expression of RBM39 contributed to the higher exon inclusion of SETD5 and conferred a poor outcome. Together, AS events may serve as a novel assortment of prognosticators for AML and could refine the ELN risk stratification. The splicing regulatory network provides clues regarding the splicing factor-mediated mechanisms of AML.
Affiliation(s)
- Peng Jin
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yun Tan
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China; Sino-French Research Center for Life Sciences and Genomics, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Wei Zhang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China; School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Junmin Li
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Kankan Wang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China; Sino-French Research Center for Life Sciences and Genomics, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
28
Yue Z, Chu X, Xia J. PredCID: prediction of driver frameshift indels in human cancer. Brief Bioinform 2020; 22:5860690. [PMID: 32591774 DOI: 10.1093/bib/bbaa119] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/21/2020] [Revised: 05/14/2020] [Accepted: 05/16/2020] [Indexed: 11/12/2022] Open
Abstract
The discrimination of driver from passenger mutations has been a hot topic in the field of cancer biology. Although recent advances have improved the identification of driver mutations in cancer genomic research, there is not yet a computational method specific to cancer frameshift indels (insertions and/or deletions). In addition, existing pathogenic frameshift indel predictors may suffer from many missing values because of different choices of transcripts during the variant annotation process. In this study, we proposed a computational model, called PredCID (Predictor for Cancer driver frameshift InDels), for accurately predicting cancer driver frameshift indels. Gene, DNA, transcript and protein level features are combined and selected for classification with an eXtreme Gradient Boosting classifier. Benchmarking results on the cross-validation dataset and independent dataset showed that PredCID achieves better and more robust performance than existing noncancer-specific methods in distinguishing cancer driver frameshift indels from passengers and is therefore a valuable method for deeper understanding of frameshift indels in human cancer. PredCID is freely available for academic research at http://bioinfo.ahu.edu.cn:8080/PredCID.
Affiliation(s)
- Xinlu Chu
- Institutes of Physical Science and Information Technology, Anhui University
- Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University
29
Bartley BA, Beal J, Karr JR, Strychalski EA. Organizing genome engineering for the gigabase scale. Nat Commun 2020; 11:689. [PMID: 32019919 PMCID: PMC7000699 DOI: 10.1038/s41467-020-14314-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/24/2019] [Accepted: 12/18/2019] [Indexed: 12/11/2022] Open
Abstract
Genome-scale engineering holds great potential to impact science, industry, medicine, and society, and recent improvements in DNA synthesis have enabled the manipulation of megabase genomes. However, coordinating and integrating the workflows and large teams necessary for gigabase genome engineering remains a considerable challenge. We examine this issue and recommend a path forward by: 1) adopting and extending existing representations for designs, assembly plans, samples, data, and workflows; 2) developing new technologies for data curation and quality control; 3) conducting fundamental research on genome-scale modeling and design; and 4) developing new legal and contractual infrastructure to facilitate collaboration.
Affiliation(s)
- Jacob Beal
  - Raytheon BBN Technologies, Cambridge, MA, 02138, USA.
- Jonathan R Karr
  - Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10128, USA
30
DeVaux RS, Ropri AS, Grimm SL, Hall PA, Herrera EO, Chittur SV, Smith WP, Coarfa C, Behbod F, Herschkowitz JI. Long noncoding RNA BHLHE40-AS1 promotes early breast cancer progression through modulating IL-6/STAT3 signaling. J Cell Biochem 2020; 121:3465-3478. [PMID: 31907974 DOI: 10.1002/jcb.29621] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/26/2019] [Accepted: 12/19/2019] [Indexed: 01/08/2023]
Abstract
Ductal carcinoma in situ (DCIS) is a nonobligate precursor to invasive breast cancer. Only a small percentage of DCIS cases are predicted to progress; however, there is no method to distinguish DCIS lesions that will remain innocuous from those that will become invasive disease. Therefore, DCIS is treated aggressively, creating a current state of overdiagnosis and overtreatment. There is a critical need to identify functional determinants of progression of DCIS to invasive ductal carcinoma (IDC). Interrogating biopsies from five patients with contiguous DCIS and IDC lesions, we have shown that expression of the long noncoding RNA BHLHE40-AS1 increases with disease progression. BHLHE40-AS1 expression supports DCIS cell proliferation, motility, and invasive potential. Mechanistically, BHLHE40-AS1 modulates interleukin (IL)-6/signal transducer and activator of transcription 3 (STAT3) activity and a proinflammatory cytokine signature, in part through interaction with interleukin enhancer-binding factor 3. These data suggest that BHLHE40-AS1 supports early breast cancer progression by engaging STAT3 signaling, creating an immune-permissive microenvironment.
Affiliation(s)
- Rebecca S DeVaux
  - Department of Biomedical Sciences, Cancer Research Center, University at Albany-SUNY, Rensselaer, New York
- Ali S Ropri
  - Department of Biomedical Sciences, Cancer Research Center, University at Albany-SUNY, Rensselaer, New York
- Sandra L Grimm
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas
- Peter A Hall
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- Sridar V Chittur
  - Center for Functional Genomics, University at Albany-SUNY, Rensselaer, New York
- William P Smith
  - Department of Radiology, Hays Medical Center, University of Kansas Health System, Kansas City, Kansas
- Cristian Coarfa
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas
- Fariba Behbod
  - Division of Cancer and Developmental Biology, University of Kansas Medical Center, Kansas City, Kansas
- Jason I Herschkowitz
  - Department of Biomedical Sciences, Cancer Research Center, University at Albany-SUNY, Rensselaer, New York
31
Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019; 41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]
Abstract
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanisms and protein structure has been incorporated into an algorithm to predict neighboring homologous exons, often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome necessitates distinctive annotations incorporating differences between individuals and between populations.
Affiliation(s)
- Klas Hatje
  - Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland
- Stefanie Mühlhausen
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany; Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
32
Hai L, Szwarc MM, Lonard DM, Rajapakshe K, Perera D, Coarfa C, Ittmann M, Fernandez-Valdivia R, Lydon JP. Short-term RANKL exposure initiates a neoplastic transcriptional program in the basal epithelium of the murine salivary gland. Cytokine 2019; 123:154745. [PMID: 31226438 DOI: 10.1016/j.cyto.2019.154745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/22/2019] [Accepted: 06/05/2019] [Indexed: 12/14/2022]
Abstract
Although salivary gland cancers comprise only ∼3-6% of head and neck cancers, treatment options for patients with advanced-stage disease are limited. Because of their rarity, salivary gland malignancies are understudied compared to other exocrine tissue cancers. The comparative lack of progress in this cancer field is particularly evident in our incomplete understanding of the key molecular signals that are causal for the development and/or progression of salivary gland cancers. Using a novel conditional transgenic mouse (K5:RANKL), we demonstrate that Receptor Activator of NFkB Ligand (RANKL) targeted to cytokeratin 5-positive basal epithelial cells of the salivary gland causes aggressive tumorigenesis within a short period of RANKL exposure. Genome-wide transcriptomic analysis reveals that RANKL markedly increases the expression levels of numerous gene families involved in cellular proliferation, migration, and intra- and extra-tumoral communication. Importantly, cross-species comparison of the K5:RANKL transcriptomic dataset with The Cancer Genome Atlas cancer signatures reveals the strongest molecular similarity with cancer subtypes of human head and neck squamous cell carcinoma. These studies not only provide a much-needed transcriptomic resource to mine for novel molecular targets for therapy and/or diagnosis but also validate the K5:RANKL transgenic as a preclinical model to further investigate the in vivo oncogenic role of RANKL signaling in salivary gland tumorigenesis.
Affiliation(s)
- Lan Hai
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA; Reproductive Medicine Center of Henan Provincial People's Hospital, Zhengzhou, Henan Province, PR China
- Maria M Szwarc
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- David M Lonard
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Kimal Rajapakshe
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Dimuthu Perera
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Cristian Coarfa
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Michael Ittmann
  - Department of Pathology, Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- John P Lydon
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA.
33
Jones MB, Alvarez CA, Johnson JL, Zhou JY, Morris N, Cobb BA. CD45Rb-low effector T cells require IL-4 to induce IL-10 in FoxP3 Tregs and to protect mice from inflammation. PLoS One 2019; 14:e0216893. [PMID: 31120919 PMCID: PMC6533033 DOI: 10.1371/journal.pone.0216893] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/09/2019] [Accepted: 04/30/2019] [Indexed: 01/04/2023] Open
Abstract
CD4+ effector/memory T cells (Tem) represent a leading edge of the adaptive immune system responsible for protecting the body from infection, cancer, and other damaging processes. However, a subset of Tem cells with low expression of CD45Rb (RbLoTem) has been shown to suppress inflammation despite their effector surface phenotype and the lack of FoxP3 expression, the canonical transcription factor found in most regulatory T cells. In this report, we show that RbLoTem cells can suppress inflammation by influencing Treg behavior. Co-culturing activated RbLoTem and Tregs induced high expression of IL-10 in vitro, and conditioned media from RbLoTem cells induced IL-10 expression in FoxP3+ Tregs in vitro and in vivo, indicating that RbLoTem cells communicate with Tregs in a cell-contact independent fashion. Transcriptomic and multi-analyte Luminex data identified both IL-2 and IL-4 as potential mediators of RbLoTem-Treg communication, and antibody-mediated neutralization of either IL-4 or CD124 (IL-4Rα) prevented IL-10 induction in Tregs. Moreover, isolated Tregs cultured with recombinant IL-2 and IL-4 strongly induced IL-10 production. Using house dust mite (HDM)-induced airway inflammation as a model, we confirmed that the in vivo suppressive activity of RbLoTem cells was lost in IL-4-ablated RbLoTem cells. These data support a model in which RbLoTem cells communicate with Tregs using a combination of IL-2 and IL-4 to induce robust expression of IL-10 and suppression of inflammation.
Affiliation(s)
- Mark B. Jones
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, Ohio, United States of America
- Carlos A. Alvarez
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, Ohio, United States of America
- Jenny L. Johnson
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, Ohio, United States of America
- Julie Y. Zhou
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, Ohio, United States of America
- Nathan Morris
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, United States of America
- Brian A. Cobb
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, Ohio, United States of America
34
Boivin V, Faucher-Giguère L, Scott M, Abou-Elela S. The cellular landscape of mid-size noncoding RNA. WILEY INTERDISCIPLINARY REVIEWS-RNA 2019; 10:e1530. [PMID: 30843375 PMCID: PMC6619189 DOI: 10.1002/wrna.1530] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/05/2018] [Revised: 02/08/2019] [Accepted: 02/09/2019] [Indexed: 01/06/2023]
Abstract
Noncoding RNA plays an important role in all aspects of the cellular life cycle, from the very basic process of protein synthesis to specialized roles in cell development and differentiation. However, many noncoding RNAs remain uncharacterized and the function of most of them remains unknown. Mid-size noncoding RNAs (mncRNAs), which range in length from 50 to 400 nucleotides, have diverse regulatory functions but share many fundamental characteristics. Most mncRNAs are produced from independent promoters although others are produced from the introns of other genes. Many are found in multiple copies in genomes. mncRNAs are highly structured and carry many posttranscriptional modifications. Both of these facets dictate their RNA-binding protein partners and ultimately their function. mncRNAs have already been implicated in translation, catalysis, as guides for RNA modification, as spliceosome components and regulatory RNA. However, recent studies are adding new mncRNA functions including regulation of gene expression and alternative splicing. In this review, we describe the different classes, characteristics and emerging functions of mncRNAs and their relative expression patterns. Finally, we provide a portrait of the challenges facing their detection and annotation in databases. This article is categorized under: Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs; RNA Structure and Dynamics > RNA Structure, Dynamics, and Chemistry; RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Vincent Boivin
  - Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Laurence Faucher-Giguère
  - Department of Microbiology and Infectious Disease, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Michelle Scott
  - Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Sherif Abou-Elela
  - Department of Microbiology and Infectious Disease, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
35
Corti A, Sota R, Dugo M, Calogero RA, Terragni B, Mantegazza M, Franceschetti S, Restelli M, Gasparini P, Lecis D, Chrzanowska KH, Delia D. DNA damage and transcriptional regulation in iPSC-derived neurons from Ataxia Telangiectasia patients. Sci Rep 2019; 9:651. [PMID: 30679601 PMCID: PMC6346060 DOI: 10.1038/s41598-018-36912-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/06/2018] [Accepted: 11/23/2018] [Indexed: 11/22/2022] Open
Abstract
Ataxia Telangiectasia (A-T) is a neurodegenerative syndrome caused by inherited mutations inactivating the ATM kinase, a master regulator of the DNA damage response (DDR). What makes neurons vulnerable to ATM loss remains unclear. In this study we assessed in human iPSC-derived neurons whether the abnormal accumulation of DNA-Topoisomerase 1 adducts (Top1ccs) found in A-T impairs transcription elongation, thus favoring neurodegeneration. We further assessed whether the induction of immediate early genes (IEGs) by neuronal activity, a process involving the formation of DNA breaks, is affected by ATM deficiency. We found that Top1cc trapping by CPT induces an ATM-dependent DDR as well as an ATM-independent induction of IEGs and repression especially of long genes. As revealed by nascent RNA sequencing, transcriptional elongation and recovery were found to proceed at the same rate, irrespective of gene length and ATM status. Neuronal activity induced by glutamate receptor stimulation, or membrane depolarization with KCl, triggered a DDR and expression of IEGs, the latter independent of ATM. In unperturbed A-T neurons a set of genes (FN1, DCN, RASGRF1, FZD1, EOMES, SHH, NR2E1) implicated in the development, maintenance and physiology of the central nervous system was specifically downregulated, underscoring their potential involvement in the neurodegenerative process in A-T patients.
Affiliation(s)
- Alessandro Corti
  - Department of Research, Fondazione IRCCS Istituto Nazionale Tumori, Milano, Via Amadeo 42, 20133, Milano, Italy
- Raina Sota
  - Department of Research, Fondazione IRCCS Istituto Nazionale Tumori, Milano, Via Amadeo 42, 20133, Milano, Italy
- Matteo Dugo
  - Department of Applied Research and Technological Development, Fondazione IRCCS Istituto Nazionale Tumori, Via Amadeo 42, 20133, Milano, Italy
- Raffaele A Calogero
  - Universita' degli Studi di Torino, Bioinformatics and Genomics Unit, Molecular Biotechnology Centre, Via Nizza 52, 10126, Torino, Italy
- Benedetta Terragni
  - Fondazione IRCCS Istituto Neurologico Carlo Besta, Department of Neurophysiopathology and Diagnostic Epileptology, Via Celoria 11, 20133, Milano, Italy
- Massimo Mantegazza
  - Institute of Molecular and Cellular Pharmacology (IPMC) LabEx ICST, CNRS UMR7275, Route des Lucioles, 06560, Valbonne, Sophia Antipolis, France; University Côte d'Azur, 660 Route des Lucioles, 06560, Valbonne, Sophia Antipolis, France
- Silvana Franceschetti
  - Fondazione IRCCS Istituto Neurologico Carlo Besta, Department of Neurophysiopathology and Diagnostic Epileptology, Via Celoria 11, 20133, Milano, Italy
- Michela Restelli
  - Department of Research, Fondazione IRCCS Istituto Nazionale Tumori, Via Amadeo 42, 20133, Milano, Italy
- Patrizia Gasparini
  - Department of Research, Fondazione IRCCS Istituto Nazionale Tumori, Milano, Via G Venezian 1, 20133, Milano, Italy
- Daniele Lecis
  - Department of Research, Fondazione IRCCS Istituto Nazionale Tumori, Milano, Via Amadeo 42, 20133, Milano, Italy
- Krystyna H Chrzanowska
  - Department of Medical Genetics, The Children's Memorial Health Institute, Al. Dzieci Polskich 20, 04-730, Warsaw, Poland
- Domenico Delia
  - Department of Research, Fondazione IRCCS Istituto Nazionale Tumori, Milano, Via Amadeo 42, 20133, Milano, Italy; IFOM, FIRC Institute of Molecular Oncology, Via Adamello 16, 20139, Milano, Italy.
36
Abstract
Whole genome sequencing (WGS) can provide comprehensive insights into the genetic makeup of lymphomas. Here we describe a selection of methods for the analysis of WGS data, including alignment, identification of different classes of genomic variants, the identification of driver mutations, and the identification of mutational signatures. We further outline design considerations for WGS studies and provide a variety of quality control measures to detect common quality problems in the data.
37
Wang L, Felts SJ, Van Keulen VP, Pease LR, Zhang Y. Exploring the effect of library preparation on RNA sequencing experiments. Genomics 2018; 111:1752-1759. [PMID: 30529531 DOI: 10.1016/j.ygeno.2018.11.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/23/2018] [Revised: 11/29/2018] [Accepted: 11/30/2018] [Indexed: 10/27/2022]
Abstract
RNA sequencing (RNA-seq) has become the widely preferred choice for surveying genome-wide transcriptome complexity in many organisms. However, the broad adaptation of this methodology into the clinic still needs further evaluation of the potential effect of sample preparation factors on its analytical reliability using patient samples. In this study, we examined the impact of three major sample preparation factors (i.e., cDNA library storage time, the quantity of input RNA, and cryopreservation of cell samples) on sequence biases, gene expression profiles, and enriched biological functions using RNAs isolated from primary B cell and CD4+ cell blood samples of healthy subjects. Our comprehensive comparison results suggested that differences in cDNA library storage time, quantity of input RNA, and cryopreservation of cell samples did not significantly alter gene transcriptional expression profiles generated by RNA-seq experiments. These findings shed new light on the potential applications of the RNA-seq technique to patient samples in a regular clinical setting.
Affiliation(s)
- Lei Wang
  - Division of Biostatistics and Bioinformatics, University of Maryland Greenebaum Comprehensive Cancer Center, Baltimore, MD 21201, United States; Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD 21201, United States.
- Sara J Felts
  - Department of Immunology, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, United States.
- Virginia P Van Keulen
  - Department of Immunology, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, United States.
- Larry R Pease
  - Department of Immunology, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, United States.
- Yuji Zhang
  - Division of Biostatistics and Bioinformatics, University of Maryland Greenebaum Comprehensive Cancer Center, Baltimore, MD 21201, United States; Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD 21201, United States.
38
Bacher U, Shumilov E, Flach J, Porret N, Joncourt R, Wiedemann G, Fiedler M, Novak U, Amstutz U, Pabst T. Challenges in the introduction of next-generation sequencing (NGS) for diagnostics of myeloid malignancies into clinical routine use. Blood Cancer J 2018; 8:113. [PMID: 30420667 PMCID: PMC6232163 DOI: 10.1038/s41408-018-0148-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 06/15/2018] [Revised: 09/17/2018] [Accepted: 10/15/2018] [Indexed: 12/20/2022] Open
Abstract
Given the vast phenotypic and genetic heterogeneity of acute and chronic myeloid malignancies, hematologists have eagerly awaited the introduction of next-generation sequencing (NGS) into the routine diagnostic armamentarium to enable a more differentiated disease classification, risk stratification, and improved therapeutic decisions. At present, an increasing number of hematologic laboratories are in the process of integrating NGS procedures into the diagnostic algorithms of patients with acute myeloid leukemia (AML), myelodysplastic syndromes (MDS), and myeloproliferative neoplasms (MPNs). Inevitably accompanying such developments, physicians and molecular biologists are facing unexpected challenges regarding the interpretation and implementation of molecular genetic results derived from NGS in myeloid malignancies. This article summarizes typical challenges that may arise in the context of NGS-based analyses at diagnosis and during follow-up of myeloid malignancies.
Affiliation(s)
- Ulrike Bacher
  - Department of Hematology and Central Hematology Laboratory, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
  - Center for Laboratory Medicine (ZLM)/University Institute of Clinical Chemistry, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
- Evgenii Shumilov
  - Department of Hematology and Medical Oncology, University Medicine Göttingen (UMG), Göttingen, Germany
- Johanna Flach
  - Department of Hematology and Oncology, Medical Faculty Mannheim of the Heidelberg University, Mannheim, Germany
- Naomi Porret
  - Department of Hematology and Central Hematology Laboratory, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Raphael Joncourt
  - Department of Hematology and Central Hematology Laboratory, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Gertrud Wiedemann
  - Department of Hematology and Central Hematology Laboratory, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Martin Fiedler
  - Center for Laboratory Medicine (ZLM)/University Institute of Clinical Chemistry, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Urban Novak
  - Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Ursula Amstutz
  - Center for Laboratory Medicine (ZLM)/University Institute of Clinical Chemistry, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Thomas Pabst
  - Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
39
Xiong Y, Deng Y, Wang K, Zhou H, Zheng X, Si L, Fu Z. Profiles of alternative splicing in colorectal cancer and their clinical significance: A study based on large-scale sequencing data. EBioMedicine 2018; 36:183-195. [PMID: 30243491 PMCID: PMC6197784 DOI: 10.1016/j.ebiom.2018.09.021] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 06/05/2018] [Revised: 09/12/2018] [Accepted: 09/12/2018] [Indexed: 02/07/2023] Open
Abstract
Background: Alternative splicing (AS), a potent and pervasive mechanism of transcriptional regulation, expands the genome's coding capacity and is involved in the initiation and progression of cancer. Systematic analysis of alternative splicing in colorectal cancer (CRC) is lacking and greatly needed. Methods: RNA-Seq data and corresponding clinical information for the CRC cohort were downloaded from the TCGA data portal. Then, a Java application, SpliceSeq, was used to evaluate the RNA splicing patterns and calculate the Percent Spliced In (PSI) value. Differentially expressed AS events (DEAS) were identified based on PSI values between paired CRC and adjacent tissues. DEAS and their splicing networks were further analyzed by bioinformatics methods. Kaplan-Meier, Cox proportional hazards regression and unsupervised clustering analyses were used to evaluate the association between DEAS and patients' clinical features. Results: After strict filtering, a total of 34,334 AS events were identified, among which 421 AS events were found to be differentially expressed. Parent genes of these DEAS play an important role in regulating CRC-related processes such as protein kinase activity (FDR<0.0001), PI3K-Akt signaling pathway (FDR = 0.0024) and p53 signaling pathway (FDR = 0.0143). 37 DEAS events were found to be associated with overall survival (OS), and 68 DEAS events were found to be associated with disease-free survival (DFS). Stratifying patients according to the PSI value of AT in CXCL12 and RI in CSTF3 formed significant Kaplan-Meier curves in both OS and DFS survival analysis. Unsupervised clustering analysis using DEAS revealed four clusters with distinct survival patterns that were associated with consensus molecular subtypes. Conclusions: Large differences in AS events appear to exist in CRC, and these differences are likely to be important determinants of both prognosis and biological regulation. Our identified CRC-related AS events and uncovered splicing networks are valuable in deciphering the underlying mechanisms of AS in CRC, and provide clues to therapeutic targets for further validation.
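For orientation, PSI is commonly defined as the fraction of reads supporting inclusion of a splicing event among all reads informative for that event. The sketch below illustrates only that generic definition and a tumor-versus-adjacent difference; the function name, the read counts, and the zero-coverage handling are assumptions made for illustration, and SpliceSeq's own computation may differ in detail.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Generic PSI: inclusion reads divided by all informative reads.

    Returns None when no read covers the event (an assumed convention).
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Hypothetical read counts for one event in a paired sample.
psi_tumor = percent_spliced_in(inclusion_reads=30, exclusion_reads=10)
psi_adjacent = percent_spliced_in(inclusion_reads=10, exclusion_reads=30)

# Difference in PSI between the paired tissues for this event.
delta_psi = psi_tumor - psi_adjacent
print(psi_tumor, psi_adjacent, delta_psi)
```

A per-event difference of this kind is one simple way to compare paired tissues; the statistical filtering used in the study is not reproduced here.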
Affiliation(s)
- Yongfu Xiong
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Ying Deng
  - Department of Cardiovascular, The First Branch, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Kang Wang
  - Department of Breast Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- He Zhou
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China; Central Laboratory, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xiangru Zheng
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China; Central Laboratory, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Liangyi Si
  - Department of Cardiovascular, The Third Affiliated Hospital of Chongqing Medical University, Chongqing, China.
- Zhongxue Fu
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
40
Butkiewicz M, Blue EE, Leung YY, Jian X, Marcora E, Renton AE, Kuzma A, Wang LS, Koboldt DC, Haines JL, Bush WS. Functional annotation of genomic variants in studies of late-onset Alzheimer's disease. Bioinformatics 2018; 34:2724-2731. [PMID: 29590295 PMCID: PMC6084586 DOI: 10.1093/bioinformatics/bty177] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/03/2017] [Revised: 03/17/2018] [Accepted: 03/23/2018] [Indexed: 01/01/2023] Open
Abstract
Motivation: Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. Results: In this work, we outline an annotation process motivated by the Alzheimer's Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information, and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%). Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mariusz Butkiewicz
  - Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Elizabeth E Blue
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Yuk Yee Leung
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Xueqiu Jian
  - Division of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center, Houston, TX, USA
- Edoardo Marcora
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alan E Renton
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Amanda Kuzma
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jonathan L Haines
  - Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- William S Bush
  - Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
41
Matataki: an ultrafast mRNA quantification method for large-scale reanalysis of RNA-Seq data. BMC Bioinformatics 2018; 19:266. [PMID: 30012088 PMCID: PMC6048772 DOI: 10.1186/s12859-018-2279-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/12/2017] [Accepted: 07/09/2018] [Indexed: 02/03/2023]
Abstract
Background: Data generated by RNA sequencing (RNA-Seq) is now accumulating in vast amounts in public repositories, especially for human and mouse genomes. Reanalyzing these data has emerged as a promising approach to identify gene modules or pathways. Although meta-analyses of gene expression data are frequently performed using microarray data, meta-analyses using RNA-Seq data are still rare. This lag is partly due to the limitations in reanalyzing RNA-Seq data, which requires extensive computational resources. Moreover, it is nearly impossible to calculate the gene expression levels of all samples in a public repository using currently available methods. Here, we propose a novel method, Matataki, for rapidly estimating gene expression levels from RNA-Seq data. Results: The proposed method uses k-mers that are unique to each gene for the mapping of fragments to genes. Since aligning fragments to reference sequences requires high computational costs, our method could reduce the calculation cost by focusing on k-mers that are unique to each gene and by skipping uninformative regions. Indeed, Matataki outperformed conventional methods with regard to speed while demonstrating sufficient accuracy. Conclusions: The development of Matataki can overcome current limitations in reanalyzing RNA-Seq data toward improving the potential for discovering genes and pathways associated with disease at reduced computational cost. Thus, the main bottleneck of RNA-Seq analyses has shifted to achieving the decompression of sequenced data. The implementation of Matataki is available at https://github.com/informationsea/Matataki. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2279-y) contains supplementary material, which is available to authorized users.
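The idea of mapping fragments to genes through gene-unique k-mers, as described in the abstract above, can be illustrated with a toy sketch. This is not Matataki's implementation: the function names, the sequences, the choice of k = 5, and the rule of discarding reads whose k-mers point at more than one gene are all assumptions made for illustration, and reverse-complement handling and the skipping of uninformative regions are omitted.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_unique_index(genes, k):
    """Map each k-mer that occurs in exactly one gene to that gene's name."""
    owners = defaultdict(set)
    for name, seq in genes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    # Keep only k-mers owned by a single gene; shared k-mers are uninformative.
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}


def count_reads(reads, index, k):
    """Count reads per gene using only lookups of gene-unique k-mers.

    Reads with no indexed k-mer, or whose indexed k-mers point at
    different genes, are skipped (an assumption of this sketch).
    """
    counts = Counter()
    for read in reads:
        hits = {index[km] for km in kmers(read, k) if km in index}
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


# Hypothetical toy sequences, chosen only so that the two genes share some k-mers.
genes = {
    "geneA": "ACGTACGTGG",
    "geneB": "TTGACGTACC",
}
index = build_unique_index(genes, k=5)
reads = ["ACGTGG", "GACGTAC", "TTGACG", "CCCCCC"]
counts = count_reads(reads, index, k=5)
```

Because each read is resolved by dictionary lookups rather than alignment, the per-read cost in this sketch depends only on read length, which is the intuition behind the speed-up the abstract reports.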
42
Latgé G, Poulet C, Bours V, Josse C, Jerusalem G. Natural Antisense Transcripts: Molecular Mechanisms and Implications in Breast Cancers. Int J Mol Sci 2018; 19:ijms19010123. [PMID: 29301303 PMCID: PMC5796072 DOI: 10.3390/ijms19010123] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 10/13/2017] [Revised: 12/07/2017] [Accepted: 12/29/2017] [Indexed: 12/13/2022]
Abstract
Natural antisense transcripts are RNA sequences that can be transcribed from both DNA strands at the same locus but in the opposite direction from the gene transcript. Because strand-specific high-throughput sequencing of the antisense transcriptome has only been available for less than a decade, many natural antisense transcripts were first described as long non-coding RNAs. Although the precise biological roles of natural antisense transcripts are not known yet, an increasing number of studies report their implication in gene expression regulation. Their expression levels are altered in many physiological and pathological conditions, including breast cancers. Among the potential clinical utilities of the natural antisense transcripts, the non-coding|coding transcript pairs are of high interest for treatment. Indeed, these pairs can be targeted by antisense oligonucleotides to specifically tune the expression of the coding-gene. Here, we describe the current knowledge about natural antisense transcripts, their varying molecular mechanisms as gene expression regulators, and their potential as prognostic or predictive biomarkers in breast cancers.
Affiliation(s)
- Guillaume Latgé
- Laboratory of Human Genetics, GIGA-Institute, University of Liège, 4500 Liège, Belgium.
- Christophe Poulet
- Laboratory of Human Genetics, GIGA-Institute, University of Liège, 4500 Liège, Belgium.
- Vincent Bours
- Laboratory of Human Genetics, GIGA-Institute, University of Liège, 4500 Liège, Belgium.
- Center of Genetics, University Hospital (CHU), 4500 Liège, Belgium.
- Claire Josse
- Laboratory of Human Genetics, GIGA-Institute, University of Liège, 4500 Liège, Belgium.
- Department of Medical Oncology, University Hospital (CHU), 4500 Liège, Belgium.
- Laboratory of Medical Oncology, GIGA-Institute, University of Liège, 4500 Liège, Belgium.
- Guy Jerusalem
- Department of Medical Oncology, University Hospital (CHU), 4500 Liège, Belgium.
- Laboratory of Medical Oncology, GIGA-Institute, University of Liège, 4500 Liège, Belgium.
43
Wright GEB, Carleton B, Hayden MR, Ross CJD. The global spectrum of protein-coding pharmacogenomic diversity. The Pharmacogenomics Journal 2018; 18:187-195. [PMID: 27779249 PMCID: PMC5817389 DOI: 10.1038/tpj.2016.77] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 03/15/2016] [Revised: 06/22/2016] [Accepted: 08/25/2016] [Indexed: 12/23/2022]
Abstract
Differences in response to medications have a strong genetic component. By leveraging publicly available data, the spectrum of such genomic variation can be investigated extensively. Pharmacogenomic variation was extracted from the 1000 Genomes Project Phase 3 data (2504 individuals, 26 global populations). A total of 12 084 genetic variants were found in 120 pharmacogenes, with the majority (90.0%) classified as rare variants (global minor allele frequency < 0.5%), with 52.9% being singletons. Common variation clustered individuals into continental super-populations and 23 pharmacogenes contained highly differentiated variants (FST > 0.5) for one or more super-population comparison. A median of three clinical variants (PharmGKB level 1A/B) was found per individual, and 55.4% of individuals carried loss-of-function variants, varying by super-population (East Asian 60.9% > African 60.1% > South Asian 60.3% > European 49.3% > Admixed 39.2%). Genome sequencing can therefore identify clinical pharmacogenomic variation, and future studies need to consider rare variation to understand the spectrum of genetic diversity contributing to drug response.
Affiliation(s)
- G E B Wright
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- B Carleton
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Division of Translational Therapeutics, Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada
- M R Hayden
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- C J D Ross
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
44
Abstract
The rapid development of immunomodulatory cancer therapies has led to a concurrent increase in the application of informatics techniques to the analysis of tumors, the tumor microenvironment, and measures of systemic immunity. In this review, the use of tumors to gather genetic and expression data will first be explored. Next, techniques to assess tumor immunity are reviewed, including HLA status, predicted neoantigens, immune microenvironment deconvolution, and T-cell receptor sequencing. Attempts to integrate these data are in early stages of development and are discussed in this review. Finally, we review the application of these informatics strategies to therapy development, with a focus on vaccines, adoptive cell transfer, and checkpoint blockade therapies.
Affiliation(s)
- J Hammerbacher
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York
- Department of Microbiology and Immunology, Medical University of South Carolina, Charleston
- A Snyder
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York
- Adaptive Biotechnologies, Seattle, USA
45
Penalva LO, Sanford JR. From mechanisms to therapy: RNA processing's impact on human genetics. Hum Genet 2017; 136:1013-1014. [PMID: 28866814 DOI: 10.1007/s00439-017-1841-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Affiliation(s)
- Luiz O Penalva
- Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
- Jeremy R Sanford
- Department of Molecular, Cellular and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, 95060, USA.
46
Steward CA, Parker APJ, Minassian BA, Sisodiya SM, Frankish A, Harrow J. Genome annotation for clinical genomic diagnostics: strengths and weaknesses. Genome Med 2017; 9:49. [PMID: 28558813 PMCID: PMC5448149 DOI: 10.1186/s13073-017-0441-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 02/08/2023]
Abstract
The Human Genome Project and advances in DNA sequencing technologies have revolutionized the identification of genetic disorders through the use of clinical exome sequencing. However, in a considerable number of patients, the genetic basis remains unclear. As clinicians begin to consider whole-genome sequencing, an understanding of the processes and tools involved and the factors to consider in the annotation of the structure and function of genomic elements that might influence variant identification is crucial. Here, we discuss and illustrate the strengths and weaknesses of approaches for the annotation and classification of important elements of protein-coding genes, other genomic elements such as pseudogenes and the non-coding genome, comparative-genomic approaches for inferring gene function, and new technologies for aiding genome annotation, as a practical guide for clinicians when considering pathogenic sequence variation. Complete and accurate annotation of structure and function of genome features has the potential to reduce both false-negative (from missing annotation) and false-positive (from incorrect annotation) errors in causal variant identification in exome and genome sequences. Re-analysis of unsolved cases will be necessary as newer technology improves genome annotation, potentially improving the rate of diagnosis.
Affiliation(s)
- Charles A Steward
- Congenica Ltd, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR, UK; The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Berge A Minassian
- Department of Pediatrics (Neurology), University of Texas Southwestern, Dallas, TX, USA; Program in Genetics and Genome Biology and Department of Paediatrics (Neurology), The Hospital for Sick Children and University of Toronto, Toronto, Canada
- Sanjay M Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Institute of Neurology, London, WC1N 3BG, UK; Chalfont Centre for Epilepsy, Chesham Lane, Chalfont St Peter, Buckinghamshire, SL9 0RJ, UK
- Adam Frankish
- The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jennifer Harrow
- The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; Illumina Inc, Great Chesterford, Essex, CB10 1XL, UK
47
Majoros WH, Campbell MS, Holt C, DeNardo EK, Ware D, Allen AS, Yandell M, Reddy TE. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE. Bioinformatics 2017; 33:1437-1446. [PMID: 28011790 DOI: 10.1093/bioinformatics/btw799] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/02/2016] [Accepted: 12/13/2016] [Indexed: 11/12/2022]
Abstract
Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information: Supplementary information is available at Bioinformatics online.
Affiliation(s)
- William H Majoros
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, USA; Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, USA
- Carson Holt
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, UT, USA
- Erin K DeNardo
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; USDA ARS NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY, USA
- Andrew S Allen
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, USA; Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, NC, USA
- Mark Yandell
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Timothy E Reddy
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, USA; Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, USA; Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, NC, USA
48
Johansson BB, Irgens HU, Molnes J, Sztromwasser P, Aukrust I, Juliusson PB, Søvik O, Levy S, Skrivarhaug T, Joner G, Molven A, Johansson S, Njølstad PR. Targeted next-generation sequencing reveals MODY in up to 6.5% of antibody-negative diabetes cases listed in the Norwegian Childhood Diabetes Registry. Diabetologia 2017; 60:625-635. [PMID: 27913849 DOI: 10.1007/s00125-016-4167-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 07/19/2016] [Accepted: 11/09/2016] [Indexed: 12/18/2022]
Abstract
AIMS/HYPOTHESIS: MODY can be wrongly diagnosed as type 1 diabetes in children. We aimed to find the prevalence of MODY in a nationwide population-based registry of childhood diabetes. METHODS: Using next-generation sequencing, we screened the HNF1A, HNF4A, HNF1B, GCK and INS genes in all 469 children (12.1%) negative for both GAD and IA-2 autoantibodies and 469 antibody-positive matched controls selected from the Norwegian Childhood Diabetes Registry (3882 children). Variants were classified using clinical diagnostic criteria for pathogenicity ranging from class 1 (neutral) to class 5 (pathogenic). RESULTS: We identified 58 rare exonic and splice variants in cases and controls. Among antibody-negative patients, 6.5% had genetic variants of classes 3-5 (vs 2.4% in controls; p = 0.002). For the stricter classification (classes 4 and 5), the corresponding number was 4.1% (vs 0.2% in controls; p = 1.6 × 10⁻⁵). HNF1A showed the strongest enrichment of class 3-5 variants, with 3.9% among antibody-negative patients (vs 0.4% in controls; p = 0.0002). Antibody-negative carriers of variants in class 3 had a similar phenotype to those carrying variants in classes 4 and 5. CONCLUSIONS/INTERPRETATION: This is the first study screening for MODY in all antibody-negative children in a nationwide population-based registry. Our results suggest that the prevalence of MODY in antibody-negative childhood diabetes may reach 6.5%. One-third of these MODY cases had not been recognised by clinicians. Since a precise diagnosis is important for treatment and genetic counselling, molecular screening of all antibody-negative children should be considered in routine diagnostics.
Affiliation(s)
- Bente B Johansson
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Henrik U Irgens
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Department of Paediatrics, Haukeland University Hospital, Bergen, Norway
- Janne Molnes
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Paweł Sztromwasser
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Ingvild Aukrust
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Petur B Juliusson
- Department of Paediatrics, Haukeland University Hospital, Bergen, Norway
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Oddmund Søvik
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Department of Paediatrics, Haukeland University Hospital, Bergen, Norway
- Shawn Levy
- Hudson Alpha Institute for Biotechnology, Huntsville, AL, USA
- Torild Skrivarhaug
- Division of Paediatric and Adolescent Medicine, Oslo University Hospital, Oslo, Norway
- Geir Joner
- Division of Paediatric and Adolescent Medicine, Oslo University Hospital, Oslo, Norway
- Institute of Health and Society, University of Oslo, Oslo, Norway
- Anders Molven
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Stefan Johansson
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Pål R Njølstad
- K. G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5020, Bergen, Norway.
- Department of Paediatrics, Haukeland University Hospital, Bergen, Norway.
49
Piovesan A, Caracausi M, Antonaros F, Pelleri MC, Vitale L. GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics. Database: The Journal of Biological Databases and Curation 2016; 2016:baw153. [PMID: 28025344 PMCID: PMC5199132 DOI: 10.1093/database/baw153] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 07/21/2016] [Revised: 10/28/2016] [Accepted: 10/31/2016] [Indexed: 11/25/2022]
Abstract
We release GeneBase 1.1, a local tool with a graphical interface useful for parsing, structuring and indexing data from the National Center for Biotechnology Information (NCBI) Gene data bank. Compared to its predecessor GeneBase (1.0), GeneBase 1.1 now allows dynamic calculation and summarization in terms of median, mean, standard deviation and total for many quantitative parameters associated with genes, gene transcripts and gene features (exons, introns, coding sequences, untranslated regions). GeneBase 1.1 thus offers the opportunity to perform analyses of the main gene structure parameters also following the search for any set of genes with the desired characteristics, allowing unique functionalities not provided by the NCBI Gene itself. In order to show the potential of our tool for local parsing, structuring and dynamic summarizing of publicly available databases for data retrieval, analysis and testing of biological hypotheses, we provide as a sample application a revised set of statistics for human nuclear genes, gene transcripts and gene features. In contrast with previous estimations strongly underestimating the length of human genes, a ‘mean’ human protein-coding gene is 67 kbp long, has eleven 309 bp long exons and ten 6355 bp long introns. Median, mean and extreme values are provided for many other features offering an updated reference source for human genome studies, data useful to set parameters for bioinformatic tools and interesting clues to the biomedical meaning of the gene features themselves. Database URL: http://apollo11.isto.unibo.it/software/
Affiliation(s)
- Allison Piovesan
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy
- Maria Caracausi
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy
- Francesca Antonaros
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy
- Maria Chiara Pelleri
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy
- Lorenza Vitale
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy
50
Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P, Gil L, Girón CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Juettemann T, Keenan S, Laird MR, Lavidas I, Maurel T, McLaren W, Moore B, Murphy DN, Nag R, Newman V, Nuhn M, Ong CK, Parker A, Patricio M, Riat HS, Sheppard D, Sparrow H, Taylor K, Thormann A, Vullo A, Walts B, Wilder SP, Zadissa A, Kostadima M, Martin FJ, Muffato M, Perry E, Ruffier M, Staines DM, Trevanion SJ, Cunningham F, Yates A, Zerbino DR, Flicek P. Ensembl 2017. Nucleic Acids Res 2016; 45:D635-D642. [PMID: 27899575 PMCID: PMC5210575 DOI: 10.1093/nar/gkw1104] [Citation(s) in RCA: 409] [Impact Index Per Article: 51.1] [Received: 10/08/2016] [Revised: 10/25/2016] [Accepted: 10/28/2016] [Indexed: 12/12/2022]
Abstract
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Affiliation(s)
- Bronwen L Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Premanand Achuthan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wasiu Akanni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Friederike Bernsdorff
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Denise Carvalho-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Clapham
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laurent Gil
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Leo Gordon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sophie H Janacek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Juettemann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen Keenan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew R Laird
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ilias Lavidas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Maurel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- William McLaren
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel N Murphy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rishi Nag
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Victoria Newman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Michael Nuhn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Chuang Kee Ong
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harpreet Singh Riat
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Helen Sparrow
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anja Thormann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alessandro Vullo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Brandon Walts
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Steven P Wilder
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Amonida Zadissa
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Myrto Kostadima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Emily Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel M Staines
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| |
|