1. Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging nearby genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
Affiliation(s)
- April Rich
- Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Omer Acar
- Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
- Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA.

2. Minkin I, Salzberg SL. Conservation assessment of human splice site annotation based on a 470-genome alignment. bioRxiv: the preprint server for biology 2024:2023.12.01.569581. [PMID: 38076842 PMCID: PMC10705407 DOI: 10.1101/2023.12.01.569581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2023]
Abstract
Despite many improvements over the years, the annotation of the human genome remains imperfect, and different annotations of the human reference genome sometimes contradict one another. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation that is more likely to be related to biological functions, and the rapidly growing number of genomes from other species increases its power. Using the latest whole genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across more than 400 species. We also studied splice sites from the RefSeq, GENCODE, and CHESS databases that are not present in MANE. We trained a logistic regression classifier to distinguish between the conservation exhibited by sites from MANE versus sites chosen randomly from neutrally evolving sequence. We found that splice sites classified by our model as conserved have lower SNP rates and better transcriptomic support. We then computed a subset of transcripts only using either "conserved" splice sites or ones from MANE. This subset is enriched in high-confidence transcripts of the major gene catalogs that appear to be under purifying selection and are more likely to be correct and functionally relevant.
Affiliation(s)
- Ilia Minkin
- Department of Biomedical Engineering, Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Steven L Salzberg
- Department of Biomedical Engineering, Center for Computational Biology, Department of Computer Science, Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA

3. Buerger F, Salmanullah D, Liang L, Gauntner V, Krueger K, Qi M, Sharma V, Rubin A, Ball D, Lemberg K, Saida K, Merz LM, Sever S, Issac B, Sun L, Guerrero-Castillo S, Gomez AC, McNulty MT, Sampson MG, Al-Hamed MH, Saleh MM, Shalaby M, Kari J, Fawcett JP, Hildebrandt F, Majmundar AJ. Recessive variants in the intergenic NOS1AP-C1orf226 locus cause monogenic kidney disease responsive to anti-proteinuric treatment. medRxiv: the preprint server for health sciences 2024:2024.03.17.24303374. [PMID: 38562757 PMCID: PMC10984069 DOI: 10.1101/2024.03.17.24303374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
In genetic disease, an accurate expression landscape of disease genes and faithful animal models will enable precise genetic diagnoses and therapeutic discoveries, respectively. We previously discovered that variants in NOS1AP, encoding nitric oxide synthase 1 (NOS1) adaptor protein, cause monogenic nephrotic syndrome (NS). Here, we determined that an intergenic splice product of NOS1AP/Nos1ap and neighboring C1orf226/Gm7694, which precludes NOS1 binding, is the predominant isoform in mammalian kidney transcriptional and proteomic data. Gm7694-/- mice, whose allele exclusively disrupts the intergenic product, developed NS phenotypes. In two human NS subjects, we identified causative NOS1AP splice variants, including one predicted to abrogate intergenic splicing but initially misclassified as benign based on the canonical transcript. Finally, by modifying genetic background, we generated a faithful mouse model of NOS1AP-associated NS, which responded to anti-proteinuric treatment. This study highlights the importance of intergenic splicing and a potential treatment avenue in a mendelian disorder.

4. Gibert MK, Zhang Y, Saha S, Marcinkiewicz P, Dube C, Hudson K, Sun Y, Bednarek S, Chagari B, Sarkar A, Roig-Laboy C, Neace N, Saoud K, Setiady I, Hanif F, Schiff D, Kumar P, Kefas B, Hafner M, Abounader R. A first comprehensive analysis of Transcribed Ultra Conserved Regions uncovers important regulatory functions of novel non-coding transcripts in gliomas. bioRxiv: the preprint server for biology 2024:2023.09.12.557444. [PMID: 38562826 PMCID: PMC10983853 DOI: 10.1101/2023.09.12.557444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Transcribed Ultra-Conserved Regions (TUCRs) represent a severely understudied class of putative non-coding RNAs (ncRNAs) that are 100% conserved across multiple species. We performed the first-ever analysis of TUCRs in glioblastoma (GBM) and low-grade gliomas (LGG). We leveraged large human datasets to identify the genomic locations, chromatin accessibility, transcription, differential expression, correlation with survival, and predicted functions of all 481 TUCRs, and identified TUCRs that are relevant to glioma biology. Of these, we investigated the expression, function, and mechanism of action of the most highly upregulated intergenic TUCR, uc.110, identifying it as a new oncogene. Uc.110 was highly overexpressed in GBM and LGG, where it promoted malignancy and tumor growth. Uc.110 activated the WNT pathway by upregulating the expression of membrane frizzled-related protein (MFRP), by sponging the tumor suppressor microRNA miR-544. This pioneering study shows important roles for TUCRs in gliomas and provides an extensive database and novel methods for future TUCR research.

5. Einson J, Minaeva M, Rafi F, Lappalainen T. The impact of genetically controlled splicing on exon inclusion and protein structure. PLoS One 2024; 19:e0291960. [PMID: 38478511 PMCID: PMC10936842 DOI: 10.1371/journal.pone.0291960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 09/08/2023] [Indexed: 03/17/2024] Open
Abstract
Common variants affecting mRNA splicing are typically identified through splicing quantitative trait locus (sQTL) mapping and have been shown to be enriched for GWAS signals by a similar degree to eQTLs. However, the specific splicing changes induced by these variants have been difficult to characterize, making it more complicated to analyze the effect size and direction of sQTLs, and to determine downstream splicing effects on protein structure. In this study, we catalogue sQTLs using exon percent spliced in (PSI) scores as a quantitative phenotype. PSI is an interpretable metric for identifying exon skipping events and has some advantages over other methods for quantifying splicing from short read RNA sequencing. In our set of sQTL variants, we find evidence of selective effects based on splicing effect size and effect direction, as well as exon symmetry. Additionally, we utilize AlphaFold2 to predict changes in protein structure associated with sQTLs overlapping GWAS traits, highlighting a potential new use-case for this technology for interpreting genetic effects on traits and disorders.
Affiliation(s)
- Jonah Einson
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States of America
- New York Genome Center, New York, NY, United States of America
- Mariia Minaeva
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Faiza Rafi
- New York Genome Center, New York, NY, United States of America
- Department of Biotechnology, The City College of New York, New York, NY, United States of America
- Tuuli Lappalainen
- New York Genome Center, New York, NY, United States of America
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, United States of America

6. Luthra I, Jensen C, Chen XE, Salaudeen AL, Rafi AM, de Boer CG. Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol 2024; 31:559-567. [PMID: 38448573 DOI: 10.1038/s41594-024-01235-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
Affiliation(s)
- Ishika Luthra
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Cassandra Jensen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Xinyi E Chen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Asfar Lathif Salaudeen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Abdul Muntakim Rafi
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.

7. He X, Weng Z, Zou Y. Progress in the controllability technology of PROTAC. Eur J Med Chem 2024; 265:116096. [PMID: 38160619 DOI: 10.1016/j.ejmech.2023.116096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/03/2024]
Abstract
Proteolysis-targeting chimaera (PROTAC) technology functions by directly targeting proteins and catalysing their degradation through an event-driven mode of action, a novel mechanism with significant clinical application prospects for various diseases. Currently, the most advanced PROTAC drug is undergoing phase III clinical trials (NCT05654623). Although PROTACs exhibit significant advantages over traditional small-molecule inhibitors, their catalytic degradation of normal cellular proteins can potentially cause toxic side effects. Therefore, to achieve targeted release of PROTACs and minimize adverse reactions, researchers are actively exploring diverse controllable PROTACs. In this review, we comprehensively summarize the control strategies to provide a theoretical basis for the innovative application of PROTAC technology.
Affiliation(s)
- Xin He
- School of Chemical and Pharmaceutical Engineering, Changzhou Vocational Institute of Engineering, Changzhou, 213164, PR China.
- Zhibing Weng
- School of Chemical and Pharmaceutical Engineering, Changzhou Vocational Institute of Engineering, Changzhou, 213164, PR China
- Yi Zou
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, 211198, PR China.

8. Willemin A, Szabó D, Pombo A. Epigenetic regulatory layers in the 3D nucleus. Mol Cell 2024; 84:415-428. [PMID: 38242127 PMCID: PMC10872226 DOI: 10.1016/j.molcel.2023.12.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 11/21/2023] [Accepted: 12/15/2023] [Indexed: 01/21/2024]
Abstract
Nearly 7 decades have elapsed since Francis Crick introduced the central dogma of molecular biology, as part of his ideas on protein synthesis, setting the fundamental rules of sequence information transfer from DNA to RNAs and proteins. We have since learned that gene expression is finely tuned in time and space, due to the activities of RNAs and proteins on regulatory DNA elements, and through cell-type-specific three-dimensional conformations of the genome. Here, we review major advances in genome biology and discuss a set of ideas on gene regulation and highlight how various biomolecular assemblies lead to the formation of structural and regulatory features within the nucleus, with roles in transcriptional control. We conclude by suggesting further developments that will help capture the complex, dynamic, and often spatially restricted events that govern gene expression in mammalian cells.
Affiliation(s)
- Andréa Willemin
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, Berlin, Germany; Humboldt-Universität zu Berlin, Institute for Biology, Berlin, Germany.
- Dominik Szabó
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, Berlin, Germany; Humboldt-Universität zu Berlin, Institute for Biology, Berlin, Germany
- Ana Pombo
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, Berlin, Germany; Humboldt-Universität zu Berlin, Institute for Biology, Berlin, Germany.

9. Cao X, Sun S, Xing J. A Massive Proteogenomic Screen Identifies Thousands of Novel Peptides From the Human "Dark" Proteome. Mol Cell Proteomics 2024; 23:100719. [PMID: 38242438 PMCID: PMC10867589 DOI: 10.1016/j.mcpro.2024.100719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 01/01/2024] [Accepted: 01/16/2024] [Indexed: 01/21/2024] Open
Abstract
Although the human gene annotation has been continuously improved over the past 2 decades, numerous studies demonstrated the existence of a "dark proteome", consisting of proteins that were critical for biological processes but not included in widely used gene catalogs. The Genotype-Tissue Expression project generated more than 15,000 RNA-seq datasets from multiple tissues, which modeled 30 million transcripts in the human genome. To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify proteins translated from the Genotype-Tissue Expression transcript model with proteomic support. We also integrated 3.8 million common genetic variants from the gnomAD database to improve peptide identification. As a result, we identified 170,529 novel peptides with proteomic evidence, of which 6048 passed the strictest standard we defined and were supported by PepQuery. We provided a user-friendly website (https://ncorf.genes.fun/) for researchers to check the evidence of novel peptides from their studies. The findings will improve our understanding of coding genes and facilitate genomic data interpretation in biomedical research.
Affiliation(s)
- Xiaolong Cao
- Department of Anesthesiology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Siqi Sun
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.

10. Song Y, Zhang C, Omenn GS, O’Meara MJ, Welch JD. Predicting the Structural Impact of Human Alternative Splicing. bioRxiv: the preprint server for biology 2023:2023.12.21.572928. [PMID: 38187531 PMCID: PMC10769328 DOI: 10.1101/2023.12.21.572928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Protein structure prediction with neural networks is a powerful new method for linking protein sequence, structure, and function, but structures have generally been predicted for only a single isoform of each gene, neglecting splice variants. To investigate the structural implications of alternative splicing, we used AlphaFold2 to predict the structures of more than 11,000 human isoforms. We employed multiple metrics to identify splicing-induced structural alterations, including template matching score, secondary structure composition, surface charge distribution, radius of gyration, accessibility of post-translational modification sites, and structure-based function prediction. We identified examples of how alternative splicing induced clear changes in each of these properties. Structural similarity between isoforms largely correlated with degree of sequence identity, but we identified a subset of isoforms with low structural similarity despite high sequence similarity. Exon skipping and alternative last exons tended to increase the surface charge and radius of gyration. Splicing also buried or exposed numerous post-translational modification sites, most notably among the isoforms of BAX. Functional prediction nominated numerous functional differences among isoforms of the same gene, with loss of function compared to the reference predominating. Finally, we used single-cell RNA-seq data from the Tabula Sapiens to determine the cell types in which each structure is expressed. Our work represents an important resource for studying the structure and function of splice isoforms across the cell types of the human body.
Affiliation(s)
- Yuxuan Song
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Matthew J. O’Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Joshua D. Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA

11. Cheng HF, Tsai YF, Liu CY, Hsu CY, Lien PJ, Lin YS, Chao TC, Lai JI, Feng CJ, Chen YJ, Chen BF, Chiu JH, Tseng LM, Huang CC. Prevalence of BRCA1, BRCA2, and PALB2 genomic alterations among 924 Taiwanese breast cancer assays with tumor-only targeted sequencing: extended data analysis from the VGH-TAYLOR study. Breast Cancer Res 2023; 25:152. [PMID: 38098088 PMCID: PMC10722686 DOI: 10.1186/s13058-023-01751-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 12/05/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND The homologous recombination (HR) repair pathway for DNA damage, particularly the BRCA1 and BRCA2 genes, has become a target for cancer therapy, with poly ADP-ribose polymerase (PARP) inhibitors showing significant outcomes in treating germline BRCA1/2 (gBRCA1/2) mutated breast cancer. Recent studies suggest that some patients with somatic BRCA1/2 (sBRCA1/2) mutation or mutations in HR-related genes other than BRCA1/2 may benefit from PARP inhibitors as well, particularly those with PALB2 mutations. The current analysis aims to evaluate the prevalence of genetic alterations specific to BRCA1, BRCA2, and PALB2 in a large cohort of Taiwanese breast cancer patients through tumor-targeted sequencing. METHODS A total of 924 consecutive assays from 879 Taiwanese breast cancer patients underwent tumor-targeted sequencing (Thermo Fisher Oncomine Comprehensive Assay v3). We evaluated BRCA1, BRCA2, and PALB2 mutational profiles, with variants annotated and curated by ClinVar, the Oncomine™ Knowledgebase Reporter, and OncoKB™. We also conducted reflex germline testing using either whole exome sequencing (WES) or whole genome sequencing (WGS), which is ongoing. RESULTS Among the 879 patients analyzed (924 assays), 130 had positive mutations in BRCA1 (3.1%), BRCA2 (8.6%), and PALB2 (5.2%), with a total of 14.8% having genetic alterations. Co-occurrence was noted between BRCA1/BRCA2, BRCA1/PALB2, and BRCA2/PALB2 mutations. In BRCA1-mutated samples, only p.K654fs was observed in three patients, while other variants were observed no more than twice. For BRCA2, p.N372H was the most common (26 patients), followed by p.S2186fs, p.V2466A, and p.X159_splice (5 times each). For PALB2, p.I887fs was the most common mutation (30 patients). This study identified 176 amino acid changes; 60.2% (106) were not documented in either ClinVar or the Oncomine™ Knowledgebase Reporter. Using OncoKB™ for annotation, 171 (97.2%) were found to have clinical implications. In reflex germline testing, three variants (BRCA1 c.1969_1970del, BRCA1 c.3629_3630del, BRCA2 c.8755-1G>C) were annotated as Pathogenic/Likely pathogenic (P/LP) variants by ClinVar and as likely loss-of-function or likely oncogenic by OncoKB, while one variant (PALB2 c.448C>T) was not found in ClinVar but was annotated as likely loss-of-function or likely oncogenic by OncoKB. CONCLUSION Our study depicted the mutational patterns of BRCA1, BRCA2, and PALB2 in Taiwanese breast cancer patients through tumor-only sequencing. This highlights the growing importance of BRCA1/2 and PALB2 alterations in breast cancer susceptibility risk and the treatment of index patients. We also emphasized the need to meticulously annotate variants in cancer-driver genes as well as actionable mutations across multiple databases.
Affiliation(s)
- Han-Fang Cheng
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Yi-Fang Tsai
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Chun-Yu Liu
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Division of Transfusion Medicine, Department of Medicine, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Division of Medical Oncology, Department of Oncology, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Chih-Yi Hsu
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Department of Pathology and Laboratory Medicine, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Pei-Ju Lien
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Department of Nurse, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Yen-Shu Lin
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Ta-Chung Chao
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Division of Medical Oncology, Department of Oncology, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Jiun-I Lai
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Division of Medical Oncology, Department of Oncology, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Chin-Jung Feng
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Division of Plastic and Reconstruction Surgery, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Yen-Jen Chen
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Bo-Fang Chen
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Jen-Hwey Chiu
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Center for Traditional Medicine, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC
- Institute of Traditional Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC
- Ling-Ming Tseng
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC.
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei City, Taiwan, ROC.
- Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC.
- Chi-Cheng Huang
- Comprehensive Breast Health Center, Department of Surgery, Taipei Veterans General Hospital, Taipei City, Taiwan, ROC.
- Institute of Epidemiology and Preventive Medicine, College of Medicine, National Taiwan University, Taipei City, Taiwan, ROC.

12. Zhang Q, Shao M. Transcript assembly and annotations: Bias and adjustment. PLoS Comput Biol 2023; 19:e1011734. [PMID: 38127855 PMCID: PMC10769104 DOI: 10.1371/journal.pcbi.1011734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 01/05/2024] [Accepted: 12/04/2023] [Indexed: 12/23/2023] Open
Abstract
Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations, as assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. We investigate the impact of annotations on transcript assembly. Surprisingly, we observe that opposite conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance to select appropriate assembling tools for different application scenarios.
Affiliation(s)
- Qimin Zhang
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Mingfu Shao
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
13
Stepankiw N, Yang AWH, Hughes TR. The human genome contains over a million autonomous exons. Genome Res 2023; 33:gr.277792.123. [PMID: 37945377 PMCID: PMC10760453 DOI: 10.1101/gr.277792.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 10/27/2023] [Indexed: 11/12/2023]
Abstract
Mammalian mRNA and lncRNA exons are often small compared to introns. The exon definition model predicts that exons splice autonomously, dependent on proximal exon sequence features, explaining their delineation within large introns. This model has not been examined on a genome-wide scale, however, leaving open the question of how often mRNA and lncRNA exons are autonomous. It is also unknown how frequently such exons can arise by chance. Here, we directly assayed large fragments (500-1000 bp) of the human genome by exon trapping, which detects exons spliced into a heterologous transgene, here designed with a large intron context. We define the trapped exons as "autonomous." We obtained ∼1.25 million trapped exons, including most known mRNA and well-annotated lncRNA internal exons, demonstrating that human exons are predominantly autonomous. mRNA exons are trapped with the highest efficiency. Nearly a million of the trapped exons are unannotated, most located in intergenic regions and antisense to mRNA, with depletion from the forward strand of introns. These exons are not conserved, suggesting they are nonfunctional and arose from random mutations. They are nonetheless highly enriched with known splicing promoting sequence features that delineate known exons. Novel autonomous exons are more numerous than annotated lncRNA exons, and computational models also indicate they will occur with similar frequency in any randomly generated sequence. These results show that most human coding exons splice autonomously, and provide an explanation for the existence of many unconserved lncRNAs, as well as a new annotation and inclusion levels of spliceable loci in the human genome.
Affiliation(s)
- Nicholas Stepankiw
  - Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Ally W H Yang
  - Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Timothy R Hughes
  - Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
14
Shinder I, Hu R, Ji HJ, Chao KH, Pertea M. EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes. Nat Commun 2023; 14:7223. [PMID: 37940654 PMCID: PMC10632439 DOI: 10.1038/s41467-023-43017-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/30/2023] [Indexed: 11/10/2023] Open
Abstract
Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely-used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR's application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.
Affiliation(s)
- Ida Shinder
  - Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Richard Hu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Hyun Joo Ji
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Kuan-Hao Chao
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
15
Carrion SA, Michal JJ, Jiang Z. Alternative Transcripts Diversify Genome Function for Phenome Relevance to Health and Diseases. Genes (Basel) 2023; 14:2051. [PMID: 38002994 PMCID: PMC10671453 DOI: 10.3390/genes14112051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open
Abstract
Manipulation using alternative exon splicing (AES), alternative transcription start (ATS), and alternative polyadenylation (APA) sites are key to transcript diversity underlying health and disease. All three are pervasive in organisms, present in at least 50% of human protein-coding genes. In fact, ATS and APA site use has the highest impact on protein identity, with their ability to alter which first and last exons are utilized as well as impacting stability and translation efficiency. These RNA variants have been shown to be highly specific, both in tissue type and stage, with demonstrated importance to cell proliferation, differentiation and the transition from fetal to adult cells. While alternative exon splicing has a limited effect on protein identity, its ubiquity highlights the importance of these minor alterations, which can alter other features such as localization. The three processes are also highly interwoven, with overlapping, complementary, and competing factors, RNA polymerase II and its CTD (C-terminal domain) chief among them. Their role in development means dysregulation leads to a wide variety of disorders and cancers, with some forms of disease disproportionately affected by specific mechanisms (AES, ATS, or APA). Challenges associated with the genome-wide profiling of RNA variants and their potential solutions are also discussed in this review.
Affiliation(s)
- Zhihua Jiang
  - Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA 99164-7620, USA; (S.A.C.); (J.J.M.)
16
Tao S, Hou Y, Diao L, Hu Y, Xu W, Xie S, Xiao Z. Long noncoding RNA study: Genome-wide approaches. Genes Dis 2023; 10:2491-2510. [PMID: 37554208 PMCID: PMC10404890 DOI: 10.1016/j.gendis.2022.10.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/11/2022] [Revised: 10/09/2022] [Accepted: 10/23/2022] [Indexed: 11/30/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) have been confirmed to play a crucial role in various biological processes across several species. Though many efforts have been devoted to the expansion of the lncRNAs landscape, much about lncRNAs is still unknown due to their great complexity. The development of high-throughput technologies and the constantly improved bioinformatic methods have resulted in a rapid expansion of lncRNA research and relevant databases. In this review, we introduced genome-wide research of lncRNAs in three parts: (i) novel lncRNA identification by high-throughput sequencing and computational pipelines; (ii) functional characterization of lncRNAs by expression atlas profiling, genome-scale screening, and the research of cancer-related lncRNAs; (iii) mechanism research by large-scale experimental technologies and computational analysis. Besides, primary experimental methods and bioinformatic pipelines related to these three parts are summarized. This review aimed to provide a comprehensive and systemic overview of lncRNA genome-wide research strategies and indicate a genome-wide lncRNA research system.
Affiliation(s)
- Shuang Tao
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Yarui Hou
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Liting Diao
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Yanxia Hu
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Wanyi Xu
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Shujuan Xie
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
  - Institute of Vaccine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
- Zhendong Xiao
  - The Biotherapy Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong 510630, China
17
Varabyou A, Sommer MJ, Erdogdu B, Shinder I, Minkin I, Chao KH, Park S, Heinz J, Pockrandt C, Shumate A, Rincon N, Puiu D, Steinegger M, Salzberg SL, Pertea M. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol 2023; 24:249. [PMID: 37904256 PMCID: PMC10614308 DOI: 10.1186/s13059-023-03088-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/21/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess .
Affiliation(s)
- Ales Varabyou
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Markus J Sommer
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Beril Erdogdu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Ida Shinder
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Ilia Minkin
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Kuan-Hao Chao
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Sukhwan Park
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Jakob Heinz
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Christopher Pockrandt
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Alaina Shumate
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Natalia Rincon
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Daniela Puiu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
- Steven L Salzberg
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
18
Fair B, Najar CBA, Zhao J, Lozano S, Reilly A, Mossian G, Staley JP, Wang J, Li YI. Global impact of aberrant splicing on human gene expression levels. bioRxiv 2023:2023.09.13.557588. [PMID: 37745605 PMCID: PMC10515962 DOI: 10.1101/2023.09.13.557588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Alternative splicing (AS) is pervasive in human genes, yet the specific function of most AS events remains unknown. It is widely assumed that the primary function of AS is to diversify the proteome; however, AS can also influence gene expression levels by producing transcripts rapidly degraded by nonsense-mediated decay (NMD). Currently, there are no precise estimates for how often the coupling of AS and NMD (AS-NMD) impacts gene expression levels because rapidly degraded NMD transcripts are challenging to capture. To better understand the impact of AS on gene expression levels, we analyzed population-scale genomic data in lymphoblastoid cell lines across eight molecular assays that capture gene regulation before, during, and after transcription and cytoplasmic decay. Sequencing nascent mRNA transcripts revealed frequent aberrant splicing of human introns, which results in remarkably high levels of mRNA transcripts subject to NMD. We estimate that ~15% of all protein-coding transcripts are degraded by NMD, and this estimate increases to nearly half of all transcripts for lowly-expressed genes with many introns. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are similarly likely to associate with NMD-induced expression level differences as with differences in protein isoform usage. Additionally, we used the splice-switching drug risdiplam to perturb AS at hundreds of genes, finding that ~3/4 of the splicing perturbations induce NMD. Thus, we conclude that AS-NMD substantially impacts the expression levels of most human genes. Our work further suggests that much of the molecular impact of AS is mediated by changes in protein expression levels rather than diversification of the proteome.
Affiliation(s)
- Benjamin Fair
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Carlos Buen Abad Najar
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Junxing Zhao
  - Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66047, USA
- Stephanie Lozano
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
  - Present address: Center for Neuroscience, University of California Davis, Davis, CA 95618, USA
- Austin Reilly
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Gabriela Mossian
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Jonathan P Staley
  - Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL 60637, USA
- Jingxin Wang
  - Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66047, USA
- Yang I Li
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
19
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. Nature 2023; 622:41-47. [PMID: 37794265 PMCID: PMC10575709 DOI: 10.1038/s41586-023-06490-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/09/2023] [Accepted: 07/27/2023] [Indexed: 10/06/2023]
Abstract
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Affiliation(s)
- Paulo Amaral
  - INSPER Institute of Education and Research, Sao Paulo, Brazil
- Francisco M De La Vega
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
  - Tempus Labs, Chicago, IL, USA
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Thomas Gingeras
  - Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Roderic Guigo
  - Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jennifer L Harrow
  - Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK
- Artemis G Hatzigeorgiou
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
  - Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
  - School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
  - Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research, University of Bern, Bern, Switzerland
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Shashikant Pujar
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hazuki Takahashi
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Igor Ulitsky
  - Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Ales Varabyou
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A Wells
  - Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
- Mark Yandell
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Human Technopole, Milan, Italy
- Steven L Salzberg
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
20
Gabryelska MM, Conn SJ. The RNA interactome in the Hallmarks of Cancer. Wiley Interdiscip Rev RNA 2023; 14:e1786. [PMID: 37042179 PMCID: PMC10909452 DOI: 10.1002/wrna.1786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 03/12/2023] [Accepted: 03/20/2023] [Indexed: 04/13/2023]
Abstract
Ribonucleic acid (RNA) molecules are indispensable for cellular homeostasis in healthy and malignant cells. However, the functions of RNA extend well beyond that of a protein-coding template. Rather, both coding and non-coding RNA molecules function through critical interactions with a plethora of cellular molecules, including other RNAs, DNA, and proteins. Deconvoluting this RNA interactome, including the interacting partners, the nature of the interaction, and dynamic changes of these interactions in malignancies, has yielded fundamental advances in knowledge and is emerging as a novel therapeutic strategy in cancer. Here, we present an RNA-centric review of recent advances in the field of RNA-RNA, RNA-protein, and RNA-DNA interactomic network analysis and their impact across the Hallmarks of Cancer. This article is categorized under: RNA in Disease and Development > RNA in Disease; RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes.
Affiliation(s)
- Marta M Gabryelska
  - Flinders Health and Medical Research Institute (FHMRI), College of Medicine and Public Health, Flinders University, Bedford Park, South Australia, Australia
- Simon J Conn
  - Flinders Health and Medical Research Institute (FHMRI), College of Medicine and Public Health, Flinders University, Bedford Park, South Australia, Australia
21
Galván-Morales MÁ. Perspectives of Proteomics in Respiratory Allergic Diseases. Int J Mol Sci 2023; 24:12924. [PMID: 37629105 PMCID: PMC10454482 DOI: 10.3390/ijms241612924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 07/18/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023] Open
Abstract
Proteomics in respiratory allergic diseases has such a battery of techniques and programs that one would almost think there is nothing impossible to find, invent or mold. All the resources that we document here are involved in solving problems in allergic diseases, both diagnostic and prognostic treatment, and immunotherapy development. The main perspectives, according to this version, are in three strands and/or a lockout immunological system: (1) Blocking the diapedesis of the cells involved, (2) Modifications and blocking of paratopes and epitopes being understood by modifications to antibodies, antagonisms, or blocking them, and (3) Blocking FcεRI high-affinity receptors to prevent specific IgEs from sticking to mast cells and basophils. These tools and targets in the allergic landscape are, in our view, the prospects in the field. However, there are still many allergens to identify, including some homologies between allergens and cross-reactions, through the identification of structures and epitopes. The current vision of using proteomics for this purpose remains a constant; this is also true for the basis of diagnostic and controlled systems for immunotherapy. Ours is an open proposal to use this vision for treatment.
Affiliation(s)
- Miguel Ángel Galván-Morales
  - Departamento de Atención a la Salud, CBS. Unidad Xochimilco, Universidad Autónoma Metropolitana, Calzada del Hueso 1100, Villa Quietud, Coyoacán, Ciudad de México 04960, Mexico
22
Varabyou A, Erdogdu B, Salzberg SL, Pertea M. Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage. Nat Comput Sci 2023; 3:700-708. [PMID: 38098813 PMCID: PMC10718564 DOI: 10.1038/s43588-023-00496-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/23/2023] [Accepted: 07/05/2023] [Indexed: 12/17/2023]
Abstract
ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Steven L. Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| |
|
23
|
Simpson-Lavy K, Kupiec M. Glucose Inhibits Yeast AMPK (Snf1) by Three Independent Mechanisms. Biology 2023; 12:1007. [PMID: 37508436 PMCID: PMC10376661 DOI: 10.3390/biology12071007] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/20/2023] [Revised: 07/13/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]
Abstract
Snf1, the fungal homologue of mammalian AMP-dependent kinase (AMPK), is a key protein kinase coordinating the response of cells to a shortage of glucose. In fungi, the response is to activate respiratory gene expression and metabolism. The major regulation of Snf1 activity has been extensively investigated: in the absence of glucose, it becomes activated by phosphorylation of its threonine at position 210. This modification can be erased by phosphatases when glucose is restored. In the past decade, two additional independent mechanisms of Snf1 regulation have been elucidated. In response to glucose (or, surprisingly, also to DNA damage), Snf1 is SUMOylated by Mms21 at lysine 549. This inactivates Snf1 and leads to Snf1 degradation. More recently, glucose-induced proton export has been found to result in Snf1 inhibition via a polyhistidine tract (13 consecutive histidine residues) at the N-terminus of the Snf1 protein. Interestingly, the polyhistidine tract also plays a central role in the response to iron scarcity. This review will present some of the glucose-sensing mechanisms of S. cerevisiae, how they interact, and how their interplay results in Snf1 inhibition by three different, and independent, mechanisms.
Affiliation(s)
- Kobi Simpson-Lavy
- The Shmunis School of Biomedicine & Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
| | - Martin Kupiec
- The Shmunis School of Biomedicine & Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
| |
|
24
|
Ivanov KI, Samuilova OV, Zamyatnin AA. The emerging roles of long noncoding RNAs in lymphatic vascular development and disease. Cell Mol Life Sci 2023; 80:197. [PMID: 37407839 PMCID: PMC10322780 DOI: 10.1007/s00018-023-04842-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 06/06/2023] [Accepted: 06/19/2023] [Indexed: 07/07/2023]
Abstract
Recent advances in RNA sequencing technologies helped uncover what was once uncharted territory in the human genome: the complex and versatile world of long noncoding RNAs (lncRNAs). Previously thought of as merely transcriptional "noise", lncRNAs have now emerged as essential regulators of gene expression networks controlling development, homeostasis and disease progression. The regulatory functions of lncRNAs are broad and diverse, and the underlying molecular mechanisms are highly variable, acting at the transcriptional, post-transcriptional, translational, and post-translational levels. In recent years, evidence has accumulated to support the important role of lncRNAs in the development and functioning of the lymphatic vasculature and associated pathological processes such as tumor-induced lymphangiogenesis and cancer metastasis. In this review, we summarize the current knowledge on the role of lncRNAs in regulating the key genes and pathways involved in lymphatic vascular development and disease. Furthermore, we discuss the potential of lncRNAs as novel therapeutic targets and outline possible strategies for the development of lncRNA-based therapeutics to treat diseases of the lymphatic system.
Affiliation(s)
- Konstantin I Ivanov
- Research Center for Translational Medicine, Sirius University of Science and Technology, Sochi, Russian Federation.
- Department of Microbiology, University of Helsinki, Helsinki, Finland.
| | - Olga V Samuilova
- Department of Biochemistry, Sechenov First Moscow State Medical University, Moscow, Russian Federation
- HSE University, Moscow, Russian Federation
| | - Andrey A Zamyatnin
- Research Center for Translational Medicine, Sirius University of Science and Technology, Sochi, Russian Federation
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russian Federation
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russian Federation
- Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
| |
|
25
|
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
| |
|
26
|
Zhang Q, Shao M. Transcript Assembly and Annotations: Bias and Adjustment. bioRxiv 2023:2023.04.20.537700. [PMID: 37131680 PMCID: PMC10153229 DOI: 10.1101/2023.04.20.537700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Motivation: Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations, as assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. Results: We investigate the impact of annotations on transcript assembly. We observe that conflicting conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance to select appropriate assembling tools for different application scenarios.
Affiliation(s)
- Qimin Zhang
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
| | - Mingfu Shao
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
- Huck Institutes of the Life Sciences, The Pennsylvania State University
| |
|
27
|
Hamann MV, Adiba M, Lange UC. Confounding factors in profiling of locus-specific human endogenous retrovirus (HERV) transcript signatures in primary T cells using multi-study-derived datasets. BMC Med Genomics 2023; 16:68. [PMID: 37013607 PMCID: PMC10068191 DOI: 10.1186/s12920-023-01486-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 03/11/2023] [Indexed: 04/05/2023] Open
Abstract
BACKGROUND Human endogenous retroviruses (HERV) are repetitive sequence elements and a substantial part of the human genome. Their role in development has been well documented and there is now mounting evidence that dysregulated HERV expression also contributes to various human diseases. While research on HERV elements has in the past been hampered by their high sequence similarity, advanced sequencing technology and analytical tools have empowered the field. For the first time, we are now able to undertake locus-specific HERV analysis, deciphering expression patterns, regulatory networks and biological functions of these elements. To do so, we inevitably rely on omics datasets available through the public domain. However, technical parameters inevitably differ, making inter-study analysis challenging. We here address the issue of confounding factors for profiling locus-specific HERV transcriptomes using datasets from multiple sources. METHODS We collected RNAseq datasets of CD4 and CD8 primary T cells and extracted HERV expression profiles for 3220 elements, resembling most intact, near full-length proviruses. Looking at sequencing parameters and batch effects, we compared HERV signatures across datasets and determined permissive features for HERV expression analysis from multiple-source data. RESULTS We could demonstrate that, among sequencing parameters, sequencing depth is most influential on HERV signature outcome. Sequencing samples deeper broadens the spectrum of expressed HERV elements. Sequencing mode and read length are secondary parameters. Nevertheless, we find that HERV signatures from smaller RNAseq datasets do reliably reveal the most abundantly expressed HERV elements. Overall, HERV signatures between samples and studies overlap substantially, indicating a robust HERV transcript signature in CD4 and CD8 T cells. Moreover, we find that measures of batch effect reduction are critical to uncover genic and HERV expression differences between cell types. After doing so, differences in the HERV transcriptome between ontologically closely related CD4 and CD8 T cells became apparent. CONCLUSION In our systematic approach to determine sequencing and analysis parameters for detection of locus-specific HERV expression, we provide evidence that analysis of RNAseq datasets from multiple studies can aid confidence of biological findings. When generating de novo HERV expression datasets we recommend increased sequencing depth (≥100 million reads) compared to standard genic transcriptome pipelines. Finally, batch effect reduction measures need to be implemented to allow for differential expression analysis.
Affiliation(s)
| | - Maisha Adiba
- Leibniz Institute of Virology (LIV), Hamburg, Germany
| | - Ulrike C Lange
- Leibniz Institute of Virology (LIV), Hamburg, Germany.
- Institute for Infection Research and Vaccine Development, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
| |
|
28
|
Moreno J, Zoghebi K, Salehi D, Kim L, Shoushtari SK, Tiwari RK, Parang K. Amphiphilic Cell-Penetrating Peptides Containing Arginine and Hydrophobic Residues as Protein Delivery Agents. Pharmaceuticals (Basel) 2023; 16:469. [PMID: 36986567 PMCID: PMC10053436 DOI: 10.3390/ph16030469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 03/18/2023] [Accepted: 03/21/2023] [Indexed: 03/30/2023] Open
Abstract
The entry of proteins through the cell membrane is challenging, thus limiting their use as potential therapeutics. Seven cell-penetrating peptides, designed in our laboratory, were evaluated for the delivery of proteins. Fmoc solid-phase peptide synthesis was utilized for the synthesis of seven cyclic or hybrid cyclic-linear amphiphilic peptides composed of hydrophobic (tryptophan (W) or 3,3-diphenylalanine (Dip)) and positively charged arginine (R) residues, such as [WR]4, [WR]9, [WWRR]4, [WWRR]5, [(RW)5K](RW)5, [R5K]W7, and [DipR]5. Confocal microscopy was used to screen the peptides as a protein delivery system of model cargo proteins, green and red fluorescent proteins (GFP and RFP). Based on the confocal microscopy results, [WR]9 and [DipR]5 were found to be the most efficient among all the peptides and were selected for further studies. [WR]9 (1-10 µM) + protein (GFP and RFP) physical mixture did not show high cytotoxicity (>90% viability) in triple-negative breast cancer cells (MDA-MB-231) after 24 h, while [DipR]5 (1-10 µM) physical mixture with GFP exhibited more than 81% cell viability. Confocal microscopy images revealed internalization of GFP and RFP in MDA-MB-231 cells using [WR]9 (2-10 µM) and [DipR]5 (1-10 µM). Fluorescence-activated cell sorting (FACS) analysis indicated that the cellular uptake of GFP was concentration-dependent in the presence of [WR]9 in MDA-MB-231 cells after 3 h of incubation at 37 °C. The concentration-dependent uptake of GFP and RFP was also observed in the presence of [DipR]5 in SK-OV-3 and MDA-MB-231 cells after 3 h of incubation at 37 °C. FACS analysis indicated that the cellular uptake of GFP in the presence of [WR]9 was partially decreased by methyl-β-cyclodextrin and nystatin as endocytosis inhibitors after 3 h of incubation in MDA-MB-231 cells, whereas nystatin and chlorpromazine as endocytosis inhibitors slightly reduced the uptake of GFP in the presence of [DipR]5 after 3 h of incubation in MDA-MB-231 cells. [WR]9 was able to deliver therapeutically relevant proteins (Histone H2A) at different concentrations. These results provide insight into the use of amphiphilic cyclic peptides in the delivery of protein-related therapeutics.
Affiliation(s)
- Jonathan Moreno
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
| | - Khalid Zoghebi
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
- Department of Pharmaceutical Chemistry, College of Pharmacy, Jazan University, Jazan 82826, Saudi Arabia
| | - David Salehi
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
| | - Lois Kim
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
| | - Sorour Khayyatnejad Shoushtari
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
| | - Rakesh K Tiwari
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
| | - Keykavous Parang
- Center for Targeted Drug Delivery, Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Harry and Diane Rinker Health Science Campus, Irvine, CA 92618, USA
| |
|
29
|
Linker S, Schellhaas C, Kamenik AS, Veldhuizen MM, Waibl F, Roth HJ, Fouché M, Rodde S, Riniker S. Lessons for Oral Bioavailability: How Conformationally Flexible Cyclic Peptides Enter and Cross Lipid Membranes. J Med Chem 2023; 66:2773-2788. [PMID: 36762908 PMCID: PMC9969412 DOI: 10.1021/acs.jmedchem.2c01837] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/09/2022] [Indexed: 02/11/2023]
Abstract
Cyclic peptides extend the druggable target space due to their size, flexibility, and hydrogen-bonding capacity. However, these properties also impact their passive membrane permeability. As the "journey" through membranes cannot be monitored experimentally, little is known about the underlying process, which hinders rational design. Here, we use molecular simulations to uncover how cyclic peptides permeate a membrane. We show that side chains can act as "molecular anchors", establishing the first contact with the membrane and enabling insertion. Once inside, the peptides are positioned between headgroups and lipid tails, a unique polar/apolar interface. Only one of two distinct orientations at this interface allows for the formation of the permeable "closed" conformation. In the closed conformation, the peptide crosses to the lower leaflet via another "anchoring" and flipping mechanism. Our findings provide atomistic insights into the permeation process of flexible cyclic peptides and reveal design considerations for each step of the process.
Affiliation(s)
- Stephanie M. Linker
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Christian Schellhaas
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Anna S. Kamenik
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Mac M. Veldhuizen
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Franz Waibl
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Hans-Jörg Roth
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
| | - Marianne Fouché
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
| | - Stephane Rodde
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
| | - Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| |
|
30
|
Tjaden B. Escherichia coli transcriptome assembly from a compendium of RNA-seq data sets. RNA Biol 2023; 20:77-84. [PMID: 36920168 PMCID: PMC10392735 DOI: 10.1080/15476286.2023.2189331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 01/09/2023] [Accepted: 01/27/2023] [Indexed: 03/16/2023] Open
Abstract
Owing to the complexities of bacterial RNA biology, the transcriptomes of even the best studied bacteria are not fully understood. To help elucidate the transcriptional landscape of E. coli, we compiled a compendium of 3,376 RNA-seq data sets composed of more than 7 trillion sequenced bases, which we evaluate with a transcript assembly pipeline. We report expression profiles for all annotated E. coli genes as well as 5,071 other transcripts. Additionally, we observe hundreds of instances of co-transcribed genes that are novel with respect to existing operon databases. By integrating data from a large number of sequencing experiments corresponding to a wide range of conditions, we are able to obtain a comprehensive view of the E. coli transcriptome.
Affiliation(s)
- Brian Tjaden
- Department of Computer Science, Wellesley College, Wellesley, MA, USA
| |
|
31
|
Sun Z, Jing C, Zhan H, Guo X, Suo N, Kong F, Tao W, Xiao C, Hu D, Wang H, Jiang S. Identification of tumor antigens and immune landscapes for bladder urothelial carcinoma mRNA vaccine. Front Immunol 2023; 14:1097472. [PMID: 36761744 PMCID: PMC9905425 DOI: 10.3389/fimmu.2023.1097472] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/13/2022] [Accepted: 01/09/2023] [Indexed: 01/26/2023] Open
Abstract
Background: Bladder urothelial carcinoma (BLCA) is associated with high mortality and recurrence. Although mRNA-based vaccines are promising treatment strategies for combating multiple solid cancers, their efficacy against BLCA remains unclear. We aimed to identify potential effective antigens of BLCA for the development of mRNA-based vaccines and screen for immune clusters to select appropriate candidates for vaccination. Methods: Gene expression microarray data and clinical information were retrieved from The Cancer Genome Atlas and GSE32894, respectively. The mRNA splicing patterns were obtained from the SpliceSeq portal. The cBioPortal for Cancer Genomics was used to visualize genetic alteration profiles. Furthermore, nonsense-mediated mRNA decay (NMD) analysis, correlation analysis, consensus clustering analysis, immune cell infiltration analysis, and weighted co-expression network analysis were conducted. Results: Six upregulated and mutated tumor antigens related to NMD and to infiltration of APCs were identified in patients with BLCA, including HP1BP3, OSBPL9, SSH3, ZCCHC8, FANCI, and EIF4A2. The patients were subdivided into two immune clusters (IC1 and IC2) with distinct clinical, cellular and molecular features. Patients in IC1 represented immunologically 'hot' phenotypes, whereas those in IC2 represented immunologically 'cold' phenotypes. Moreover, the survival rate was better in IC2 than in IC1, and the immune landscape of BLCA indicated significant inter-patient heterogeneity. Finally, CALD1, TGFB3, and ANXA6 were identified as key genes of BLCA through WGCNA, and their mRNA expression levels were measured using qRT-PCR. Conclusion: HP1BP3, OSBPL9, SSH3, ZCCHC8, FANCI, and EIF4A2 were identified as potential antigens for developing mRNA-based vaccines against BLCA, and patients in IC2 might benefit more from vaccination.
Affiliation(s)
- Zhuolun Sun
- Department of Urology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Changying Jing
- Faculty of Medicine, Ludwig Maximilian University of Munich (LMU), Munich, Germany; Institute of Diabetes and Regeneration, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Hailun Zhan
- Department of Urology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Xudong Guo
- Department of Urology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
| | - Ning Suo
- Department of Urology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
| | - Feng Kong
- Department of Urology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
| | - Wen Tao
- Department of Urology, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Chutian Xiao
- Department of Urology, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Daoyuan Hu
- Department of Urology, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Hanbo Wang
- Department of Urology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
| | - Shaobo Jiang
- Department of Urology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
| |
|
32
|
Pauza AG, Murphy D, Paton JFR. Transcriptomics of the Carotid Body. Advances in Experimental Medicine and Biology 2023; 1427:1-11. [PMID: 37322330 DOI: 10.1007/978-3-031-32371-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
The carotid body (CB) has emerged as a potential therapeutic target for treating sympathetically mediated cardiovascular, respiratory, and metabolic diseases. In adjunct to its classical role as an arterial O2 sensor, the CB is a multimodal sensor activated by a range of stimuli in the circulation. However, consensus on how CB multimodality is achieved is lacking; even the best studied O2-sensing appears to involve multiple convergent mechanisms. A strategy to understand multimodal sensing is to adopt a hypothesis-free, high-throughput transcriptomic approach. This has proven instrumental for understanding fundamental mechanisms of CB response to hypoxia and other stimulants, its developmental niche, cellular heterogeneity, laterality, and pathophysiological remodeling in disease states. Herein, we review this published work that reveals novel molecular mechanisms underpinning multimodal sensing and reveals numerous gaps in knowledge that require experimental testing.
Affiliation(s)
- Audrys G Pauza
- Manaaki Manawa - The Centre for Heart Research, Department of Physiology, Faculty of Medical & Health Sciences, University of Auckland, Auckland, New Zealand.
| | - David Murphy
- Molecular Neuroendocrinology Research Group, Bristol Medical School, Translational Health Sciences, University of Bristol, Bristol, UK
| | - Julian F R Paton
- Manaaki Manawa - The Centre for Heart Research, Department of Physiology, Faculty of Medical & Health Sciences, University of Auckland, Auckland, New Zealand
| |
|
33
|
Sommer MJ, Cha S, Varabyou A, Rincon N, Park S, Minkin I, Pertea M, Steinegger M, Salzberg SL. Structure-guided isoform identification for the human transcriptome. eLife 2022; 11:e82556. [PMID: 36519529 PMCID: PMC9812405 DOI: 10.7554/elife.82556] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/09/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Recently developed methods to predict three-dimensional protein structure with high accuracy have opened new avenues for genome and proteome research. We explore a new hypothesis in genome annotation, namely whether computationally predicted structures can help to identify which of multiple possible gene isoforms represents a functional protein product. Guided by protein structure predictions, we evaluated over 230,000 isoforms of human protein-coding genes assembled from over 10,000 RNA sequencing experiments across many human tissues. From this set of assembled transcripts, we identified hundreds of isoforms with more confidently predicted structure and potentially superior function in comparison to canonical isoforms in the latest human gene database. We illustrate our new method with examples where structure provides a guide to function in combination with expression and evolutionary evidence. Additionally, we provide the complete set of structures as a resource to better understand the function of human genes and their isoforms. These results demonstrate the promise of protein structure prediction as a genome annotation tool, allowing us to refine even the most highly curated catalog of human proteins. More generally we demonstrate a practical, structure-guided approach that can be used to enhance the annotation of any genome.
Affiliation(s)
- Markus J Sommer
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Sooyoung Cha
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
- Natalia Rincon
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Sukhwan Park
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
- Ilia Minkin
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
- Steven L Salzberg
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
- Department of Biostatistics, Johns Hopkins University, Baltimore, United States
|
34
|
Byrne JA, Park Y, Richardson RAK, Pathmendra P, Sun M, Stoeger T. Protection of the human gene research literature from contract cheating organizations known as research paper mills. Nucleic Acids Res 2022; 50:12058-12070. [PMID: 36477580 PMCID: PMC9757046 DOI: 10.1093/nar/gkac1139] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/06/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022] Open
Abstract
Human gene research generates new biology insights with translational potential, yet few studies have considered the health of the human gene literature. The accessibility of human genes for targeted research, combined with unreasonable publication pressures and recent developments in scholarly publishing, may have created a market for low-quality or fraudulent human gene research articles, including articles produced by contract cheating organizations known as paper mills. This review summarises the evidence that paper mills contribute to the human gene research literature at scale and outlines why targeted gene research may be particularly vulnerable to systematic research fraud. To raise awareness of targeted gene research from paper mills, we highlight features of problematic manuscripts and publications that can be detected by gene researchers and/or journal staff. As improved awareness and detection could drive the further evolution of paper mill-supported publications, we also propose changes to academic publishing to more effectively deter and correct problematic publications at scale. In summary, the threat of paper mill-supported gene research highlights the need for all researchers to approach the literature with a more critical mindset, and demand publications that are underpinned by plausible research justifications, rigorous experiments and fully transparent reporting.
Affiliation(s)
- Jennifer A Byrne
- To whom correspondence should be addressed. Tel: +61 2 4920 4135;
- Yasunori Park
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- Reese A K Richardson
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Pranujan Pathmendra
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- Mengyi Sun
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Thomas Stoeger
- To whom correspondence should be addressed. Tel: +61 2 4920 4135;
|
35
|
Crine SL, Acharya KR. Molecular basis of C-mannosylation - a structural perspective. FEBS J 2022; 289:7670-7687. [PMID: 34741587 DOI: 10.1111/febs.16265] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/12/2021] [Revised: 10/22/2021] [Accepted: 11/04/2021] [Indexed: 01/14/2023]
Abstract
The structural and functional diversity of proteins can be enhanced by numerous post-translational modifications. C-mannosylation is a rare form of glycosylation consisting of a single alpha or beta D-mannopyranose forming a carbon-carbon bond with the pyrrole ring of a tryptophan residue. Despite first being discovered in 1994, C-mannosylation is still poorly understood and 3D structures are available for only a fraction of the total predicted C-mannosylated proteins. Here, we present the first comprehensive review of C-mannosylated protein structures by analysing the data for all 10 proteins with C-mannosylation/s deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB). We analysed in detail the WXXW/WXXWXXW consensus motif and the highly conserved pair of arginine residues in thrombospondin type 1 repeat C-mannosylation sites or homologous arginine residues in other domains. Furthermore, we identified a conserved PXP sequence C-terminal of the C-mannosylation site. The PXP motif forms a tight turn region in the polypeptide chain and its universal conservation in C-mannosylated protein is worthy of further experimental study. The stabilization of C-mannopyranosyl groups was demonstrated through hydrogen bonding with arginine and other charged or polar amino acids. Where possible, the structural findings were linked to other functional studies demonstrating the role of C-mannosylation in protein stability, secretion or function. With the current technological advances in structural biology, we hope to see more progress in the study of C-mannosylation that may correspond to discoveries of novel C-mannosylation pathways and functions with implications for human health and biotechnology.
Affiliation(s)
- Samuel L Crine
- Department of Biology and Biochemistry, University of Bath, UK
- K Ravi Acharya
- Department of Biology and Biochemistry, University of Bath, UK
|
36
|
Premzl M. Revised eutherian gene collections. BMC Genom Data 2022; 23:56. [PMID: 35870891 PMCID: PMC9308196 DOI: 10.1186/s12863-022-01071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Accepted: 07/13/2022] [Indexed: 11/24/2022] Open
Abstract
Objectives: The most recent research projects in the field of eutherian comparative genomics included plans to sequence the genome of every extant eutherian species in the foreseeable future, so future revisions and updates of eutherian gene data sets were expected. Data description: Using 35 public eutherian reference genomic sequence assemblies and freely available software, the eutherian comparative genomic analysis protocol RRID:SCR_014401 was published as guidance against potential genomic sequence errors. The protocol curated 14 eutherian third-party data gene data sets, including, in aggregate, 2615 complete coding sequences that were deposited in the European Nucleotide Archive. The published eutherian gene collections were used in revisions and updates of eutherian gene data set classifications and nomenclatures that included gene annotations, phylogenetic analyses and protein molecular evolution analyses.
|
37
|
Zhang H, Wafula EK, Eilers J, Harkess A, Ralph PE, Timilsena PR, dePamphilis CW, Waite JM, Honaas LA. Building a foundation for gene family analysis in Rosaceae genomes with a novel workflow: A case study in Pyrus architecture genes. Front Plant Sci 2022; 13:975942. [PMID: 36452099 PMCID: PMC9702816 DOI: 10.3389/fpls.2022.975942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/22/2022] [Accepted: 09/21/2022] [Indexed: 05/26/2023]
Abstract
The rapid development of sequencing technologies has led to a deeper understanding of plant genomes. However, direct experimental evidence connecting genes to important agronomic traits is still lacking in most non-model plants. For instance, the genetic mechanisms underlying plant architecture are poorly understood in pome fruit trees, creating a major hurdle in developing new cultivars with desirable architecture, such as dwarfing rootstocks in European pear (Pyrus communis). An efficient way to identify genetic factors for important traits in non-model organisms can be to transfer knowledge across genomes. However, major obstacles exist, including complex evolutionary histories and variable quality and content of publicly available plant genomes. As researchers aim to link genes to traits of interest, these challenges can impede the transfer of experimental evidence across plant species, namely in the curation of high-quality, high-confidence gene models in an evolutionary context. Here we present a workflow using a collection of bioinformatic tools for the curation of deeply conserved gene families of interest across plant genomes. To study gene families involved in tree architecture in European pear and other rosaceous species, we used our workflow, plus a draft genome assembly and high-quality annotation of a second P. communis cultivar, 'd'Anjou.' Our comparative gene family approach revealed significant issues with the most recent 'Bartlett' genome - primarily thousands of missing genes due to methodological bias. After correcting assembly errors on a global scale in the 'Bartlett' genome, we used our workflow for targeted improvement of our genes of interest in both P. communis genomes, thus laying the groundwork for future functional studies in pear tree architecture. Further, our global gene family classification of 15 genomes across 6 genera provides a valuable and previously unavailable resource for the Rosaceae research community. With it, orthologs and other gene family members can be easily identified across any of the classified genomes. Importantly, our workflow can be easily adopted for any other plant genomes and gene families of interest.
Affiliation(s)
- Huiting Zhang
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Department of Horticulture, Washington State University, Pullman, WA, United States
- Eric K. Wafula
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Jon Eilers
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Alex E. Harkess
- College of Agriculture, Auburn University, Auburn, AL, United States
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
- Paula E. Ralph
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Prakash Raj Timilsena
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Claude W. dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Jessica M. Waite
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Loren A. Honaas
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
|
38
|
Hu W, Wu Y, Shi Q, Wu J, Kong D, Wu X, He X, Liu T, Li S. Systematic characterization of cancer transcriptome at transcript resolution. Nat Commun 2022; 13:6803. [PMID: 36357395 PMCID: PMC9649690 DOI: 10.1038/s41467-022-34568-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/23/2022] [Accepted: 10/31/2022] [Indexed: 11/11/2022] Open
Abstract
Transcribed RNAs undergo various regulation and modification to become functional transcripts. Notably, cancer transcriptome has not been fully characterized at transcript resolution. Herein, we carry out a reference-based transcript assembly across >1000 cancer cell lines. We identify 498,255 transcripts, approximately half of which are unannotated. Unannotated transcripts are closely associated with cancer-related hallmarks and show clinical significance. We build a high-confidence RNA binding protein (RBP)-transcript regulatory network, wherein most RBPs tend to regulate transcripts involved in cell proliferation. We identify numerous transcripts that are highly associated with anti-cancer drug sensitivity. Furthermore, we establish RBP-transcript-drug axes, wherein PTBP1 is experimentally validated to affect the sensitivity to decitabine by regulating KIAA1522-a6 transcript. Finally, we establish a user-friendly data portal to serve as a valuable resource for understanding cancer transcriptome diversity and its potential clinical utility at transcript level. Our study substantially extends cancer RNA repository and will facilitate anti-cancer drug discovery.
Affiliation(s)
- Wei Hu
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620 China
- Yangjun Wu
- Department of Gynecological Oncology, Fudan University Shanghai Cancer Center, Shanghai, 200032 China
- Qili Shi
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai, 200032 China
- Jingni Wu
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620 China
- Deping Kong
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620 China
- Xiaohua Wu
- Department of Gynecological Oncology, Fudan University Shanghai Cancer Center, Shanghai, 200032 China
- Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai, 200032 China
- Teng Liu
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620 China
- Institute of Big Data and Artificial Intelligence in Medicine, School of Electronics and Information Engineering, Taizhou University, Taizhou, 318000 China
- Shengli Li
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620 China
|
39
|
Li Z, Liu L, Feng C, Qin Y, Xiao J, Zhang Z, Ma L. LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations. Nucleic Acids Res 2022; 51:D186-D191. [PMID: 36330950 PMCID: PMC9825513 DOI: 10.1093/nar/gkac999] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/15/2022] [Revised: 10/11/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022] Open
Abstract
LncBook, a comprehensive resource of human long non-coding RNAs (lncRNAs), has been used in a wide range of lncRNA studies across various biological contexts. Here, we present LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook), with significant updates and enhancements as follows: (i) incorporation of 119 722 new transcripts, 9632 new genes, and gene structure update of 21 305 lncRNAs; (ii) characterization of conservation features of human lncRNA genes across 40 vertebrates; (iii) integration of lncRNA-encoded small proteins; (iv) enrichment of expression and DNA methylation profiles with more biological contexts and (v) identification of lncRNA-protein interactions and improved prediction of lncRNA-miRNA interactions. Collectively, LncBook 2.0 accommodates a high-quality collection of 95 243 lncRNA genes and 323 950 transcripts and incorporates their abundant annotations at different omics levels, thereby enabling users to decipher functional significance of lncRNAs in different biological contexts.
Affiliation(s)
- Yuxin Qin
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jingfa Xiao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
- Correspondence may also be addressed to Zhang Zhang. Tel: +86 10 8409 7261; Fax: +86 10 8409 7298;
- Lina Ma
- To whom correspondence should be addressed. Tel: +86 10 8409 7845; Fax: +86 10 8409 7298;
|
40
|
Peel E, Silver L, Brandies P, Zhu Y, Cheng Y, Hogg CJ, Belov K. Best genome sequencing strategies for annotation of complex immune gene families in wildlife. Gigascience 2022; 11:6780307. [PMID: 36310247 PMCID: PMC9618407 DOI: 10.1093/gigascience/giac100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/24/2022] [Revised: 08/10/2022] [Accepted: 09/29/2022] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND The biodiversity crisis and increasing impact of wildlife disease on animal and human health provides impetus for studying immune genes in wildlife. Despite the recent boom in genomes for wildlife species, immune genes are poorly annotated in nonmodel species owing to their high level of polymorphism and complex genomic organisation. Our research over the past decade and a half on Tasmanian devils and koalas highlights the importance of genomics and accurate immune annotations to investigate disease in wildlife. Given this, we have increasingly been asked the minimum levels of genome quality required to effectively annotate immune genes in order to study immunogenetic diversity. Here we set out to answer this question by manually annotating immune genes in 5 marsupial genomes and 1 monotreme genome to determine the impact of sequencing data type, assembly quality, and automated annotation on accurate immune annotation. RESULTS Genome quality is directly linked to our ability to annotate complex immune gene families, with long reads and scaffolding technologies required to reassemble immune gene clusters and elucidate evolution, organisation, and true gene content of the immune repertoire. Draft-quality genomes generated from short reads with HiC or 10× Chromium linked reads were unable to achieve this. Despite mammalian BUSCOv5 scores of up to 94.1% amongst the 6 genomes, automated annotation pipelines incorrectly annotated up to 59% of manually annotated immune genes regardless of assembly quality or method of automated annotation. CONCLUSIONS Our results demonstrate that long reads and scaffolding technologies, alongside manual annotation, are required to accurately study the immune gene repertoire of wildlife species.
Affiliation(s)
- Emma Peel
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia; Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
- Luke Silver
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Parice Brandies
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Ying Zhu
- Sichuan Provincial Academy of Natural Resource Sciences, Chengdu, Sichuan 610000, China
- Yuanyuan Cheng
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Carolyn J Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia; Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
- Katherine Belov
- Correspondence address. Katherine Belov, School of Life and Environmental Sciences, Rm 206, RMC Gunn Building (B19), The University of Sydney, Sydney, NSW 2006, Australia. E-mail:
|
41
|
Regulation of yeast Snf1 (AMPK) by a polyhistidine containing pH sensing module. iScience 2022; 25:105083. [PMID: 36147951 PMCID: PMC9486060 DOI: 10.1016/j.isci.2022.105083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 08/12/2022] [Accepted: 09/01/2022] [Indexed: 11/23/2022] Open
Abstract
Cellular regulation of pH is crucial for internal biological processes and for the import and export of ions and nutrients. In the yeast Saccharomyces cerevisiae, the major proton pump (Pma1) is regulated by glucose. Glucose is also an inhibitor of the energy sensor Snf1/AMPK, which is conserved in all eukaryotes. Here, we demonstrate that a poly-histidine (polyHIS) tract in the pre-kinase region (PKR) of Snf1 functions as a pH-sensing module (PSM) and regulates Snf1 activity. This regulation is independent from, and unaffected by, phosphorylation at T210, the major regulatory control of Snf1, but is controlled by the Pma1 plasma-membrane proton pump. By examining the PKR from additional yeast species, and by varying the number of histidines in the PKR, we determined that the polyHIS functions progressively. This regulation mechanism links the activity of a key enzyme with the metabolic status of the cell at any given moment.
|
42
|
Tung KF, Lin WC. TEx-MST: tissue expression profiles of MANE select transcripts. Database (Oxford) 2022; 2022:6726258. [PMID: 36170113 PMCID: PMC9518666 DOI: 10.1093/database/baac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 12/05/2022]
Abstract
Recently, a new reference transcript dataset [Matched Annotation from the NCBI and EMBL-EBI (MANE) select] was released by NCBI and EMBL-EBI to make available a new unified representative transcript for human protein-coding genes. While the main purpose of the MANE project is to provide a harmonized gene and transcript information standard, there is no explicit tissue expression information about these MANE select transcripts. In this report, we tried to provide useful expression profiles of MANE select transcripts in various normal human tissues to allow further interrogation of their molecular modulations and functional significance. We obtained the new V9 transcript expression dataset from the Genotype-Tissue Expression (GTEx) web portal. This new GTEx dataset, based on a long-read sequencing platform, affords better assessment of the expression of alternatively spliced transcripts. The tissue expression profiles of MANE select transcripts (TEx-MST) database not only provides the basic information of MANE select transcripts but also tissue expression profiles on alternative transcripts in protein-coding genes. Users can initiate the interrogation by gene symbol searches or by browsing the MANE genes with various criteria (such as genome locations or expression rankings). We further utilized the GENCODE biotype feature to identify the top-ranked protein-coding transcripts by choosing the most expressed protein-coding transcripts from GTEx datasets (both V8 and V9 datasets). In summary, there are 18 083 genes matched between MANE and GTEx. Among them, 13 245 MANE select transcripts matched with the top-ranked protein-coding transcripts in the GTEx V9 dataset, which underlined the dominant expression of MANE select transcripts. This TEx-MST web bioinformatic database provides a visualized user interface for the normal tissue expression patterns of MANE select transcripts using the newly released GTEx dataset.
Database URL: TEx-MST is available at https://texmst.ibms.sinica.edu.tw/
Affiliation(s)
- Kuo-Feng Tung
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
- Wen-chang Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
- Institute of Biomedical Informatics, National Yang-Ming Chiao Tung University, Taipei 112, Taiwan, R.O.C
|
43
|
Mylarshchikov DE, Mironov AA. ortho2align: a sensitive approach for searching for orthologues of novel lncRNAs. BMC Bioinformatics 2022; 23:384. [PMID: 36123626 PMCID: PMC9487038 DOI: 10.1186/s12859-022-04929-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Accepted: 09/13/2022] [Indexed: 11/12/2022] Open
Abstract
Background: Many novel long noncoding RNAs have been discovered in recent years due to advances in high-throughput sequencing experiments. Finding orthologues of these novel lncRNAs might facilitate clarification of their functional role in living organisms. However, lncRNAs exhibit low sequence conservation, so specific methods for enhancing the signal-to-noise ratio were developed. Nevertheless, current methods such as transcriptome comparison approaches or searches for conserved secondary structures are not applicable to novel, previously unannotated lncRNAs by design. Results: We present ortho2align, a versatile, sensitive, synteny-based lncRNA orthologue search tool with statistical assessment of sequence conservation. This tool allows control of the specificity of the search process and optional annotation of found orthologues. ortho2align shows similar performance in terms of sensitivity and resource usage to the state-of-the-art method for aligning orthologous lncRNAs but also enables scientists to predict unannotated orthologous sequences for lncRNAs in question. Using ortho2align, we predicted orthologues of three distinct classes of novel human lncRNAs in six Vertebrata species to estimate their degree of conservation. Conclusions: Being designed for the discovery of unannotated orthologues of novel lncRNAs in distant species, ortho2align is a versatile tool applicable to any genomic regions, especially weakly conserved ones. A small number of input files makes ortho2align easy to use in orthology studies as a single tool or bundled with other steps that researchers consider sensible. ortho2align is available as an Anaconda package with its source code hosted at https://github.com/dmitrymyl/ortho2align. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04929-y.
Affiliation(s)
- Andrey Alexandrovich Mironov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russian Federation, 119234; Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation, 127994
|
44
|
Splicing QTL analysis focusing on coding sequences reveals mechanisms for disease susceptibility loci. Nat Commun 2022; 13:4659. [PMID: 36002455 PMCID: PMC9402578 DOI: 10.1038/s41467-022-32358-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/06/2022] [Accepted: 07/26/2022] [Indexed: 12/26/2022] Open
Abstract
Splicing quantitative trait loci (sQTLs) are one of the major causal mechanisms in genome-wide association study (GWAS) loci, but their role in disease pathogenesis is poorly understood. One reason is the complexity of alternative splicing events producing many unknown isoforms. Here, we propose two approaches, namely integration and selection, for this complexity by focusing on protein-structure of isoforms. First, we integrate isoforms with the same coding sequence (CDS) and identify 369-601 integrated-isoform ratio QTLs (i2-rQTLs), which altered protein-structure, in six immune subsets. Second, we select CDS incomplete isoforms annotated in GENCODE and identify 175-337 isoform-ratio QTL (i-rQTL). By comprehensive long-read capture RNA-sequencing among these incomplete isoforms, we reveal 29 full-length isoforms with unannotated CDSs associated with GWAS traits. Furthermore, we show that disease-causal sQTL genes can be identified by evaluating their trans-eQTL effects. Our approaches highlight the understudied role of protein-altering sQTLs and are broadly applicable to other tissues and diseases. Splicing QTL (sQTL), genetic variants regulating alternative splicing, can be biologically important, but complex to detect and interpret. Here, the authors identify sQTL by focusing on protein coding sequences, as an alternative to junction-based approaches.
|
45
|
Ma J, Wu JY, Zhu L. Detection of orthologous exons and isoforms using EGIO. Bioinformatics 2022; 38:4474-4480. [PMID: 35946527 PMCID: PMC9525004 DOI: 10.1093/bioinformatics/btac548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 06/15/2022] [Accepted: 08/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Alternative splicing is an important mechanism to generate transcriptomic and phenotypic diversity. Existing methods have limited power to detect orthologous isoforms. RESULTS We develop a new method, EGIO, to detect orthologous exons and orthologous isoforms from two species. EGIO uses unique exonic regions to construct exon groups, a process in which a dynamic programming strategy is used for exon alignment. EGIO could cover all the coding exons within orthologous genes. A comparison between EGIO and ExTraMapper shows that EGIO could detect more orthologous isoforms with conserved sequence and exon structures. We apply EGIO to compare human and chimpanzee protein-coding isoforms expressed in the frontal cortex and identify 6912 genes that express human unique isoforms. Unexpectedly, more human unique isoforms are detected than those conserved between humans and chimpanzees. AVAILABILITY AND IMPLEMENTATION Source code and test data of EGIO are available at https://github.com/wu-lab-egio/EGIO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinfa Ma
  - State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jane Y Wu
  - To whom correspondence should be addressed.
- Li Zhu
  - To whom correspondence should be addressed.
|
46
|
Glinos DA, Garborcauskas G, Hoffman P, Ehsan N, Jiang L, Gokden A, Dai X, Aguet F, Brown KL, Garimella K, Bowers T, Costello M, Ardlie K, Jian R, Tucker NR, Ellinor PT, Harrington ED, Tang H, Snyder M, Juul S, Mohammadi P, MacArthur DG, Lappalainen T, Cummings BB. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 2022; 608:353-359. [PMID: 35922509 PMCID: PMC10337767 DOI: 10.1038/s41586-022-05035-y] [Citation(s) in RCA: 87] [Impact Index Per Article: 43.5] [Received: 02/25/2021] [Accepted: 06/28/2022] [Indexed: 12/12/2022]
Abstract
Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
Affiliation(s)
- Dafni A Glinos
  - New York Genome Center, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Garrett Garborcauskas
  - Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nava Ehsan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Lihua Jiang
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Kathleen L Brown
  - New York Genome Center, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Tera Bowers
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ruiqi Jian
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Nathan R Tucker
  - Masonic Medical Research Institute, Utica, NY, USA
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Patrick T Ellinor
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Michael Snyder
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Sissel Juul
  - Oxford Nanopore Technology, New York, NY, USA
- Pejman Mohammadi
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
  - Scripps Research Translational Institute, La Jolla, CA, USA
- Daniel G MacArthur
  - Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Tuuli Lappalainen
  - New York Genome Center, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Beryl B Cummings
  - Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
47
|
Vidal-Veuthey B, González D, Cárdenas JP. Role of microbial secreted proteins in gut microbiota-host interactions. Front Cell Infect Microbiol 2022; 12:964710. [PMID: 35967863 PMCID: PMC9373040 DOI: 10.3389/fcimb.2022.964710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 07/06/2022] [Indexed: 11/30/2022] Open
Abstract
The mammalian gut microbiota comprises a variety of commensals, including potential probiotics and pathobionts, that influence the host. Members of the microbiota can affect host physiology through several mechanisms, including the secretion of a relatively well-reported set of metabolic products. Another mechanism of microbiota influence is the use of secreted proteins (i.e., the secretome), which affect both the host and other community members. While widely reported and studied in pathogens, this mechanism remains less well understood in commensals, although knowledge has been increasing in recent years. In this minireview, we assess the current literature on the functions of secretable proteins from members of the gut microbiota (including commensals, pathobionts, and probiotics). Their effect on host physiology and health, and how these effects can be harnessed by postbiotic products, are also discussed.
Affiliation(s)
- Boris Vidal-Veuthey
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Huechuraba, Chile
- Dámariz González
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Huechuraba, Chile
- Juan P. Cárdenas
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Huechuraba, Chile
  - Escuela de Biotecnología, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
  - *Correspondence: Juan P. Cárdenas
|
48
|
Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W, Marquez Y, Milne L, Riegler S, Matsui A, Tanaka M, Harvey S, Gao Y, Wießner-Kroh T, Paniagua A, Crespi M, Denby K, Hur AB, Huq E, Jantsch M, Jarmolowski A, Koester T, Laubinger S, Li QQ, Gu L, Seki M, Staiger D, Sunkar R, Szweykowska-Kulinska Z, Tu SL, Wachter A, Waugh R, Xiong L, Zhang XN, Conesa A, Reddy ASN, Barta A, Kalyna M, Brown JWS. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol 2022; 23:149. [PMID: 35799267 PMCID: PMC9264592 DOI: 10.1186/s13059-022-02711-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/17/2021] [Accepted: 06/15/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. RESULTS We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. CONCLUSIONS AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
Affiliation(s)
- Runxuan Zhang
  - Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
- Richard Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- Max Coulter
  - Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
- Cristiane P G Calixto
  - Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
  - Present address: Institute of Biosciences, University of São Paulo, São Paulo, 05508-090, Brazil
- Juan Carlos Entizne
  - Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
- Wenbin Guo
  - Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
- Yamile Marquez
  - Centre for Genomic Regulation, C/ Dr. Aiguader 88, 08003, Barcelona, Spain
- Linda Milne
  - Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
- Stefan Riegler
  - Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria
  - Present address: Institute of Science and Technology Austria, Am Campus 1, 3400, Klosterneuburg, Austria
- Akihiro Matsui
  - Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Maho Tanaka
  - Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Sarah Harvey
  - Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way, York, YO10 5DD, UK
- Yubang Gao
  - College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Theresa Wießner-Kroh
  - Center for Plant Molecular Biology (ZMBP), University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany
- Alejandro Paniagua
  - Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
- Martin Crespi
  - French National Centre for Scientific Research | CNRS INRAE-Universities of Paris Saclay and Paris, Institute of Plant Sciences Paris Saclay IPS2, Rue de Noetzlin, 91192, Gif sur Yvette, France
- Katherine Denby
  - Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way, York, YO10 5DD, UK
- Asa Ben Hur
  - Department of Computer Science, Colorado State University, 1873 Campus Delivery, Fort Collins, CO, 80523-1873, USA
- Enamul Huq
  - Department of Molecular Biosciences, University of Texas at Austin, 100 East 24th St., Austin, TX, 78712-1095, USA
- Michael Jantsch
  - Department of Cell and Developmental Biology, Center for Anatomy and Cell Biology, Medical University of Vienna, Schwarzspanierstrasse 17 A-1090, Vienna, Austria
- Artur Jarmolowski
  - Department of Gene Expression, Adam Mickiewicz University, Poznań, Poland
- Tino Koester
  - RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University, Universitaetsstrasse 25, 33615, Bielefeld, Germany
- Sascha Laubinger
  - Institut für Biologie und Umweltwissenschaften (IBU), Carl von Ossietzky Universität Oldenburg, Carl von Ossietzky-Str. 9-11, 26111, Oldenburg, Germany
  - Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Qingshun Quinn Li
  - Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, 361102, Fujian, China
- Lianfeng Gu
  - College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Motoaki Seki
  - Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Dorothee Staiger
  - RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University, Universitaetsstrasse 25, 33615, Bielefeld, Germany
- Ramanjulu Sunkar
  - Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
- Shih-Long Tu
  - Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Andreas Wachter
  - Center for Plant Molecular Biology (ZMBP), University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany
  - Present address: Institute for Molecular Physiology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128, Mainz, Germany
- Robbie Waugh
  - Cell and Molecular Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
- Liming Xiong
  - Department of Biology, Hong Kong Baptist University, Hong Kong, China
- Xiao-Ning Zhang
  - Biology Department, School of Arts and Sciences, St. Bonaventure University, 3261 West State Road, St. Bonaventure, NY, 14778, USA
- Ana Conesa
  - Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
- Anireddy S N Reddy
  - Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Andrea Barta
  - Max F. Perutz Laboratories, Medical University of Vienna, Center of Medical Biochemistry, Dr.-Bohr-Gasse 9/3, A-1030, Vienna, Austria
- Maria Kalyna
  - Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria
- John W S Brown
  - Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
  - Cell and Molecular Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
|
49
|
The effects of sequencing depth on the assembly of coding and noncoding transcripts in the human genome. BMC Genomics 2022; 23:487. [PMID: 35787153 PMCID: PMC9251931 DOI: 10.1186/s12864-022-08717-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Accepted: 06/16/2022] [Indexed: 12/30/2022] Open
Abstract
Investigating the functions and activities of genes requires proper annotation of the transcribed units. However, transcript assembly efforts have produced surprisingly large variation in the number of transcripts, especially for noncoding transcripts. This heterogeneity in assembled transcript sets might be partially explained by sequencing depth. Here, we used real and simulated short-read sequencing data as well as long-read data to systematically investigate the impact of sequencing depth on the accuracy of assembled transcripts. We assembled and analyzed transcripts from 671 human short-read data sets and four long-read data sets. At the first level, there is a positive correlation between the number of reads and the number of recovered transcripts. However, the effect of sequencing depth varied with cell or tissue type, the type of read, and the nature and expression levels of the transcripts. The detection of coding transcripts saturated rapidly with both short and long reads; however, there was no sign of early saturation for noncoding transcripts at any sequencing depth. Increasing long-read sequencing depth specifically benefited transcripts containing transposable elements. Finally, we show how single-cell RNA-seq can be guided by transcripts assembled from bulk long-read samples, and demonstrate that noncoding transcripts are expressed at similar levels to coding transcripts but are expressed in fewer cells. This study highlights the impact of sequencing depth on transcript assembly.
|
50
|
Legüe M. Relevancia de los mecanismos epigenéticos en el neurodesarrollo normal y consecuencias de sus perturbaciones [Relevance of epigenetic mechanisms in normal neurodevelopment and consequences of their perturbations]. Revista Médica Clínica Las Condes 2022. [DOI: 10.1016/j.rmclc.2022.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022] Open
|