1
Wang X, Carvajal-Moreno J, Zhao X, Li J, Hernandez VA, Yalowich JC, Elton TS. Circumvention of Topoisomerase II α Intron 19 Intronic Polyadenylation in Acquired Etoposide-Resistant Human Leukemia K562 Cells. Mol Pharmacol 2024; 106:33-46. [PMID: 38719474 PMCID: PMC11187689 DOI: 10.1124/molpharm.124.000868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/12/2024] [Indexed: 06/20/2024] Open
Abstract
DNA topoisomerase IIα (TOP2α; 170 kDa, TOP2α/170) is an essential enzyme for proper chromosome dysjunction by producing transient DNA double-stranded breaks and is an important target for DNA damage-stabilizing anticancer agents, such as etoposide. Therapeutic effects of TOP2α poisons can be limited due to acquired drug resistance. We previously demonstrated decreased TOP2α/170 levels in an etoposide-resistant human leukemia K562 subline, designated K/VP.5, accompanied by increased expression of a C-terminal truncated TOP2α isoform (90 kDa; TOP2α/90), which heterodimerized with TOP2α/170 and was a determinant of resistance by exhibiting dominant-negative effects against etoposide activity. Based on 3'-rapid amplification of cDNA ends, we confirmed TOP2α/90 as the translation product of a TOP2α mRNA in which a cryptic polyadenylation site (PAS) harbored in intron 19 (I19) was used. In this report, we investigated whether the resultant intronic polyadenylation (IPA) would be attenuated by blocking or mutating the I19 PAS, thereby circumventing acquired drug resistance. An antisense morpholino oligonucleotide was used to hybridize/block the PAS in TOP2α pre-mRNA in K/VP.5 cells, resulting in decreased TOP2α/90 mRNA/protein levels in K/VP.5 cells and partially circumventing drug resistance. Subsequently, CRISPR/CRISPR-associated protein 9 with homology-directed repair was used to mutate the cryptic I19 PAS (AATAAA→ACCCAA) to prevent IPA. Gene-edited clones exhibited increased TOP2α/170 and decreased TOP2α/90 mRNA/protein and demonstrated restored sensitivity to etoposide and other TOP2α-targeted drugs. Together, results indicated that blocking/mutating a cryptic I19 PAS in K/VP.5 cells reduced IPA and restored sensitivity to TOP2α-targeting drugs. 
SIGNIFICANCE STATEMENT: The results presented in this study indicate that CRISPR/CRISPR-associated protein 9 gene editing of a cryptic polyadenylation site (PAS) within I19 of the TOP2α gene results in the reversal of acquired resistance to etoposide and other TOP2-targeted drugs. An antisense morpholino oligonucleotide targeting the PAS also partially circumvented resistance.
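As a minimal illustration of the hexamer edit described in this abstract (AATAAA→ACCCAA), the sketch below scans a DNA string for the canonical polyadenylation signal and swaps it for the edited sequence. The toy sequence and the function names `find_pas` and `mutate_pas` are hypothetical and are not taken from the paper; the real intron 19 sequence is not reproduced here.

```python
CANONICAL_PAS = "AATAAA"  # canonical poly(A) signal hexamer named in the abstract
EDITED_PAS = "ACCCAA"     # replacement hexamer named in the abstract


def find_pas(seq, motif=CANONICAL_PAS):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def mutate_pas(seq, position, replacement=EDITED_PAS):
    """Replace the canonical hexamer that starts at `position` with `replacement`."""
    end = position + len(CANONICAL_PAS)
    if seq[position:end].upper() != CANONICAL_PAS:
        raise ValueError("no canonical PAS hexamer at the given position")
    return seq[:position] + replacement + seq[end:]


# Made-up 15-nt sequence, for illustration only.
toy_intron = "GGCTAATAAAGTTCC"
hits = find_pas(toy_intron)
edited = mutate_pas(toy_intron, hits[0])
print(hits)              # [4]
print(edited)            # GGCTACCCAAGTTCC
print(find_pas(edited))  # []
```

The edited string no longer contains the canonical hexamer, which is the sequence-level change the abstract credits with reducing intronic polyadenylation.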
Affiliation(s)
- Xinyi Wang
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
- Jessika Carvajal-Moreno
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
- Xinyu Zhao
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
- Junan Li
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
- Victor A Hernandez
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
- Jack C Yalowich
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
- Terry S Elton
- Division of Pharmaceutics and Pharmacology, College of Pharmacy (X.W., J.C.-M., X.Z., V.A.H., J.C.Y., T.S.E.) and Division of Outcomes and Translational Science (J.L.), The Ohio State University, Columbus, Ohio
2
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024:10.1038/s12276-024-01243-w. [PMID: 38871816 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
3
Fu X, Rabadan R. Understanding variants of unknown significance: the computational frontier. Oncologist 2024:oyae103. [PMID: 38848164 DOI: 10.1093/oncolo/oyae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 04/16/2024] [Indexed: 06/09/2024] Open
Abstract
The rapid advancement of sequencing technologies has led to the identification of numerous mutations in cancer genomes, many of which are variants of unknown significance (VUS). Computational models are increasingly being used to predict the functional impact of these mutations, in both coding and noncoding regions. Integration of these models with emerging genomic datasets will refine our understanding of mutation effects and guide clinical decision making. Future advancements in modeling protein interactions and transcriptional regulation will further enhance our ability to interpret VUS. Periodic incorporation of these developments into VUS reclassification practice has the potential to significantly improve personalized cancer care.
Affiliation(s)
- Xi Fu
- Columbia University Irving Medical Center, New York, NY, USA
- Raul Rabadan
- Columbia University Irving Medical Center, New York, NY, USA
4
Routhier E, Joubert A, Westbrook A, Pierre E, Lancrey A, Cariou M, Boulé JB, Mozziconacci J. In silico design of DNA sequences for in vivo nucleosome positioning. Nucleic Acids Res 2024:gkae468. [PMID: 38828788 DOI: 10.1093/nar/gkae468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/24/2024] [Accepted: 05/29/2024] [Indexed: 06/05/2024] Open
Abstract
The computational design of synthetic DNA sequences with designer in vivo properties is gaining traction in the field of synthetic genomics. We propose here a computational method which combines a kinetic Monte Carlo framework with a deep mutational screening based on deep learning predictions. We apply our method to build regular nucleosome arrays with tailored nucleosomal repeat lengths (NRL) in yeast. Our design was validated in vivo by successfully engineering and integrating thousands of kilobases long tandem arrays of computationally optimized sequences which could accommodate NRLs much larger than the yeast natural NRL (namely 197 and 237 bp, compared to the natural NRL of ∼165 bp). RNA-seq results show that transcription of the arrays can occur but is not driven by the NRL. The computational method proposed here delineates the key sequence rules for nucleosome positioning in yeast and should be easily applicable to other sequence properties and other genomes.
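The abstract names the ingredients (a kinetic Monte Carlo framework plus a mutational screen scored by deep learning predictions) without giving the algorithm, so the following is only a generic sketch of how such a loop might be arranged. The scoring function `toy_loss` is a hypothetical stand-in for a trained predictor, and the target value, temperature, and step count are arbitrary assumptions rather than values from the paper.

```python
import math
import random

BASES = "ACGT"


def toy_loss(seq):
    """Hypothetical stand-in for a deep learning predictor: distance of the
    sequence's A/T fraction from an arbitrary target of 0.75."""
    at_fraction = sum(base in "AT" for base in seq) / len(seq)
    return abs(at_fraction - 0.75)


def design(seq, steps=50, temperature=0.05, seed=0):
    """Return the lowest-loss sequence seen during a rate-weighted mutation walk."""
    rng = random.Random(seed)
    best = seq
    for _ in range(steps):
        # Mutational screen: enumerate every single-nucleotide variant.
        mutants = [
            seq[:i] + base + seq[i + 1:]
            for i in range(len(seq))
            for base in BASES
            if base != seq[i]
        ]
        current = toy_loss(seq)
        # Kinetic Monte Carlo style step: choose one mutation with probability
        # proportional to a Boltzmann-like rate exp(-(change in loss) / T).
        rates = [math.exp(-(toy_loss(m) - current) / temperature) for m in mutants]
        seq = rng.choices(mutants, weights=rates, k=1)[0]
        if toy_loss(seq) < toy_loss(best):
            best = seq
    return best


designed = design("GCGCGCGCGCGC")
print(designed, toy_loss(designed))
```

Scoring all single mutants at every step is affordable in this toy setting; with a real neural predictor the same screen would typically be evaluated as one batched forward pass.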
Affiliation(s)
- Etienne Routhier
- Laboratoire de Physique Théorique de la Matière Condensée, CNRS, Sorbonne Université, Paris, France
- Alexandra Joubert
- Structure et Instabilité des Génomes, Museum National d'Histoire Naturelle, CNRS, INSERM, Paris, France
- Alex Westbrook
- Structure et Instabilité des Génomes, Museum National d'Histoire Naturelle, CNRS, INSERM, Paris, France
- Edgard Pierre
- Laboratoire de Physique Théorique de la Matière Condensée, CNRS, Sorbonne Université, Paris, France
- Astrid Lancrey
- Structure et Instabilité des Génomes, Museum National d'Histoire Naturelle, CNRS, INSERM, Paris, France
- Marie Cariou
- Acquisition et Analyse de données pour l'histoire naturelle, Museum National d'Histoire Naturelle, CNRS, Paris, France
- Jean-Baptiste Boulé
- Structure et Instabilité des Génomes, Museum National d'Histoire Naturelle, CNRS, INSERM, Paris, France
- Julien Mozziconacci
- Laboratoire de Physique Théorique de la Matière Condensée, CNRS, Sorbonne Université, Paris, France
- Structure et Instabilité des Génomes, Museum National d'Histoire Naturelle, CNRS, INSERM, Paris, France
- Acquisition et Analyse de données pour l'histoire naturelle, Museum National d'Histoire Naturelle, CNRS, Paris, France
- Institut Universitaire de France, Paris, France
5
Guo Y, Zhou D, Li P, Li C, Cao J. Context-Aware Poly(A) Signal Prediction Model via Deep Spatial-Temporal Neural Networks. IEEE Trans Neural Netw Learn Syst 2024; 35:8241-8253. [PMID: 37015693 DOI: 10.1109/tnnls.2022.3226301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Polyadenylation [Poly(A)] is an essential process during messenger RNA (mRNA) maturation in biological eukaryote systems. Identifying Poly(A) signals (PASs) from the genome level is the key to understanding the mechanism of translation regulation and mRNA metabolism. In this work, we propose a deep dual-dynamic context-aware Poly(A) signal prediction model, called multiscale convolution with self-attention networks (MCANet), to adaptively uncover the spatial-temporal contextual dependence information. Specifically, the model automatically learns and strengthens informative features from the temporalwise and the spatialwise dimension. The identity connectivity performs contextual feature maps of Poly(A) data by direct connections from previous layers to subsequent layers. Then, a fully parametric rectified linear unit (FP-RELU) with dual-dynamic coefficients is devised to make the training of the model easier and enhance the generalization ability. A cross-entropy loss (CL) function is designed to make the model focus on samples that are easy to misclassify. Experiments on different Poly(A) signals demonstrate the superior performance of the proposed MCANet, and an ablation study shows the effectiveness of the network design for the feature learning and prediction of Poly(A) signals.
6
Zhu S, Yuan S, Niu R, Zhou Y, Wang Z, Xu G. RNAirport: a deep neural network-based database characterizing representative gene models in plants. J Genet Genomics 2024; 51:652-664. [PMID: 38518981 DOI: 10.1016/j.jgg.2024.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 03/15/2024] [Accepted: 03/16/2024] [Indexed: 03/24/2024]
Abstract
A 5'-leader, known initially as the 5'-untranslated region, contains multiple isoforms due to alternative splicing (aS) and alternative transcription start site (aTSS). Therefore, a representative 5'-leader is demanded to examine the embedded RNA regulatory elements in controlling translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5'-leaders for five plant species. We rank the intra-sample and inter-sample frequency of aS-mediated transcript isoforms using the Kruskal-Wallis test-based algorithm and identify the representative aS-5'-leader. To further assign a representative 5'-end, we train the deep-learning model 5'leaderP to learn aTSS-mediated 5'-end distribution patterns from cap-analysis gene expression data. The model accurately predicts the 5'-end, confirmed experimentally in Arabidopsis and rice. The representative 5'-leader-contained gene models and 5'leaderP can be accessed at RNAirport (http://www.rnairport.com/leader5P/). The Stage 1 annotation of 5'-leader records 5'-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, identical to the project recently initiated by human GENCODE.
Affiliation(s)
- Sitao Zhu
- State Key Laboratory of Hybrid Rice, Institute for Advanced Studies (IAS), Wuhan University, Wuhan, Hubei 430072, China
- Shu Yuan
- State Key Laboratory of Hybrid Rice, Institute for Advanced Studies (IAS), Wuhan University, Wuhan, Hubei 430072, China
- Ruixia Niu
- State Key Laboratory of Hybrid Rice, Institute for Advanced Studies (IAS), Wuhan University, Wuhan, Hubei 430072, China
- Yulu Zhou
- State Key Laboratory of Hybrid Rice, Institute for Advanced Studies (IAS), Wuhan University, Wuhan, Hubei 430072, China
- Zhao Wang
- State Key Laboratory of Hybrid Rice, Institute for Advanced Studies (IAS), Wuhan University, Wuhan, Hubei 430072, China
- Guoyong Xu
- State Key Laboratory of Hybrid Rice, Institute for Advanced Studies (IAS), Wuhan University, Wuhan, Hubei 430072, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
7
Hu X, Cao P, Wang F, Wang T, Duan J, Chen X, Ma X, Zhang Y, Chen J, Liu H, Zhang H, Wu X. Alternative polyadenylation quantitative trait loci contribute to acute myeloid leukemia risk genes regulation. Leuk Res 2024; 141:107499. [PMID: 38640632 DOI: 10.1016/j.leukres.2024.107499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 03/14/2024] [Accepted: 04/08/2024] [Indexed: 04/21/2024]
Abstract
Acute myeloid leukemia (AML) is a hematopoietic malignancy with a high relapse rate and progressive drug resistance. Alternative polyadenylation (APA) contributes to post-transcriptional dysregulation, but little is known about the association between APA and AML. The APA quantitative trait locus (apaQTL) is a powerful method to investigate the relationship between APA and single nucleotide polymorphisms (SNPs). We quantified APA usage in 195 Chinese AML patients and identified 4922 cis-apaQTLs related to 1875 genes, most of which were newly reported. Cis-apaQTLs may modulate the APA selection of 115 genes through poly(A) signals. Colocalization analysis revealed that cis-apaQTLs colocalized with cis-eQTLs may regulate gene expression by affecting miRNA binding sites or RNA secondary structures. We discovered 207 cis-apaQTLs related to AML risk by comparing genotype frequency with the East Asian healthy controls from the 1000 Genomes Project. Genes with cis-apaQTLs were associated with hematological phenotypes and tumor incidence according to the PHARMGKB and MGI databases. Collectively, we profiled an atlas of cis-apaQTLs in Asian AML patients and found their association with APA selection, gene expression, AML risk, and complex traits. Cis-apaQTLs may provide insights into the regulatory mechanisms related to APA in AML occurrence, progression, and prognosis.
Affiliation(s)
- Xi Hu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Panxiang Cao
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Fang Wang
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Tong Wang
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Junbo Duan
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Xue Chen
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Xiaoli Ma
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Yang Zhang
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Jiaqi Chen
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China
- Hongxing Liu
- Division of Pathology and Laboratory Medicine, Hebei Yanda Lu Daopei Hospital, Langfang 065201, China.
- Huqin Zhang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
- Xiaoming Wu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
8
Lin J, Wang X, Liu T, Teng Y, Cui W. Diffusion-Based Generative Network for de Novo Synthetic Promoter Design. ACS Synth Biol 2024; 13:1513-1522. [PMID: 38613497 DOI: 10.1021/acssynbio.4c00041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2024]
Abstract
Computer-aided promoter design is a major development trend in synthetic promoter engineering. Various deep learning models have been used to evaluate or screen synthetic promoters, but there have been few works on de novo promoter design. To explore the potential of generative models in promoter design, we established a diffusion-based generative model for promoter design in Escherichia coli. The model was driven entirely by sequence data and could learn the essential characteristics of natural promoters, thus generating synthetic promoters similar to natural promoters in structure and composition. We also modified the calculation of the Fréchet inception distance (FID) indicator, using a convolution layer to extract the feature matrix of the promoter sequence instead. As a result, we obtained an FID of 1.37, which indicated that the synthetic promoters have a distribution similar to that of natural ones. Our work provides a fresh approach to de novo promoter design, indicating that a completely data-driven generative model is feasible for promoter design.
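For readers unfamiliar with the FID metric mentioned in this abstract, its standard definition is given below as a reference point. The symbols are assumptions introduced here and do not come from the abstract: \mu_r, \Sigma_r are the mean and covariance of feature vectors extracted from real (natural) sequences, and \mu_g, \Sigma_g are those of generated sequences. The abstract states only that the authors swapped the feature extractor for a convolution layer; whether the rest of their calculation matches this formula exactly is not stated.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer, and the distance is zero when both the means and the covariances coincide.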
Affiliation(s)
- Jianfeng Lin
- School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
- Xin Wang
- School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
- Tuoyu Liu
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Yue Teng
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Wei Cui
- School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
9
Bak M, van Nimwegen E, Kouzel IU, Gur T, Schmidt R, Zavolan M, Gruber AJ. MAPP unravels frequent co-regulation of splicing and polyadenylation by RNA-binding proteins and their dysregulation in cancer. Nat Commun 2024; 15:4110. [PMID: 38750024 PMCID: PMC11096328 DOI: 10.1038/s41467-024-48046-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 04/15/2024] [Indexed: 05/18/2024] Open
Abstract
Maturation of eukaryotic pre-mRNAs via splicing and polyadenylation is modulated across cell types and conditions by a variety of RNA-binding proteins (RBPs). Although there exist over 1,500 RBPs in human cells, their binding motifs and functions still remain to be elucidated, especially in the complex environment of tissues and in the context of diseases. To overcome the lack of methods for the systematic and automated detection of sequence motif-guided pre-mRNA processing regulation from RNA sequencing (RNA-Seq) data, we have developed MAPP (Motif Activity on Pre-mRNA Processing). Applying MAPP to RBP knock-down experiments reveals that many RBPs regulate both splicing and polyadenylation of nascent transcripts by acting on similar sequence motifs. MAPP not only infers these sequence motifs, but also unravels the position-dependent impact of the RBPs on pre-mRNA processing. Interestingly, all investigated RBPs that act on both splicing and 3' end processing exhibit a consistently repressive or activating effect on both processes, providing a first glimpse of the underlying mechanism. Applying MAPP to normal and malignant brain tissue samples unveils that the motifs bound by the PTBP1 and RBFOX RBPs coordinately drive the oncogenic splicing program active in glioblastomas, demonstrating that MAPP paves the way for characterizing pre-mRNA processing regulators under physiological and pathological conditions.
Affiliation(s)
- Maciej Bak
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Biozentrum, University of Basel, 4056, Basel, Switzerland
- Erik van Nimwegen
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Biozentrum, University of Basel, 4056, Basel, Switzerland
- Ian U Kouzel
- Department of Biology, University of Konstanz, D-78464, Konstanz, Germany
- Tamer Gur
- Department of Biology, University of Konstanz, D-78464, Konstanz, Germany
- Ralf Schmidt
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Biozentrum, University of Basel, 4056, Basel, Switzerland
- Mihaela Zavolan
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Biozentrum, University of Basel, 4056, Basel, Switzerland
- Andreas J Gruber
- Department of Biology, University of Konstanz, D-78464, Konstanz, Germany.
10
Fansler MM, Mitschka S, Mayr C. Quantifying 3'UTR length from scRNA-seq data reveals changes independent of gene expression. Nat Commun 2024; 15:4050. [PMID: 38744866 PMCID: PMC11094166 DOI: 10.1038/s41467-024-48254-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 04/22/2024] [Indexed: 05/16/2024] Open
Abstract
Although more than half of all genes generate transcripts that differ in 3'UTR length, current analysis pipelines only quantify the amount but not the length of mRNA transcripts. 3'UTR length is determined by 3' end cleavage sites (CS). We map CS in more than 200 primary human and mouse cell types and increase CS annotations relative to the GENCODE database by 40%. Approximately half of all CS are used in few cell types, revealing that most genes only have one or two major 3' ends. We incorporate the CS annotations into a computational pipeline, called scUTRquant, for rapid, accurate, and simultaneous quantification of gene and 3'UTR isoform expression from single-cell RNA sequencing (scRNA-seq) data. When applying scUTRquant to data from 474 cell types and 2134 perturbations, we discover extensive 3'UTR length changes across cell types that are as widespread and coordinately regulated as gene expression changes but affect mostly different genes. Our data indicate that mRNA abundance and mRNA length are two largely independent axes of gene regulation that together determine the amount and spatial organization of protein synthesis.
Affiliation(s)
- Mervin M Fansler
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, 10021, USA
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Sibylle Mitschka
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Christine Mayr
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, 10021, USA.
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
11
Cheng X, Jiang G, Zhou X, Wang J, Zhao Z, Zhang J, Ni T. The landscape and clinical relevance of intronic polyadenylation in human cancers. J Genet Genomics 2024:S1673-8527(24)00099-7. [PMID: 38740258 DOI: 10.1016/j.jgg.2024.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/07/2024] [Accepted: 04/25/2024] [Indexed: 05/16/2024]
Abstract
Intronic polyadenylation (IPA) is an RNA 3' end processing event which has been reported to play important roles in cancer development. However, a comprehensive landscape of IPA events across cancer types is lacking. Here, we apply IPAFinder to identify and quantify IPA events in 10,383 samples covering all 33 cancer types from The Cancer Genome Atlas (TCGA) project. In total, we identify 21,835 IPA events, almost half of which are ubiquitously expressed. We identify 2,761 unique dynamically changed IPA events across cancer types. Furthermore, we observe 8,855 non-redundant clinically relevant IPA events, which could potentially be used as prognostic indicators. Our analysis also reveals that dynamic IPA usage within cancer signaling pathways may affect drug response. Finally, we develop a user-friendly data portal, IPACancer Atlas (http://www.tingni-lab.com/Pancan_IPA/), to search and explore IPAs in cancer.
Affiliation(s)
- Xiaomeng Cheng
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- Guanghui Jiang
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- Xiaolan Zhou
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- Jing Wang
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- Zhaozhao Zhao
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China; MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Jiayu Zhang
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- Ting Ni
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China; State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot, Inner Mongolia 010070, China.
12
Dudnyk K, Cai D, Shi C, Xu J, Zhou J. Sequence basis of transcription initiation in the human genome. Science 2024; 384:eadj0116. [PMID: 38662817 DOI: 10.1126/science.adj0116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 02/28/2024] [Indexed: 05/03/2024]
Abstract
Transcription initiation is a process that is essential to ensuring the proper function of any gene, yet we still lack a unified understanding of sequence patterns and rules that explain most transcription start sites in the human genome. By predicting transcription initiation at base-pair resolution from sequences with a deep learning-inspired explainable model called Puffin, we show that a small set of simple rules can explain transcription initiation at most human promoters. We identify key sequence patterns that contribute to human promoter activity, each activating transcription with distinct position-specific effects. Furthermore, we explain the sequence basis of bidirectional transcription at promoters, identify the links between promoter sequence and gene expression variation across cell types, and explore the conservation of sequence determinants of transcription initiation across mammalian species.
Affiliation(s)
- Kseniia Dudnyk
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Donghong Cai
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Center of Excellence for Leukemia Studies (CELS), Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Chenlai Shi
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jian Xu
- Center of Excellence for Leukemia Studies (CELS), Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
13
He AY, Danko CG. Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation. bioRxiv 2024:2024.03.13.583868. [PMID: 38559255 PMCID: PMC10979970 DOI: 10.1101/2024.03.13.583868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Our understanding of how the DNA sequences of cis-regulatory elements encode transcription initiation patterns remains limited. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that accurately predicts the position and quantity of transcription initiation with single nucleotide resolution from DNA sequence. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between -200 and +50 bp relative to the transcription start site, as well as more subtle positional preferences among different transcriptional activators. Transcriptional activator and core promoter motifs occupy different positions and play distinct roles in regulating initiation, with the former driving initiation quantity and the latter initiation position. We identified core promoter motifs that explain initiation patterns in the majority of promoters and enhancers, including DPR motifs and AT-rich TBP binding sequences in TATA-less promoters. Our results provide insights into the sequence architecture governing transcription initiation.
Affiliation(s)
- Adam Y. He
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University
- Graduate Field of Computational Biology, Cornell University
- Charles G. Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University
14
Lughmani H, Patel H, Chakravarti R. Structural Features and Physiological Associations of Human 14-3-3ζ Pseudogenes. Genes (Basel) 2024; 15:399. [PMID: 38674334 PMCID: PMC11049341 DOI: 10.3390/genes15040399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 04/28/2024] Open
Abstract
There are about 14,000 pseudogenes that are mutated or truncated sequences resembling functional parent genes. About two-thirds of pseudogenes are processed, while others are duplicated. Although initially thought dead, emerging studies indicate they have functional and regulatory roles. We study 14-3-3ζ, an adaptor protein that regulates cytokine signaling and inflammatory diseases, including rheumatoid arthritis, cancer, and neurological disorders. To understand how 14-3-3ζ (gene symbol YWHAZ) performs diverse functions, we examined the human genome and identified nine YWHAZ pseudogenes spread across many chromosomes. Unlike the 32 kb exon-to-exon sequence in YWHAZ, all pseudogenes are much shorter and lack introns. Out of six, four YWHAZ exons are highly conserved, but the untranslated region (UTR) shows significant diversity. The putative amino acid sequence of pseudogenes is 78-97% homologous, resulting in striking structural similarities with the parent protein. The OMIM and Decipher database searches revealed chromosomal loci containing pseudogenes are associated with human diseases that overlap with the parent gene. To the best of our knowledge, this is the first report on pseudogenes of the 14-3-3 family protein and their implications for human health. This bioinformatics-based study introduces a new insight into the complexity of 14-3-3ζ's functions in biology.
Affiliation(s)
- Ritu Chakravarti
- Department of Physiology and Pharmacology, The University of Toledo, Toledo, OH 43614, USA; (H.L.); (H.P.)
15
Liu X, Chen H, Li Z, Yang X, Jin W, Wang Y, Zheng J, Li L, Xuan C, Yuan J, Yang Y. InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data. Nat Commun 2024; 15:2583. [PMID: 38519498 PMCID: PMC10960005 DOI: 10.1038/s41467-024-46875-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 03/12/2024] [Indexed: 03/25/2024] Open
Abstract
Alternative polyadenylation occurring in introns, termed intronic polyadenylation (IPA), has been implicated in diverse biological processes and diseases, as it can produce noncoding transcripts or transcripts with truncated coding regions. However, a reliable method is required to accurately characterize IPA. Here, we propose a computational method called InPACT, which allows for the precise characterization of IPA from conventional RNA-seq data. InPACT successfully identifies numerous previously unannotated IPA transcripts in human cells, many of which are translated, as evidenced by ribosome profiling data. We have demonstrated that InPACT outperforms other methods in terms of IPA identification and quantification. Moreover, InPACT applied to monocyte activation reveals temporally coordinated IPA events. Further application to single-cell RNA-seq data of human fetal bone marrow reveals the expression of several IPA isoforms in a context-specific manner. Therefore, InPACT represents a powerful tool for the accurate characterization of IPA from RNA-seq data.
Affiliation(s)
- Xiaochuan Liu
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Hao Chen
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Zekun Li
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Xiaoxiao Yang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Wen Jin
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Yuting Wang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Jian Zheng
- Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Long Li
- Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Chenghao Xuan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
- Jiapei Yuan
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China.
- Tianjin Institutes of Health Science, Tianjin, 301600, China.
- Yang Yang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
16
Kim YA, Mousavi K, Yazdi A, Zwierzyna M, Cardinali M, Fox D, Peel T, Coller J, Aggarwal K, Maruggi G. Computational design of mRNA vaccines. Vaccine 2024; 42:1831-1840. [PMID: 37479613 DOI: 10.1016/j.vaccine.2023.07.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 06/23/2023] [Accepted: 07/10/2023] [Indexed: 07/23/2023]
Abstract
mRNA technology has emerged as a successful vaccine platform that offered a swift response to the COVID-19 pandemic. Accumulating evidence shows that vaccine efficacy, thermostability, and other important properties are largely impacted by intrinsic properties of the mRNA molecule, such as RNA sequence and structure, both of which can be optimized. Designing mRNA sequences for vaccines presents a combinatorial problem due to an extremely large selection space. For instance, due to the degeneracy of the genetic code, there are over 10^632 possible mRNA sequences that could encode the spike protein, the COVID-19 vaccines' target. Moreover, designing different elements of the mRNA sequence simultaneously against multiple objectives such as translational efficiency, reduced reactogenicity, and improved stability requires an efficient and sophisticated optimization strategy. Recently, there has been a growing interest in utilizing computational tools to redesign mRNA sequences to improve vaccine characteristics and expedite discovery timelines. In this review, we explore important biophysical features of mRNA to be considered for vaccine design and discuss how computational approaches can be applied to rapidly design mRNA sequences with desirable characteristics.
Affiliation(s)
- Jeff Coller
- Johns Hopkins University, Baltimore, MD, USA
17
Luthra I, Jensen C, Chen XE, Salaudeen AL, Rafi AM, de Boer CG. Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol 2024; 31:559-567. [PMID: 38448573 DOI: 10.1038/s41594-024-01235-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
Affiliation(s)
- Ishika Luthra
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Cassandra Jensen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Xinyi E Chen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Asfar Lathif Salaudeen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Abdul Muntakim Rafi
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
18
Fannjiang C, Listgarten J. Is Novelty Predictable? Cold Spring Harb Perspect Biol 2024; 16:a041469. [PMID: 38052497 PMCID: PMC10835614 DOI: 10.1101/cshperspect.a041469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Machine learning-based design has gained traction in the sciences, most notably in the design of small molecules, materials, and proteins, with societal applications ranging from drug development and plastic degradation to carbon sequestration. When designing objects to achieve novel property values with machine learning, one faces a fundamental challenge: how to push past the frontier of current knowledge, distilled from the training data into the model, in a manner that rationally controls the risk of failure. If one trusts learned models too much in extrapolation, one is likely to design rubbish. In contrast, if one does not extrapolate, one cannot find novelty. Herein, we ponder how one might strike a useful balance between these two extremes. We focus in particular on designing proteins with novel property values, although much of our discussion is relevant to machine learning-based design more broadly.
Affiliation(s)
- Clara Fannjiang
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Jennifer Listgarten
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
19
Bose R, Saleem I, Mustoe AM. Causes, functions, and therapeutic possibilities of RNA secondary structure ensembles and alternative states. Cell Chem Biol 2024; 31:17-35. [PMID: 38199037 PMCID: PMC10842484 DOI: 10.1016/j.chembiol.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 11/21/2023] [Accepted: 12/12/2023] [Indexed: 01/12/2024]
Abstract
RNA secondary structure plays essential roles in encoding RNA regulatory fate and function. Most RNAs populate ensembles of alternatively paired states and are continually unfolded and refolded by cellular processes. Measuring these structural ensembles and their contributions to cellular function has traditionally posed major challenges, but new methods and conceptual frameworks are beginning to fill this void. In this review, we provide a mechanism- and function-centric compendium of the roles of RNA secondary structural ensembles and minority states in regulating the RNA life cycle, from transcription to degradation. We further explore how dysregulation of RNA structural ensembles contributes to human disease and discuss the potential of drugging alternative RNA states to therapeutically modulate RNA activity. The emerging paradigm of RNA structural ensembles as central to RNA function provides a foundation for a deeper understanding of RNA biology and new therapeutic possibilities.
Affiliation(s)
- Ritwika Bose
- Therapeutic Innovation Center (THINC), Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, USA
- Irfana Saleem
- Therapeutic Innovation Center (THINC), Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, USA
- Anthony M Mustoe
- Therapeutic Innovation Center (THINC), Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
20
Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024; 52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024] Open
Abstract
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Affiliation(s)
- Max Schubach
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Thorben Maass
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Lusiné Nazaretyan
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Röner
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
21
de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Affiliation(s)
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
- Jussi Taipale
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden.
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
22
Liu L, Yu AM, Wang X, Soles LV, Teng X, Chen Y, Yoon Y, Sarkan KSK, Valdez MC, Linder J, England W, Spitale R, Yu Z, Marazzi I, Qiao F, Li W, Seelig G, Shi Y. The anticancer compound JTE-607 reveals hidden sequence specificity of the mRNA 3' processing machinery. Nat Struct Mol Biol 2023; 30:1947-1957. [PMID: 38087090 DOI: 10.1038/s41594-023-01161-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 10/24/2023] [Indexed: 12/18/2023]
Abstract
JTE-607 is an anticancer and anti-inflammatory compound and its active form, compound 2, directly binds to and inhibits CPSF73, the endonuclease for the cleavage step in pre-messenger RNA (pre-mRNA) 3' processing. Surprisingly, compound 2-mediated inhibition of pre-mRNA cleavage is sequence specific and the drug sensitivity is predominantly determined by sequences flanking the cleavage site (CS). Using massively parallel in vitro assays, we identified key sequence features that determine drug sensitivity. We trained a machine learning model that can predict poly(A) site (PAS) relative sensitivity to compound 2 and provide the molecular basis for understanding the impact of JTE-607 on PAS selection and transcription termination genome wide. We propose that CPSF73 and associated factors bind to the CS region in a sequence-dependent manner and the interaction affinity determines compound 2 sensitivity. These results have not only elucidated the mechanism of action of JTE-607, but also unveiled an evolutionarily conserved sequence specificity of the mRNA 3' processing machinery.
Affiliation(s)
- Liang Liu
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Center for Virus Research, University of California, Irvine, Irvine, CA, USA
- Angela M Yu
- Department of Electrical and Computer Engineering, University of Washington, Seattle, Seattle, WA, USA
- Xiuye Wang
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Guangzhou Laboratory, Guangdong, China
- Lindsey V Soles
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Xueyi Teng
- Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Yiling Chen
- Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Kristianna S K Sarkan
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Marielle Cárdenas Valdez
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Johannes Linder
- Department of Genetics, Stanford University, Stanford, CA, USA
- Whitney England
- Department of Pharmaceutical Sciences, University of California Irvine, Irvine, CA, USA
- Robert Spitale
- Department of Pharmaceutical Sciences, University of California Irvine, Irvine, CA, USA
- Department of Chemistry, University of California, Irvine, Irvine, CA, USA
- Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, USA
- Zhaoxia Yu
- Department of Statistics, University of California, Irvine, Irvine, CA, USA
- Ivan Marazzi
- Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Feng Qiao
- Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Wei Li
- Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Georg Seelig
- Department of Electrical and Computer Engineering, University of Washington, Seattle, Seattle, WA, USA.
- Paul G Allen School of Computer Science and Engineering, University of Washington, Seattle, Seattle, WA, USA.
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, CA, USA.
- Center for Virus Research, University of California, Irvine, Irvine, CA, USA.
23
Parthiban S, Vijeesh T, Gayathri T, Shanmugaraj B, Sharma A, Sathishkumar R. Artificial intelligence-driven systems engineering for next-generation plant-derived biopharmaceuticals. Front Plant Sci 2023; 14:1252166. [PMID: 38034587 PMCID: PMC10684705 DOI: 10.3389/fpls.2023.1252166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
Recombinant biopharmaceuticals including antigens, antibodies, hormones, cytokines, single-chain variable fragments, and peptides have been used as vaccines, diagnostics and therapeutics. Plant molecular pharming is a robust platform that uses plants as an expression system to produce simple and complex recombinant biopharmaceuticals on a large scale. Plant systems have several advantages over other host systems, such as humanized expression, glycosylation, scalability, reduced risk of human or animal pathogenic contaminants, and rapid, cost-effective production. Despite these advantages, the expression of recombinant proteins in plant systems is hindered by factors such as non-human post-translational modifications, protein misfolding, conformation changes and instability. Artificial intelligence (AI) plays a vital role in various fields of biotechnology; in plant molecular pharming, a significant increase in yield and stability can be achieved with AI-based multi-approach interventions that overcome these hindrances. Current limitations of plant-based recombinant biopharmaceutical production can be circumvented with the aid of synthetic biology tools and AI algorithms in plant-based glycan engineering for protein folding, stability, viability, catalytic activity and organelle targeting. The AI models, including but not limited to neural networks, support vector machines, linear regression, Gaussian processes and regressor ensembles, work by predicting the training and experimental data sets to design and validate the protein structures, thereby optimizing properties such as thermostability, catalytic activity, antibody affinity, and protein folding. This review focuses on integrating systems engineering approaches and AI-based machine learning and deep learning algorithms in protein engineering and host engineering to augment protein production in plant systems to meet the ever-expanding therapeutics market.
Affiliation(s)
- Subramanian Parthiban
- Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Thandarvalli Vijeesh
- Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Thashanamoorthi Gayathri
- Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Balamurugan Shanmugaraj
- Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Ashutosh Sharma
- Tecnologico de Monterrey, School of Engineering and Sciences, Centre of Bioengineering, Queretaro, Mexico
- Ramalingam Sathishkumar
- Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
24
Stroup EK, Ji Z. Deep learning of human polyadenylation sites at nucleotide resolution reveals molecular determinants of site usage and relevance in disease. Nat Commun 2023; 14:7378. [PMID: 37968271 PMCID: PMC10651852 DOI: 10.1038/s41467-023-43266-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/28/2023] [Accepted: 11/05/2023] [Indexed: 11/17/2023] Open
Abstract
The genomic distribution of cleavage and polyadenylation (polyA) sites should be co-evolutionally optimized with the local gene structure. Otherwise, spurious polyadenylation can cause premature transcription termination and generate aberrant proteins. To obtain mechanistic insights into polyA site optimization across the human genome, we develop deep/machine learning models to identify genome-wide putative polyA sites at unprecedented nucleotide-level resolution and calculate their strength and usage in the genomic context. Our models quantitatively measure position-specific motif importance and their crosstalk in polyA site formation and cleavage heterogeneity. The intronic site expression is governed by the surrounding splicing landscape. The usage of alternative polyA sites in terminal exons is modulated by their relative locations and distance to downstream genes. Finally, we apply our models to reveal thousands of disease- and trait-associated genetic variants altering polyadenylation activity. Altogether, our models represent a valuable resource to dissect molecular mechanisms mediating genome-wide polyA site expression and characterize their functional roles in human diseases.
Affiliation(s)
- Emily Kunce Stroup
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
| | - Zhe Ji
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA.
- Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, 60628, USA.
|
25
|
Stroup EK, Ji Z. Delineating yeast cleavage and polyadenylation signals using deep learning. bioRxiv 2023:2023.10.10.561764. [PMID: 37873420 PMCID: PMC10592759 DOI: 10.1101/2023.10.10.561764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
3'-end cleavage and polyadenylation is an essential process for eukaryotic mRNA maturation. In yeast species, the polyadenylation signals that recruit the processing machinery are degenerate and remain poorly characterized compared to well-defined regulatory elements in mammals. In particular, recent deep sequencing experiments showed extensive cleavage heterogeneity for some mRNAs in Saccharomyces cerevisiae and uncovered polyA motif differences between S. cerevisiae and Schizosaccharomyces pombe. These findings raised the fundamental question of how polyadenylation signals are formed in yeast. Here we addressed this question by developing deep learning models to deconvolute degenerate cis-regulatory elements and quantify their positional importance in mediating yeast polyA site formation, cleavage heterogeneity, and strength. In S. cerevisiae, cleavage heterogeneity is promoted by the depletion of U-rich elements around polyA sites as well as multiple occurrences of upstream UA-rich elements. Sites with high cleavage heterogeneity show overall lower strength. The site strength and tandem site distances modulate alternative polyadenylation (APA) under diauxic stress. Finally, we developed a deep learning model to reveal the distinct motif configuration of S. pombe polyA sites, which show more precise cleavage than those of S. cerevisiae. Altogether, our deep learning models provide unprecedented insights into polyA site formation across yeast species.
|
26
|
Gosai SJ, Castro RI, Fuentes N, Butts JC, Kales S, Noche RR, Mouri K, Sabeti PC, Reilly SK, Tewhey R. Machine-guided design of synthetic cell type-specific cis-regulatory elements. bioRxiv 2023:2023.08.08.552077. [PMID: 37609287 PMCID: PMC10441439 DOI: 10.1101/2023.08.08.552077] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 08/24/2023]
Abstract
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing, and stimulus responses, which collectively define the thousands of unique cell types in the body. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for an intended purpose has arisen naturally through evolution. Here, we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell type specificity. We leverage innovations in deep neural network modeling of CRE activity across three cell types, efficient in silico optimization, and massively parallel reporter assays (MPRAs) to design and empirically test thousands of CREs. Through in vitro and in vivo validation, we show that synthetic sequences outperform natural sequences from the human genome in driving cell type-specific expression. Synthetic sequences leverage unique sequence syntax to promote activity in the on-target cell type and simultaneously reduce activity in off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs and demonstrate the required literacy to write regulatory code that is fit-for-purpose in vivo across vertebrates.
Affiliation(s)
- SJ Gosai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Graduate Program in Biological and Biomedical Science, Boston MA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - RI Castro
- The Jackson Laboratory, Bar Harbor, ME, USA
| | - N Fuentes
- The Jackson Laboratory, Bar Harbor, ME, USA
- Harvard College, Harvard University, Cambridge, MA, USA
| | - JC Butts
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
| | - S Kales
- The Jackson Laboratory, Bar Harbor, ME, USA
| | - RR Noche
- Department of Comparative Medicine, Yale School of Medicine, New Haven, CT, USA
- Yale Zebrafish Research Core, Yale School of Medicine, New Haven, CT, USA
| | - K Mouri
- The Jackson Laboratory, Bar Harbor, ME, USA
| | - PC Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - SK Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
| | - R Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
|
27
|
Zhang XE, Liu C, Dai J, Yuan Y, Gao C, Feng Y, Wu B, Wei P, You C, Wang X, Si T. Enabling technology and core theory of synthetic biology. Sci China Life Sci 2023; 66:1742-1785. [PMID: 36753021 PMCID: PMC9907219 DOI: 10.1007/s11427-022-2214-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 10/04/2022] [Indexed: 02/09/2023]
Abstract
Synthetic biology provides a new paradigm for life science research ("build to learn") and opens the future journey of biotechnology ("build to use"). Here, we discuss advances of various principles and technologies in the mainstream of the enabling technology of synthetic biology, including synthesis and assembly of a genome, DNA storage, gene editing, molecular evolution and de novo design of functional proteins, cell and gene circuit engineering, cell-free synthetic biology, artificial intelligence (AI)-aided synthetic biology, as well as biofoundries. We also introduce the concept of quantitative synthetic biology, which is guiding synthetic biology towards increased accuracy and predictability, or truly rational design. We conclude that synthetic biology will establish its disciplinary system with the iterative development of enabling technologies and the maturity of the core theory.
Affiliation(s)
- Xian-En Zhang
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Chenli Liu
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Junbiao Dai
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Yingjin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| | - Caixia Gao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Yan Feng
- State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University, Shanghai, 200240, China.
| | - Bian Wu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Ping Wei
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Chun You
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
| | - Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China.
| | - Tong Si
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
|
28
|
Riley AT, Robson JM, Green AA. Generative and predictive neural networks for the design of functional RNA molecules. bioRxiv 2023:2023.07.14.549043. [PMID: 37503279 PMCID: PMC10370010 DOI: 10.1101/2023.07.14.549043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
RNA is a remarkably versatile molecule that has been engineered for applications in therapeutics, diagnostics, and in vivo information-processing systems. However, the complex relationship between the sequence and structural properties of an RNA molecule and its ability to perform specific functions often necessitates extensive experimental screening of candidate sequences. Here we present a generalized neural network architecture that utilizes the sequence and structure of RNA molecules (SANDSTORM) to inform functional predictions. We demonstrate that this approach achieves state-of-the-art performance across several distinct RNA prediction tasks, while learning interpretable abstractions of RNA secondary structure. We paired these predictive models with generative adversarial RNA design networks (GARDN), allowing the generative modelling of novel mRNA 5' untranslated regions and toehold switch riboregulators exhibiting a predetermined fitness. This approach enabled the design of novel toehold switches with a 43-fold increase in experimentally characterized dynamic range compared to those designed using classic thermodynamic algorithms. SANDSTORM and GARDN thus represent powerful new predictive and generative tools for the development of diagnostic and therapeutic RNA molecules with improved function.
Affiliation(s)
- Aidan T. Riley
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| | - James M. Robson
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| | - Alexander A. Green
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
- Molecular Biology, Cell Biology & Biochemistry Program, Graduate School of Arts and Sciences, Boston University, Boston, MA 02215, USA
|
29
|
Wang Z, Zou J, Zhang L, Liu H, Jiang B, Liang Y, Zhang Y. Comprehensive analysis of the progression mechanisms of CRPC and its inhibitor discovery based on machine learning algorithms. Front Genet 2023; 14:1184704. [PMID: 37476415 PMCID: PMC10354439 DOI: 10.3389/fgene.2023.1184704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 06/27/2023] [Indexed: 07/22/2023]
Abstract
Background: Almost all patients treated with androgen deprivation therapy (ADT) eventually develop castration-resistant prostate cancer (CRPC). Our research aims to elucidate the potential biomarkers and molecular mechanisms that underlie the transformation of primary prostate cancer into CRPC. Methods: We collected three microarray datasets (GSE32269, GSE74367, and GSE66187) from the Gene Expression Omnibus (GEO) database for CRPC. Differentially expressed genes (DEGs) in CRPC were identified for further analyses, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and gene set enrichment analysis (GSEA). Weighted gene coexpression network analysis (WGCNA) and two machine learning algorithms were employed to identify potential biomarkers for CRPC. The diagnostic efficiency of the selected biomarkers was evaluated based on gene expression level and receiver operating characteristic (ROC) curve analyses. We conducted virtual screening of drugs using AutoDock Vina. In vitro experiments were performed using the Cell Counting Kit-8 (CCK-8) assay to evaluate the inhibitory effects of the drugs on CRPC cell viability. Scratch and transwell invasion assays were employed to assess the effects of the drugs on the migration and invasion abilities of prostate cancer cells. Results: Overall, a total of 719 DEGs, consisting of 513 upregulated and 206 downregulated genes, were identified. The biological functional enrichment analysis indicated that DEGs were mainly enriched in pathways related to the cell cycle and metabolism. CCNA2 and CKS2 were identified as promising biomarkers using a combination of WGCNA, LASSO logistic regression, SVM-RFE, and Venn diagram analyses. These potential biomarkers were further validated and exhibited a strong predictive ability. The results of the virtual screening revealed Aprepitant and Dolutegravir as the optimal targeted drugs for CCNA2 and CKS2, respectively. 
In vitro experiments demonstrated that both Aprepitant and Dolutegravir exerted significant inhibitory effects on CRPC cells (p < 0.05), with Aprepitant displaying a superior inhibitory effect compared to Dolutegravir. Discussion: The expression of CCNA2 and CKS2 increases with the progression of prostate cancer, which may be one of the driving factors for the progression of prostate cancer and can serve as diagnostic biomarkers and therapeutic targets for CRPC. Additionally, Aprepitant and Dolutegravir show potential as anti-tumor drugs for CRPC.
Affiliation(s)
- Zhen Wang
- College of Basic Medical Sciences, Dali University, Dali, Yunnan, China
| | - Jing Zou
- The First Affiliated Hospital of Dali University, Dali, Yunnan, China
| | - Le Zhang
- College of Basic Medical Sciences, Dali University, Dali, Yunnan, China
| | - Hongru Liu
- College of Basic Medical Sciences, Dali University, Dali, Yunnan, China
| | - Bei Jiang
- Yunnan Key Laboratory of Screening and Research on Anti-pathogenic Plant Resources from West Yunnan (Cultivation), Dali, Yunnan, China
| | - Yi Liang
- Princess Margaret Cancer Centre, TMDT-MaRS Centre, University Health Network, Toronto, ON, Canada
| | - Yuzhe Zhang
- College of Basic Medical Sciences, Dali University, Dali, Yunnan, China
|
30
|
Zhang Q, Tian B. The emerging theme of 3'UTR mRNA isoform regulation in reprogramming of cell metabolism. Biochem Soc Trans 2023; 51:1111-1119. [PMID: 37171086 PMCID: PMC10771799 DOI: 10.1042/bst20221128] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/12/2023] [Revised: 03/26/2023] [Accepted: 04/19/2023] [Indexed: 05/13/2023]
Abstract
The 3' untranslated region (3'UTR) of mRNA plays a key role in the post-transcriptional regulation of gene expression. Most eukaryotic protein-coding genes express 3'UTR isoforms owing to alternative cleavage and polyadenylation (APA). The 3'UTR isoform expression profile of a cell changes in cell proliferation, differentiation, and stress conditions. Here, we review the emerging theme of regulation of 3'UTR isoforms in cell metabolic reprogramming, focusing on cell growth and autophagy responses through the mTOR pathway. We discuss regulatory events that converge on the Cleavage Factor I complex, a master regulator of APA in 3'UTRs, and recent understandings of isoform-specific m6A modification and endomembrane association in determining differential metabolic fates of 3'UTR isoforms.
Affiliation(s)
- Qiang Zhang
- Gene Expression and Regulation Program and Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, PA 19104, U.S.A
| | - Bin Tian
- Gene Expression and Regulation Program and Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, PA 19104, U.S.A
|
31
|
Panzeri V, Pieraccioli M, Cesari E, de la Grange P, Sette C. CDK12/13 promote splicing of proximal introns by enhancing the interaction between RNA polymerase II and the splicing factor SF3B1. Nucleic Acids Res 2023; 51:5512-5526. [PMID: 37026485 PMCID: PMC10287901 DOI: 10.1093/nar/gkad258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 02/17/2023] [Accepted: 03/28/2023] [Indexed: 04/08/2023]
Abstract
Transcription-associated cyclin-dependent kinases (CDKs) regulate the transcription cycle through sequential phosphorylation of RNA polymerase II (RNAPII). Herein, we report that dual inhibition of the highly homologous CDK12 and CDK13 impairs splicing of a subset of promoter-proximal introns characterized by weak 3' splice sites located at a larger distance from the branchpoint. Nascent transcript analysis indicated that these introns are selectively retained upon pharmacological inhibition of CDK12/13 with respect to downstream introns of the same pre-mRNAs. Retention of these introns was also triggered by pladienolide B (PdB), an inhibitor of the U2 small nuclear ribonucleoprotein (snRNP) factor SF3B1 that recognizes the branchpoint. CDK12/13 activity promotes the interaction of SF3B1 with RNAPII phosphorylated on Ser2, and disruption of this interaction by treatment with the CDK12/13 inhibitor THZ531 impairs the association of SF3B1 with chromatin and its recruitment to the 3' splice site of these introns. Furthermore, by using suboptimal doses of THZ531 and PdB, we describe a synergic effect of these inhibitors on intron retention, cell cycle progression and cancer cell survival. These findings uncover a mechanism by which CDK12/13 couple RNA transcription and processing, and suggest that combined inhibition of these kinases and the spliceosome represents an exploitable anticancer approach.
Affiliation(s)
- Valentina Panzeri
- Department of Neuroscience, Section of Human Anatomy, Catholic University of the Sacred Heart, Rome, Italy
| | - Marco Pieraccioli
- Department of Neuroscience, Section of Human Anatomy, Catholic University of the Sacred Heart, Rome, Italy
- Gemelli Science and Technology Park (GSTeP)-Organoids Research Core Facility, Fondazione Policlinico Agostino Gemelli IRCCS, Rome, Italy
| | - Eleonora Cesari
- Department of Neuroscience, Section of Human Anatomy, Catholic University of the Sacred Heart, Rome, Italy
- Gemelli Science and Technology Park (GSTeP)-Organoids Research Core Facility, Fondazione Policlinico Agostino Gemelli IRCCS, Rome, Italy
| | - Claudio Sette
- Department of Neuroscience, Section of Human Anatomy, Catholic University of the Sacred Heart, Rome, Italy
- Gemelli Science and Technology Park (GSTeP)-Organoids Research Core Facility, Fondazione Policlinico Agostino Gemelli IRCCS, Rome, Italy
|
32
|
Valeri JA, Soenksen LR, Collins KM, Ramesh P, Cai G, Powers R, Angenent-Mari NM, Camacho DM, Wong F, Lu TK, Collins JJ. BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences. Cell Syst 2023; 14:525-542.e9. [PMID: 37348466 PMCID: PMC10700034 DOI: 10.1016/j.cels.2023.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Revised: 02/17/2023] [Accepted: 05/22/2023] [Indexed: 06/24/2023]
Abstract
The design choices underlying machine-learning (ML) models present important barriers to entry for many biologists who aim to incorporate ML in their research. Automated machine-learning (AutoML) algorithms can address many challenges that come with applying ML to the life sciences. However, these algorithms are rarely used in systems and synthetic biology studies because they typically do not explicitly handle biological sequences (e.g., nucleotide, amino acid, or glycan sequences) and cannot be easily compared with other AutoML algorithms. Here, we present BioAutoMATED, an AutoML platform for biological sequence analysis that integrates multiple AutoML methods into a unified framework. Users are automatically provided with relevant techniques for analyzing, interpreting, and designing biological sequences. BioAutoMATED predicts gene regulation, peptide-drug interactions, and glycan annotation, and designs optimized synthetic biology components, revealing salient sequence characteristics. By automating sequence modeling, BioAutoMATED allows life scientists to incorporate ML more readily into their work.
Affiliation(s)
- Jacqueline A Valeri
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Luis R Soenksen
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA
| | - Katherine M Collins
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Department of Engineering, University of Cambridge, Trumpington St, Cambridge CB2 1PZ, UK
| | - Pradeep Ramesh
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - George Cai
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Rani Powers
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Pluto Biosciences, Golden, CO 80402, USA
| | - Nicolaas M Angenent-Mari
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Diogo M Camacho
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Felix Wong
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Timothy K Lu
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Synthetic Biology Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - James J Collins
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139, USA; Abdul Latif Jameel Clinic for Machine Learning in Health, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
|
33
|
Cao J, Kuyumcu-Martinez MN. Alternative polyadenylation regulation in cardiac development and cardiovascular disease. Cardiovasc Res 2023; 119:1324-1335. [PMID: 36657944 PMCID: PMC10262186 DOI: 10.1093/cvr/cvad014] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2022] [Revised: 11/01/2022] [Accepted: 11/28/2022] [Indexed: 01/21/2023]
Abstract
Cleavage and polyadenylation of pre-mRNAs is a necessary step for gene expression and function. The majority of human genes exhibit multiple polyadenylation sites, which can be alternatively used to generate different mRNA isoforms from a single gene. Alternative polyadenylation (APA) of pre-mRNAs is important for the proteome and transcriptome landscape. APA is tightly regulated during development and contributes to tissue-specific gene regulation. Mis-regulation of APA is linked to a wide range of pathological conditions. APA-mediated gene regulation in the heart is emerging as a new area of research. Here, we will discuss the impact of APA on gene regulation during heart development and in cardiovascular diseases. First, we will briefly review how APA impacts gene regulation and discuss molecular mechanisms that control APA. Then, we will address APA regulation during heart development and its dysregulation in cardiovascular diseases. Finally, we will discuss pre-mRNA targeting strategies to correct aberrant APA patterns of essential genes for the treatment or prevention of cardiovascular diseases. The RNA field is blooming due to advancements in RNA-based technologies. RNA-based vaccines and therapies are becoming the new line of effective and safe approaches for the treatment and prevention of human diseases. Overall, this review will be influential for understanding gene regulation at the RNA level via APA in the heart and will help design RNA-based tools for the treatment of cardiovascular diseases in the future.
Affiliation(s)
- Jun Cao
- Faculty of Environment and Life, Beijing University of Technology, Xueyuan Road, Haidian District, Beijing 100124, PR China
| | - Muge N Kuyumcu-Martinez
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd, Galveston, TX 77573, USA
- Department of Neurobiology, University of Texas Medical Branch, Galveston, TX 77555, USA
- Institute for Translational Sciences, University of Texas Medical Branch, 301 University Blvd, Galveston, TX 77573, USA
|
34
|
Fabo T, Khavari P. Functional characterization of human genomic variation linked to polygenic diseases. Trends Genet 2023; 39:462-490. [PMID: 36997428 PMCID: PMC11025698 DOI: 10.1016/j.tig.2023.02.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/01/2022] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/30/2023]
Abstract
The burden of human disease lies predominantly in polygenic diseases. Since the early 2000s, genome-wide association studies (GWAS) have identified genetic variants and loci associated with complex traits. These have ranged from variants in coding sequences to mutations in regulatory regions, such as promoters and enhancers, as well as mutations affecting mediators of mRNA stability and other downstream regulators, such as 5' and 3'-untranslated regions (UTRs), long noncoding RNA (lncRNA), and miRNA. Recent research advances in genetics have utilized a combination of computational techniques, high-throughput in vitro and in vivo screening modalities, and precise genome editing to impute the function of diverse classes of genetic variants identified through GWAS. In this review, we highlight the vastness of genomic variants associated with polygenic disease risk and address recent advances in how genetic tools can be used to functionally characterize them.
Affiliation(s)
- Tania Fabo
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Graduate Program in Genetics, Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford University, Stanford, CA, USA
| | - Paul Khavari
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Graduate Program in Genetics, Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford University, Stanford, CA, USA; Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA.
|
35
|
Alfonso-Gonzalez C, Legnini I, Holec S, Arrigoni L, Ozbulut HC, Mateos F, Koppstein D, Rybak-Wolf A, Bönisch U, Rajewsky N, Hilgers V. Sites of transcription initiation drive mRNA isoform selection. Cell 2023; 186:2438-2455.e22. [PMID: 37178687 PMCID: PMC10228280 DOI: 10.1016/j.cell.2023.04.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 05/25/2022] [Revised: 12/16/2022] [Accepted: 04/06/2023] [Indexed: 05/15/2023]
Abstract
The generation of distinct messenger RNA isoforms through alternative RNA processing modulates the expression and function of genes, often in a cell-type-specific manner. Here, we assess the regulatory relationships between transcription initiation, alternative splicing, and 3' end site selection. Applying long-read sequencing to accurately represent even the longest transcripts from end to end, we quantify mRNA isoforms in Drosophila tissues, including the transcriptionally complex nervous system. We find that in Drosophila heads, as well as in human cerebral organoids, 3' end site choice is globally influenced by the site of transcription initiation (TSS). "Dominant promoters," characterized by specific epigenetic signatures including p300/CBP binding, impose a transcriptional constraint to define splice and polyadenylation variants. In vivo deletion or overexpression of dominant promoters as well as p300/CBP loss disrupted the 3' end expression landscape. Our study demonstrates the crucial impact of TSS choice on the regulation of transcript diversity and tissue identity.
Affiliation(s)
- Carlos Alfonso-Gonzalez
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany; Faculty of Biology, Albert Ludwig University, 79104 Freiburg, Germany; International Max Planck Research School for Molecular and Cellular Biology (IMPRS-MCB), 79108 Freiburg, Germany
| | - Ivano Legnini
- Laboratory for Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
| | - Sarah Holec
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
| | - Laura Arrigoni
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
| | - Hasan Can Ozbulut
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany; Faculty of Biology, Albert Ludwig University, 79104 Freiburg, Germany
| | - Fernando Mateos
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
| | - David Koppstein
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
| | - Agnieszka Rybak-Wolf
- Organoid Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany
| | - Ulrike Bönisch
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
| | - Nikolaus Rajewsky
- Laboratory for Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 10115 Berlin, Germany; Charité - Universitätsmedizin, Charitépl. 1, 10117 Berlin, Germany; German Center for Cardiovascular Research (DZHK), Site Berlin, Berlin, Germany; NeuroCure Cluster of Excellence, Berlin, Germany; German Cancer Consortium (DKTK); National Center for Tumor Diseases (NCT), Site Berlin, Berlin, Germany
| | - Valérie Hilgers
- Max-Planck-Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany; Signalling Research Centre CIBSS, University of Freiburg, Schänzlestraße 18, 79104 Freiburg, Germany.
| |
|
36
|
Liu L, Yu AM, Wang X, Soles LV, Chen Y, Yoon Y, Sarkan KSK, Valdez MC, Linder J, Marazzi I, Yu Z, Qiao F, Li W, Seelig G, Shi Y. The anti-cancer compound JTE-607 reveals hidden sequence specificity of the mRNA 3' processing machinery. bioRxiv 2023:2023.04.11.536453. [PMID: 37090613 PMCID: PMC10120630 DOI: 10.1101/2023.04.11.536453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
JTE-607 is a small molecule compound with anti-inflammation and anti-cancer activities. Upon entering the cell, it is hydrolyzed to Compound 2, which directly binds to and inhibits CPSF73, the endonuclease for the cleavage step in pre-mRNA 3' processing. Although CPSF73 is universally required for mRNA 3' end formation, we have unexpectedly found that Compound 2-mediated inhibition of pre-mRNA 3' processing is sequence-specific and that the sequences flanking the cleavage site (CS) are a major determinant for drug sensitivity. By using massively parallel in vitro assays, we have measured the Compound 2 sensitivities of over 260,000 sequence variants and identified key sequence features that determine drug sensitivity. A machine learning model trained on these data can predict the impact of JTE-607 on poly(A) site (PAS) selection and transcription termination genome-wide. We propose a biochemical model in which CPSF73 and other mRNA 3' processing factors bind to RNA of the CS region in a sequence-specific manner and the affinity of such interaction determines the Compound 2 sensitivity of a PAS. As the Compound 2-resistant CS sequences, characterized by U/A-rich motifs, are prevalent in PASs from yeast to human, the CS region sequence may have more fundamental functions beyond determining drug resistance. Together, our study not only characterized the mechanism of action of a compound with clinical implications, but also revealed a previously unknown and evolutionarily conserved sequence-specificity of the mRNA 3' processing machinery.
|
37
|
Karollus A, Mauermeier T, Gagneur J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol 2023; 24:56. [PMID: 36973806 PMCID: PMC10045630 DOI: 10.1186/s13059-023-02899-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 09/14/2022] [Accepted: 03/16/2023] [Indexed: 03/29/2023] Open
Abstract
BACKGROUND: The largest sequence-based models of transcription control to date are obtained by predicting genome-wide gene regulatory assays across the human genome. This setting is fundamentally correlative, as those models are exposed during training solely to the sequence variation between human genes that arose through evolution, questioning the extent to which those models capture genuine causal signals. RESULTS: Here we confront predictions of state-of-the-art models of transcription regulation against data from two large-scale observational studies and five deep perturbation assays. The most advanced of these sequence-based models, Enformer, by and large, captures causal determinants of human promoters. However, models fail to capture the causal effects of enhancers on expression, notably in medium to long distances and particularly for highly expressed promoters. More generally, the predicted impact of distal elements on gene expression predictions is small and the ability to correctly integrate long-range information is significantly more limited than the receptive fields of the models suggest. This is likely caused by the escalating class imbalance between actual and candidate regulatory elements as distance increases. CONCLUSIONS: Our results suggest that sequence-based models have advanced to the point that in silico study of promoter regions and promoter variants can provide meaningful insights and we provide practical guidance on how to use them. Moreover, we foresee that it will require significantly more and particularly new kinds of data to train models accurately accounting for distal elements.
Affiliation(s)
- Alexander Karollus
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
| | - Thomas Mauermeier
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
- Munich Data Science Institute, Technical University of Munich, Garching, Germany.
| |
|
38
|
O'Connell RW, Rai K, Piepergerdes TC, Samra KD, Wilson JA, Lin S, Zhang TH, Ramos EM, Sun A, Kille B, Curry KD, Rocks JW, Treangen TJ, Mehta P, Bashor CJ. Ultra-high throughput mapping of genetic design space. bioRxiv 2023:2023.03.16.532704. [PMID: 36993481 PMCID: PMC10055055 DOI: 10.1101/2023.03.16.532704] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Massively parallel genetic screens have been used to map sequence-to-function relationships for a variety of genetic elements. However, because these approaches only interrogate short sequences, it remains challenging to perform high throughput (HT) assays on constructs containing combinations of sequence elements arranged across multi-kb length scales. Overcoming this barrier could accelerate synthetic biology; by screening diverse gene circuit designs, "composition-to-function" mappings could be created that reveal genetic part composability rules and enable rapid identification of behavior-optimized variants. Here, we introduce CLASSIC, a generalizable genetic screening platform that combines long- and short-read next-generation sequencing (NGS) modalities to quantitatively assess pooled libraries of DNA constructs of arbitrary length. We show that CLASSIC can measure expression profiles of >10^5 drug-inducible gene circuit designs (ranging from 6-9 kb) in a single experiment in human cells. Using statistical inference and machine learning (ML) approaches, we demonstrate that data obtained with CLASSIC enables predictive modeling of an entire circuit design landscape, offering critical insight into underlying design principles. Our work shows that by expanding the throughput and understanding gained with each design-build-test-learn (DBTL) cycle, CLASSIC dramatically augments the pace and scale of synthetic biology and establishes an experimental basis for data-driven design of complex genetic systems.
|
39
|
Kim DE, Jensen DR, Feldman D, Tischer D, Saleem A, Chow CM, Li X, Carter L, Milles L, Nguyen H, Kang A, Bera AK, Peterson FC, Volkman BF, Ovchinnikov S, Baker D. De novo design of small beta barrel proteins. Proc Natl Acad Sci U S A 2023; 120:e2207974120. [PMID: 36897987 PMCID: PMC10089152 DOI: 10.1073/pnas.2207974120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 01/27/2023] [Indexed: 03/12/2023] Open
Abstract
Small beta barrel proteins are attractive targets for computational design because of their considerable functional diversity despite their very small size (<70 amino acids). However, there are considerable challenges to designing such structures, and there has been little success thus far. Because of the small size, the hydrophobic core stabilizing the fold is necessarily very small, and the conformational strain of barrel closure can oppose folding; also intermolecular aggregation through free beta strand edges can compete with proper monomer folding. Here, we explore the de novo design of small beta barrel topologies using both Rosetta energy-based methods and deep learning approaches to design four small beta barrel folds: Src homology 3 (SH3) and oligonucleotide/oligosaccharide-binding (OB) topologies found in nature and five and six up-and-down-stranded barrels rarely if ever seen in nature. Both approaches yielded successful designs with high thermal stability and experimentally determined structures with less than 2.4 Å rmsd from the designed models. Using deep learning for backbone generation and Rosetta for sequence design yielded higher design success rates and increased structural diversity than Rosetta alone. The ability to design a large and structurally diverse set of small beta barrel proteins greatly increases the protein shape space available for designing binders to protein targets of interest.
Affiliation(s)
- David E. Kim
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- HHMI, University of Washington, Seattle, WA 98195
| | - Davin R. Jensen
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
| | - David Feldman
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Doug Tischer
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Ayesha Saleem
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Cameron M. Chow
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Xinting Li
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Lauren Carter
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Lukas Milles
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Hannah Nguyen
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Alex Kang
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Asim K. Bera
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Francis C. Peterson
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
| | - Brian F. Volkman
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
| | - Sergey Ovchinnikov
- Division of Science, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- HHMI, University of Washington, Seattle, WA 98195
| |
|
40
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia; School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia; The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia; Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia.
| | - Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240 USA; Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240 USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240 USA; Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
| |
|
41
|
Kowalski MH, Wessels HH, Linder J, Choudhary S, Hartman A, Hao Y, Mascio I, Dalgarno C, Kundaje A, Satija R. CPA-Perturb-seq: Multiplexed single-cell characterization of alternative polyadenylation regulators. bioRxiv 2023:2023.02.09.527751. [PMID: 36798324 PMCID: PMC9934614 DOI: 10.1101/2023.02.09.527751] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
Most mammalian genes have multiple polyA sites, representing a substantial source of transcript diversity that is governed by the cleavage and polyadenylation (CPA) regulatory machinery. To better understand how these proteins govern polyA site choice we introduce CPA-Perturb-seq, a multiplexed perturbation screen dataset of 42 known CPA regulators with a 3' scRNA-seq readout that enables transcriptome-wide inference of polyA site usage. We develop a statistical framework to specifically identify perturbation-dependent changes in intronic and tandem polyadenylation, and discover modules of co-regulated polyA sites exhibiting distinct functional properties. By training a multi-task deep neural network (APARENT-Perturb) on our dataset, we delineate a cis-regulatory code that predicts responsiveness to perturbation and reveals interactions between distinct regulatory complexes. Finally, we leverage our framework to re-analyze published scRNA-seq datasets, identifying new regulators that affect the relative abundance of alternatively polyadenylated transcripts, and characterizing extensive cellular heterogeneity in 3' UTR length amongst antibody-producing cells. Our work highlights the potential for multiplexed single-cell perturbation screens to further our understanding of post-transcriptional regulation in vitro and in vivo.
Affiliation(s)
- Madeline H. Kowalski
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York University Grossman School of Medicine, New York, NY, USA
| | - Hans-Hermann Wessels
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Johannes Linder
- Department of Genetics, Stanford University, Stanford USA
- Department of Computer Science, Stanford University, Stanford USA
| | - Saket Choudhary
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | | | - Yuhan Hao
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Isabella Mascio
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | | | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford USA
- Department of Computer Science, Stanford University, Stanford USA
| | - Rahul Satija
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York University Grossman School of Medicine, New York, NY, USA
| |
|
42
|
Cui Y, Arnold FJ, Peng F, Wang D, Li JS, Michels S, Wagner EJ, La Spada AR, Li W. Alternative polyadenylation transcriptome-wide association study identifies APA-linked susceptibility genes in brain disorders. Nat Commun 2023; 14:583. [PMID: 36737438 PMCID: PMC9898543 DOI: 10.1038/s41467-023-36311-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 01/25/2023] [Indexed: 02/05/2023] Open
Abstract
Alternative polyadenylation (APA) plays an essential role in brain development; however, current transcriptome-wide association studies (TWAS) largely overlook APA in nominating susceptibility genes. Here, we performed a 3' untranslated region (3'UTR) APA TWAS (3'aTWAS) for 11 brain disorders by combining their genome-wide association studies data with 17,300 RNA-seq samples across 2,937 individuals. We identified 354 3'aTWAS-significant genes, including known APA-linked risk genes, such as SNCA in Parkinson's disease. Among these 354 genes, ~57% are not significant in traditional expression- and splicing-TWAS studies, since APA may regulate the translation, localization and protein-protein interaction of the target genes independent of mRNA level expression or splicing. Furthermore, we discovered ATXN3 as a 3'aTWAS-significant gene for amyotrophic lateral sclerosis, and its modulation substantially impacted pathological hallmarks of amyotrophic lateral sclerosis in vitro. Together, 3'aTWAS is a powerful strategy to nominate important APA-linked brain disorder susceptibility genes, most of which are largely overlooked by conventional expression and splicing analyses.
Affiliation(s)
- Ya Cui
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, 92697, USA
| | - Frederick J Arnold
- Departments of Pathology & Laboratory Medicine, Neurology, and Biological Chemistry, School of Medicine, and the UCI Institute for Neurotherapeutics, University of California Irvine, Irvine, CA, 92697, USA
| | - Fanglue Peng
- Department of Molecular and Cellular Biology, University Baylor College of Medicine, Houston, TX, 77030, USA
| | - Dan Wang
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Jason Sheng Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, 92697, USA
| | - Sebastian Michels
- Departments of Pathology & Laboratory Medicine, Neurology, and Biological Chemistry, School of Medicine, and the UCI Institute for Neurotherapeutics, University of California Irvine, Irvine, CA, 92697, USA
| | - Eric J Wagner
- School of Medicine and Dentistry, University of Rochester Medical Center, Rochester, NY, 14642, USA
| | - Albert R La Spada
- Departments of Pathology & Laboratory Medicine, Neurology, and Biological Chemistry, School of Medicine, and the UCI Institute for Neurotherapeutics, University of California Irvine, Irvine, CA, 92697, USA.
| | - Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, 92697, USA.
| |
|
43
|
Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023; 24:125-137. [PMID: 36192604 DOI: 10.1038/s41576-022-00532-2] [Citation(s) in RCA: 63] [Impact Index Per Article: 63.0] [Accepted: 08/31/2022] [Indexed: 01/24/2023]
Abstract
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Nick Dexter
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada; School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Canadian Institute for Advanced Research, Toronto, Ontario, Canada.
| |
|
44
|
Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. Cell Rep Methods 2023; 3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/22/2023]
Abstract
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Elva Gao
- The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| |
|
45
|
Mitschka S, Mayr C. Context-specific regulation and function of mRNA alternative polyadenylation. Nat Rev Mol Cell Biol 2022; 23:779-796. [PMID: 35798852 PMCID: PMC9261900 DOI: 10.1038/s41580-022-00507-5] [Citation(s) in RCA: 84] [Impact Index Per Article: 42.0] [Accepted: 06/02/2022] [Indexed: 02/08/2023]
Abstract
Alternative cleavage and polyadenylation (APA) is a widespread mechanism to generate mRNA isoforms with alternative 3' untranslated regions (UTRs). The expression of alternative 3' UTR isoforms is highly cell type specific and is further controlled in a gene-specific manner by environmental cues. In this Review, we discuss how the dynamic, fine-grained regulation of APA is accomplished by several mechanisms, including cis-regulatory elements in RNA and DNA and factors that control transcription, pre-mRNA cleavage and post-transcriptional processes. Furthermore, signalling pathways modulate the activity of these factors and integrate APA into gene regulatory programmes. Dysregulation of APA can reprogramme the outcome of signalling pathways and thus can control cellular responses to environmental changes. In addition to the regulation of protein abundance, APA has emerged as a major regulator of mRNA localization and the spatial organization of protein synthesis. This role enables the regulation of protein function through the addition of post-translational modifications or the formation of protein-protein interactions. We further discuss recent transformative advances in single-cell RNA sequencing and CRISPR-Cas technologies, which enable the mapping and functional characterization of alternative 3' UTRs in any biological context. Finally, we discuss new APA-based RNA therapeutics, including compounds that target APA in cancer and therapeutic genome editing of degenerative diseases.
Affiliation(s)
- Sibylle Mitschka
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Christine Mayr
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
| |
|
46
|
Agarwal V, Kelley DR. The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol 2022; 23:245. [PMID: 36419176 PMCID: PMC9684954 DOI: 10.1186/s13059-022-02811-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/18/2022] [Accepted: 11/02/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND: Degradation rate is a fundamental aspect of mRNA metabolism, and the factors governing it remain poorly characterized. Understanding the genetic and biochemical determinants of mRNA half-life would enable more precise identification of variants that perturb gene expression through post-transcriptional gene regulatory mechanisms. RESULTS: We establish a compendium of 39 human and 27 mouse transcriptome-wide mRNA decay rate datasets. A meta-analysis of these data identified a prevalence of technical noise and measurement bias, induced partially by the underlying experimental strategy. Correcting for these biases allowed us to derive more precise, consensus measurements of half-life which exhibit enhanced consistency between species. We trained substantially improved statistical models based upon genetic and biochemical features to better predict half-life and characterize the factors molding it. Our state-of-the-art model, Saluki, is a hybrid convolutional and recurrent deep neural network which relies only upon an mRNA sequence annotated with coding frame and splice sites to predict half-life (r=0.77). The key novel principle learned by Saluki is that the spatial positioning of splice sites, codons, and RNA-binding motifs within an mRNA is strongly associated with mRNA half-life. Saluki predicts the impact of RNA sequences and genetic mutations therein on mRNA stability, in agreement with functional measurements derived from massively parallel reporter assays. CONCLUSIONS: Our work produces a more robust ground truth for transcriptome-wide mRNA half-lives in mammalian cells. Using these revised measurements, we trained Saluki, a model that is over 50% more accurate in predicting half-life from sequence than existing models. Saluki succinctly captures many of the known determinants of mRNA half-life and can be rapidly deployed to predict the functional consequences of arbitrary mutations in the transcriptome.
Affiliation(s)
- Vikram Agarwal
- Calico Life Sciences LLC, South San Francisco, CA 94080 USA; Present Address: mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA 02451 USA
- David R. Kelley
- Calico Life Sciences LLC, South San Francisco, CA 94080 USA
|
47
|
Linder J, Koplik SE, Kundaje A, Seelig G. Deciphering the impact of genetic variation on human polyadenylation using APARENT2. Genome Biol 2022; 23:232. [PMID: 36335397 PMCID: PMC9636789 DOI: 10.1186/s13059-022-02799-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/17/2022] [Accepted: 10/19/2022] [Indexed: 11/08/2022]
Abstract
BACKGROUND 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging.
RESULTS We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells.
CONCLUSIONS A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
Affiliation(s)
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, USA
- Department of Computer Science, Stanford University, Stanford, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Department of Electrical and Computer Engineering, University of Washington, Seattle, USA
|
48
|
Guo Y, Shen H, Li W, Li C, Jin C. Deep Effective k-mer representation learning for polyadenylation signal prediction via co-occurrence embedding. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
49
|
Nair S, Barrett A, Li D, Raney BJ, Lee BT, Kerpedjiev P, Ramalingam V, Pampari A, Lekschas F, Wang T, Haeussler M, Kundaje A. The dynseq browser track shows context-specific features at nucleotide resolution. Nat Genet 2022; 54:1581-1583. [PMID: 36241719 PMCID: PMC10015500 DOI: 10.1038/s41588-022-01194-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
High-throughput experimental platforms have revolutionized the ability to profile biochemical and functional properties of biological sequences such as DNA, RNA and proteins. By collating several data modalities with customizable tracks rendered using intuitive visualizations, genome browsers enable an interactive and interpretable exploration of diverse types of genome profiling experiments and derived annotations. However, existing genome browser tracks are not well suited for intuitive visualization of high-resolution DNA sequence features such as transcription factor motifs. Typically, motif instances in regulatory DNA sequences are visualized as BED-based annotation tracks, which highlight the genomic coordinates of the motif instances but do not expose their specific sequences. Instead, a genome sequence track needs to be cross-referenced with the BED track to identify sequences of motif hits. Even so, quantitative information about the motif instances such as affinity or conservation as well as differences in base resolution from the consensus motif are not immediately apparent. This makes interpretation slow and challenging. This problem is compounded when analyzing several cellular states and/or molecular readouts (such as ATAC-seq and ChIP–seq) simultaneously, as coordinates of enriched regions (peaks) and the set of active transcription factor motifs vary across cell states.
Affiliation(s)
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Daofeng Li
- Department of Genetics, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Ting Wang
- Department of Genetics, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
|
50
|
Zhu J, Zhang L, Liu J, Zhong S, Gao P, Shen J. Trichloroethylene remediation using zero-valent iron with kaolin clay, activated carbon and bacteria. Water Res 2022; 226:119186. [PMID: 36244142 DOI: 10.1016/j.watres.2022.119186] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/27/2022] [Revised: 09/22/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Nanoscale particles of zero-valent iron were used to form a permeable reactive barrier whose performance in dechlorinating a solution of trichloroethylene was compared with that of a barrier formed from limestone. The iron was combined with kaolin by calcination. The test liquid contained sewage sludge, and also added NH4Cl and KH2PO4. The average removal rates of trichloroethylene and phosphorus over 365 days both exceeded 94%. Chemical oxygen demand was reduced by 92% and ammonium nitrogen by 43.6%. All were significantly greater than the removals with the limestone barrier. The ceramsite barrier retained 85% of its effectiveness even after 365 days of use. Dechloromonas sp. was the main dechlorinating bacterium, but its removal ability is limited. The removal of trichloroethylene in such a barrier mainly depends on reduction by the zero-valent iron and biodegradation. The results show that the prepared ceramsite is stable and effective in removing trichloroethylene from water. It is a promising in-situ remediation material for groundwater.
Affiliation(s)
- Jiayan Zhu
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Lishan Zhang
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Junyong Liu
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Shan Zhong
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Pin Gao
- College of Environmental Science and Engineering, Donghua University, Shanghai 201620, China
- Jinyou Shen
- School of Chemical Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei Street, Nanjing, Jiangsu 210094, China
|