51
|
Agarwal V, Kelley DR. The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol 2022; 23:245. [PMID: 36419176 PMCID: PMC9684954 DOI: 10.1186/s13059-022-02811-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/18/2022] [Accepted: 11/02/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Degradation rate is a fundamental aspect of mRNA metabolism, and the factors governing it remain poorly characterized. Understanding the genetic and biochemical determinants of mRNA half-life would enable more precise identification of variants that perturb gene expression through post-transcriptional gene regulatory mechanisms. RESULTS We establish a compendium of 39 human and 27 mouse transcriptome-wide mRNA decay rate datasets. A meta-analysis of these data identified a prevalence of technical noise and measurement bias, induced partially by the underlying experimental strategy. Correcting for these biases allowed us to derive more precise, consensus measurements of half-life which exhibit enhanced consistency between species. We trained substantially improved statistical models based upon genetic and biochemical features to better predict half-life and characterize the factors molding it. Our state-of-the-art model, Saluki, is a hybrid convolutional and recurrent deep neural network which relies only upon an mRNA sequence annotated with coding frame and splice sites to predict half-life (r=0.77). The key novel principle learned by Saluki is that the spatial positioning of splice sites, codons, and RNA-binding motifs within an mRNA is strongly associated with mRNA half-life. Saluki predicts the impact of RNA sequences and genetic mutations therein on mRNA stability, in agreement with functional measurements derived from massively parallel reporter assays. CONCLUSIONS Our work produces a more robust ground truth for transcriptome-wide mRNA half-lives in mammalian cells. Using these revised measurements, we trained Saluki, a model that is over 50% more accurate in predicting half-life from sequence than existing models. Saluki succinctly captures many of the known determinants of mRNA half-life and can be rapidly deployed to predict the functional consequences of arbitrary mutations in the transcriptome.
Affiliation(s)
- Vikram Agarwal
- Calico Life Sciences LLC, South San Francisco, CA, 94080, USA.
- Present Address: mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, 02451, USA.
- David R Kelley
- Calico Life Sciences LLC, South San Francisco, CA, 94080, USA.
|
52
|
Linder J, Koplik SE, Kundaje A, Seelig G. Deciphering the impact of genetic variation on human polyadenylation using APARENT2. Genome Biol 2022; 23:232. [PMID: 36335397 PMCID: PMC9636789 DOI: 10.1186/s13059-022-02799-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/17/2022] [Accepted: 10/19/2022] [Indexed: 11/08/2022]
Abstract
BACKGROUND 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging. RESULTS We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells. 
CONCLUSIONS A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
Affiliation(s)
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, USA
- Department of Computer Science, Stanford University, Stanford, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Department of Electrical and Computer Engineering, University of Washington, Seattle, USA
|
53
|
Guo Y, Shen H, Li W, Li C, Jin C. Deep Effective k-mer representation learning for polyadenylation signal prediction via co-occurrence embedding. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
54
|
Nair S, Barrett A, Li D, Raney BJ, Lee BT, Kerpedjiev P, Ramalingam V, Pampari A, Lekschas F, Wang T, Haeussler M, Kundaje A. The dynseq browser track shows context-specific features at nucleotide resolution. Nat Genet 2022; 54:1581-1583. [PMID: 36241719 PMCID: PMC10015500 DOI: 10.1038/s41588-022-01194-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
High-throughput experimental platforms have revolutionized the ability to profile biochemical and functional properties of biological sequences such as DNA, RNA and proteins. By collating several data modalities with customizable tracks rendered using intuitive visualizations, genome browsers enable an interactive and interpretable exploration of diverse types of genome profiling experiments and derived annotations. However, existing genome browser tracks are not well suited for intuitive visualization of high-resolution DNA sequence features such as transcription factor motifs. Typically, motif instances in regulatory DNA sequences are visualized as BED-based annotation tracks, which highlight the genomic coordinates of the motif instances but do not expose their specific sequences. Instead, a genome sequence track needs to be cross-referenced with the BED track to identify sequences of motif hits. Even so, quantitative information about the motif instances such as affinity or conservation as well as differences in base resolution from the consensus motif are not immediately apparent. This makes interpretation slow and challenging. This problem is compounded when analyzing several cellular states and/or molecular readouts (such as ATAC-seq and ChIP–seq) simultaneously, as coordinates of enriched regions (peaks) and the set of active transcription factor motifs vary across cell states.
Affiliation(s)
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Daofeng Li
- Department of Genetics, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Ting Wang
- Department of Genetics, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University in St. Louis School of Medicine, St. Louis, MO, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
|
55
|
Zhu J, Zhang L, Liu J, Zhong S, Gao P, Shen J. Trichloroethylene remediation using zero-valent iron with kaolin clay, activated carbon and bacteria. Water Research 2022; 226:119186. [PMID: 36244142 DOI: 10.1016/j.watres.2022.119186] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/27/2022] [Revised: 09/22/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Nanoscale particles of zero-valent iron were used to form a permeable reactive barrier whose performance in dechlorinating a solution of trichloroethylene was compared with that of a barrier formed from limestone. The iron was combined with kaolin by calcination. The test liquid contained sewage sludge, and also added NH4Cl and KH2PO4. The average removal rates of trichloroethylene and phosphorus over 365 days both exceeded 94%. Chemical oxygen demand was reduced by 92% and ammonium nitrogen by 43.6%. All were significantly greater than the removals with the limestone barrier. The ceramsite barrier retained 85% of its effectiveness even after 365 days of use. Dechloromonas sp. was the main dechlorinating bacterium, but its removal ability is limited. The removal of trichloroethylene in such a barrier mainly depends on reduction by the zero-valent iron and biodegradation. The results show that the prepared ceramsite is stable and effective in removing trichloroethylene from water. It is a promising in-situ remediation material for groundwater.
Affiliation(s)
- Jiayan Zhu
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Lishan Zhang
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China.
- Junyong Liu
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Shan Zhong
- School of Life and Environment Sciences, Guilin University of Electronic Technology, Guilin 541004, China; Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, China
- Pin Gao
- College of Environmental Science and Engineering, Donghua University, Shanghai 201620, China
- Jinyou Shen
- School of Chemical Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei Street, Nanjing, Jiangsu 210094, China
|
56
|
Mikl M, Eletto D, Nijim M, Lee M, Lafzi A, Mhamedi F, David O, Sain SB, Handler K, Moor A. A massively parallel reporter assay reveals focused and broadly encoded RNA localization signals in neurons. Nucleic Acids Res 2022; 50:10643-10664. [PMID: 36156153 PMCID: PMC9561380 DOI: 10.1093/nar/gkac806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2021] [Revised: 08/24/2022] [Accepted: 09/08/2022] [Indexed: 11/14/2022]
Abstract
Asymmetric subcellular mRNA localization allows spatial regulation of gene expression and functional compartmentalization. In neurons, localization of specific mRNAs to neurites is essential for cellular functioning. However, it is largely unknown how transcript sorting works in a sequence-specific manner. Here, we combined subcellular transcriptomics and massively parallel reporter assays and tested ∼50 000 sequences for their ability to localize to neurites. Mapping the localization potential of >300 genes revealed two ways neurite targeting can be achieved: focused localization motifs and broadly encoded localization potential. We characterized the interplay between RNA stability and localization and identified motifs able to bias localization towards neurite or soma as well as the trans-acting factors required for their action. Based on our data, we devised machine learning models that were able to predict the localization behavior of novel reporter sequences. Testing this predictor on native mRNA sequencing data showed good agreement between predicted and observed localization potential, suggesting that the rules uncovered by our MPRA also apply to the localization of native full-length transcripts.
Affiliation(s)
- Martin Mikl
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Department of Human Biology, University of Haifa, Haifa, Israel
- Davide Eletto
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Malak Nijim
- Department of Human Biology, University of Haifa, Haifa, Israel
- Minkyoung Lee
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Atefeh Lafzi
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Farah Mhamedi
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Orit David
- Department of Human Biology, University of Haifa, Haifa, Israel
- Simona Baghai Sain
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Kristina Handler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Andreas E Moor
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
|
57
|
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. Genomics, Proteomics & Bioinformatics 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
- Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.
|
58
|
Controlling gene expression with deep generative design of regulatory DNA. Nat Commun 2022; 13:5099. [PMID: 36042233 PMCID: PMC9427793 DOI: 10.1038/s41467-022-32818-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 01/27/2022] [Accepted: 08/18/2022] [Indexed: 11/25/2022]
Abstract
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Using mutagenesis typically requires screening sizable random DNA libraries, which limits the designs to span merely a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly-expressed synthetic sequences surpass the expression levels of highly-expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.
|
59
|
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022]
Abstract
Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed by the volume of genomic data. Artificial intelligence, especially deep learning, has been instrumental in extracting knowledge and patterns from these data. In this review, we address the development and application of deep learning methods and models in different subareas of human genomics. We assess which areas of genomics are over- and under-charted by deep learning techniques. The deep learning algorithms underlying the genomic tools are discussed briefly in a later part of the review. Finally, we briefly discuss recent applications of deep learning tools in genomics. In conclusion, this review is timely for biotechnology and genomics scientists, guiding them on why, when and how to use deep learning methods to analyse human genomic data.
Affiliation(s)
- Wardah S Alharbi
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
- Mamoon Rashid
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia.
|
60
|
Wang J, Lisanza S, Juergens D, Tischer D, Watson JL, Castro KM, Ragotte R, Saragovi A, Milles LF, Baek M, Anishchenko I, Yang W, Hicks DR, Expòsit M, Schlichthaerle T, Chun JH, Dauparas J, Bennett N, Wicky BIM, Muenks A, DiMaio F, Correia B, Ovchinnikov S, Baker D. Scaffolding protein functional sites using deep learning. Science 2022; 377:387-394. [PMID: 35862514 PMCID: PMC9621694 DOI: 10.1126/science.abn2100] [Citation(s) in RCA: 156] [Impact Index Per Article: 78.0] [Indexed: 01/22/2023]
Abstract
The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired functional site. The second approach, "inpainting," starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.
Affiliation(s)
- Jue Wang
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Sidney Lisanza
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
- David Juergens
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Molecular Engineering Graduate Program, University of Washington, Seattle, WA 98105, USA
- Doug Tischer
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Joseph L. Watson
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Karla M. Castro
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- Robert Ragotte
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Amijai Saragovi
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Lukas F. Milles
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Minkyung Baek
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Wei Yang
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Derrick R. Hicks
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Marc Expòsit
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Molecular Engineering Graduate Program, University of Washington, Seattle, WA 98105, USA
- Thomas Schlichthaerle
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Jung-Ho Chun
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
- Justas Dauparas
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Nathaniel Bennett
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Molecular Engineering Graduate Program, University of Washington, Seattle, WA 98105, USA
- Basile I. M. Wicky
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Andrew Muenks
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Bruno Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- Sergey Ovchinnikov
- FAS Division of Science, Harvard University, Cambridge, MA 02138, USA
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
|
61
|
High-throughput techniques enable advances in the roles of DNA and RNA secondary structures in transcriptional and post-transcriptional gene regulation. Genome Biol 2022; 23:159. [PMID: 35851062 PMCID: PMC9290270 DOI: 10.1186/s13059-022-02727-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/09/2022] [Accepted: 07/07/2022] [Indexed: 12/27/2022]
Abstract
The most stable structure of DNA is the canonical right-handed double helix termed B DNA. However, certain environments and sequence motifs favor alternative conformations, termed non-canonical secondary structures. The roles of DNA and RNA secondary structures in transcriptional regulation remain incompletely understood. However, advances in high-throughput assays have enabled genome-wide characterization of some secondary structures. Here, we describe their regulatory functions in promoters and 3’UTRs, providing insights into key mechanisms through which they regulate gene expression. We discuss their implication in human disease, and how advances in molecular technologies and emerging high-throughput experimental methods could provide additional insights.
|
62
|
Wu M, Schmid M, Jensen T, Sandelin A. Computational identification of signals predictive for nuclear RNA exosome degradation pathway targeting. NAR Genom Bioinform 2022; 4:lqac071. [PMID: 36128426 PMCID: PMC9477074 DOI: 10.1093/nargab/lqac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Revised: 08/05/2022] [Accepted: 09/01/2022] [Indexed: 11/15/2022]
Abstract
The RNA exosome degrades transcripts in the nucleoplasm of mammalian cells. Its substrate specificity is mediated by two adaptors: the ‘nuclear exosome targeting (NEXT)’ complex and the ‘poly(A) exosome targeting (PAXT)’ connection. Previous studies have revealed some DNA/RNA elements that differ between the two pathways, but how informative these features are for distinguishing pathway targeting, or whether additional genomic features that are informative for such classifications exist, is unknown. Here, we leverage the wealth of available genomic data and develop machine learning models that predict exosome targets and subsequently rank the features the models use by their predictive power. As expected, features around transcript end sites were most predictive; specifically, the lack of canonical 3′ end processing was highly predictive of NEXT targets. Other associated features, such as promoter-proximal G/C content and 5′ splice sites, were informative, but only for distinguishing NEXT and not PAXT targets. Finally, we discovered predictive features not previously associated with exosome targeting, in particular RNA helicase DDX3X binding sites. Overall, our results demonstrate that nucleoplasmic exosome targeting is to a large degree predictable, and our approach can assess the predictive power of previously known and new features in an unbiased way.
Affiliation(s)
- Mengjun Wu
- The Bioinformatics Centre, Department of Biology and Biotech and Research Innovation Centre, University of Copenhagen , Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet , 171 65 Solna , Sweden
| | - Manfred Schmid
- Department of Molecular Biology and Genetics, Aarhus University , Universitetsbyen 81, Aarhus , DK-8000, Denmark
| | - Torben Heick Jensen
- Department of Molecular Biology and Genetics, Aarhus University , Universitetsbyen 81, Aarhus , DK-8000, Denmark
| | - Albin Sandelin
- The Bioinformatics Centre, Department of Biology and Biotech and Research Innovation Centre, University of Copenhagen , Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
| |
|
63
|
Fang Z, Peltz G. An automated multi-modal graph-based pipeline for mouse genetic discovery. Bioinformatics 2022; 38:3385-3394. [PMID: 35608290 PMCID: PMC9992076 DOI: 10.1093/bioinformatics/btac356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 04/18/2022] [Accepted: 05/19/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Our ability to identify causative genetic factors for mouse genetic models of human diseases and biomedical traits has been limited by the difficulties associated with identifying true causative factors, which are often obscured by the many false positive genetic associations produced by a GWAS. RESULTS To accelerate the pace of genetic discovery, we developed a graph neural network (GNN)-based automated pipeline (GNNHap) that could rapidly analyze mouse genetic model data and identify high probability causal genetic factors for analyzed traits. After assessing the strength of allelic associations with the strain response pattern, this pipeline analyzes 29M published papers to assess candidate gene-phenotype relationships and incorporates the information obtained from a protein-protein interaction network and protein sequence features into the analysis. The GNN model produces markedly improved results relative to those of a simple linear neural network. We demonstrate that GNNHap can identify novel causative genetic factors for murine models of diabetes/obesity and for cataract formation, which were validated by the phenotypes appearing in previously analyzed gene knockout mice. The diabetes/obesity results indicate how characterization of the underlying genetic architecture enables new therapies to be discovered and tested by applying 'precision medicine' principles to murine models. AVAILABILITY AND IMPLEMENTATION The GNNHap source code is freely available at https://github.com/zqfang/gnnhap, and the new version of the HBCGM program is available at https://github.com/zqfang/haplomap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhuoqing Fang
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Gary Peltz
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| |
|
64
|
Abstract
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
|
65
|
Kwon B, Fansler MM, Patel ND, Lee J, Ma W, Mayr C. Enhancers regulate 3' end processing activity to control expression of alternative 3'UTR isoforms. Nat Commun 2022; 13:2709. [PMID: 35581194 PMCID: PMC9114392 DOI: 10.1038/s41467-022-30525-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/30/2021] [Accepted: 05/02/2022] [Indexed: 12/12/2022] Open
Abstract
Multi-UTR genes are widely transcribed and express their alternative 3'UTR isoforms in a cell type-specific manner. As transcriptional enhancers regulate mRNA expression, we investigated if they also regulate 3'UTR isoform expression. Endogenous enhancer deletion of the multi-UTR gene PTEN did not impair transcript production but prevented 3'UTR isoform switching, which was recapitulated by silencing of an enhancer-bound transcription factor. In reporter assays, enhancers increase transcript production when paired with single-UTR gene promoters. However, when combined with multi-UTR gene promoters, they change 3'UTR isoform expression by increasing 3' end processing activity of polyadenylation sites. Processing activity of polyadenylation sites is affected by transcription factors, including NF-κB and MYC, transcription elongation factors, chromatin remodelers, and histone acetyltransferases. As endogenous cell type-specific enhancers are associated with genes that increase their short 3'UTRs in a cell type-specific manner, our data suggest that transcriptional enhancers integrate cellular signals to regulate cell type- and condition-specific 3'UTR isoform expression.
Affiliation(s)
- Buki Kwon
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Mervin M Fansler
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, 10021, USA
| | - Neil D Patel
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Jihye Lee
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Weirui Ma
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Christine Mayr
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, 10021, USA.
| |
|
66
|
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022; 54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]
Abstract
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
|
67
|
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes. Nat Commun 2022; 13:2270. [PMID: 35477703 PMCID: PMC9046390 DOI: 10.1038/s41467-022-30017-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/24/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource F3UTER (https://astx.shinyapps.io/F3UTER/). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
|
68
|
Georgakopoulos-Soares I, Victorino J, Parada GE, Agarwal V, Zhao J, Wong HY, Umar MI, Elor O, Muhwezi A, An JY, Sanders SJ, Kwok CK, Inoue F, Hemberg M, Ahituv N. High-throughput characterization of the role of non-B DNA motifs on promoter function. Cell Genomics 2022; 2:100111. [PMID: 35573091 PMCID: PMC9105345 DOI: 10.1016/j.xgen.2022.100111] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/23/2021] [Revised: 10/21/2021] [Accepted: 02/18/2022] [Indexed: 12/24/2022]
Abstract
Alternative DNA conformations, termed non-B DNA structures, can affect transcription, but the underlying mechanisms and their functional impact have not been systematically characterized. Here, we used computational genomic analyses coupled with massively parallel reporter assays (MPRAs) to show that certain non-B DNA structures have a substantial effect on gene expression. Genomic analyses found that non-B DNA structures at promoters harbor an excess of germline variants. Analysis of multiple MPRAs, including a promoter library specifically designed to perturb non-B DNA structures, functionally validated that Z-DNA can significantly affect promoter activity. We also observed that biophysical properties of non-B DNA motifs, such as the length of Z-DNA motifs and the orientation of G-quadruplex structures relative to transcriptional direction, have a significant effect on promoter activity. Combined, their higher mutation rate and functional effect on transcription implicate a subset of non-B DNA motifs as major drivers of human gene-expression-associated phenotypes.
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Jesus Victorino
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Departamento de Bioquímica, Facultad de Medicina, Universidad Autónoma de Madrid (UAM), 28029 Madrid, Spain
| | - Guillermo E. Parada
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | | | - Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Hei Yuen Wong
- Department of Chemistry and State Key Laboratory of Marine Pollution, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
| | - Mubarak Ishaq Umar
- Department of Chemistry and State Key Laboratory of Marine Pollution, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
| | - Orry Elor
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Allan Muhwezi
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Joon-Yong An
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, Republic of Korea
| | - Stephan J. Sanders
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
| | - Chun Kit Kwok
- Department of Chemistry and State Key Laboratory of Marine Pollution, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Shenzhen Research Institute of City University of Hong Kong, Shenzhen, China
| | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| |
|
69
|
Context-aware dynamic neural computational models for accurate Poly(A) signal prediction. Neural Netw 2022; 152:287-299. [DOI: 10.1016/j.neunet.2022.04.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Revised: 03/03/2022] [Accepted: 04/22/2022] [Indexed: 11/21/2022]
|
70
|
Singer JM, Novotney S, Strickland D, Haddox HK, Leiby N, Rocklin GJ, Chow CM, Roy A, Bera AK, Motta FC, Cao L, Strauch EM, Chidyausiku TM, Ford A, Ho E, Zaitzeff A, Mackenzie CO, Eramian H, DiMaio F, Grigoryan G, Vaughn M, Stewart LJ, Baker D, Klavins E. Large-scale design and refinement of stable proteins using sequence-only models. PLoS One 2022; 17:e0265020. [PMID: 35286324 PMCID: PMC8920274 DOI: 10.1371/journal.pone.0265020] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/18/2021] [Accepted: 02/18/2022] [Indexed: 12/25/2022] Open
Abstract
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.
Affiliation(s)
| | - Scott Novotney
- Two Six Technologies, Arlington, Virginia, United States of America
| | - Devin Strickland
- Department of Electrical and Computer Engineering, University of Washington, Seattle, Washington, United States of America
| | - Hugh K. Haddox
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Nicholas Leiby
- Two Six Technologies, Arlington, Virginia, United States of America
| | - Gabriel J. Rocklin
- Department of Pharmacology and Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
| | - Cameron M. Chow
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Anindya Roy
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Asim K. Bera
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Francis C. Motta
- Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, Florida, United States of America
| | - Longxing Cao
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Eva-Maria Strauch
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, Georgia, United States of America
| | - Tamuka M. Chidyausiku
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Alex Ford
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Ethan Ho
- Texas Advanced Computing Center, Austin, Texas, United States of America
| | | | - Craig O. Mackenzie
- Quantitative Biomedical Sciences Graduate Program, Dartmouth College, Hanover, New Hampshire, United States of America
| | - Hamed Eramian
- Netrias, Cambridge, Massachusetts, United States of America
| | - Frank DiMaio
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Gevorg Grigoryan
- Departments of Computer Science and Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
| | - Matthew Vaughn
- Texas Advanced Computing Center, Austin, Texas, United States of America
| | - Lance J. Stewart
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, United States of America
| | - Eric Klavins
- Department of Electrical and Computer Engineering, University of Washington, Seattle, Washington, United States of America
| |
|
71
|
Xi H, Li Z, Han J, Shen D, Li N, Long Y, Chen Z, Xu L, Zhang X, Niu D, Liu H. Evaluating the capability of municipal solid waste separation in China based on AHP-EWM and BP neural network. Waste Management (New York, N.Y.) 2022; 139:208-216. [PMID: 34974315 DOI: 10.1016/j.wasman.2021.12.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/08/2021] [Revised: 12/04/2021] [Accepted: 12/07/2021] [Indexed: 05/17/2023]
Abstract
With the increase in municipal solid waste (MSW), most cities face solid waste management issues. In this study, the analytic hierarchy process (AHP) and artificial neural network (ANN) models were improved to assess the MSW separation capability based on 18 selected indicators of solid waste separation in 15 cities in China. The entropy weight method (EWM) was used in AHP to optimize and determine the indicators and then evaluate their weights, which showed that the general public budget expenditure had the highest weight (0.5239). This implied that the MSW separation capability could be mainly influenced by government financial support. ANN models based on scan optimization and machine learning methods were established (R² = 0.9992) to predict the missing indicators. The mapping relationship between MSW separation indicators and capabilities was also significantly improved from R² = 0.5317 to R² = 0.9993, thereby increasing the prediction accuracy of MSW separation capabilities to 95.15%. Thus, this research provides a new avenue for MSW separation and establishes a combined model to predict the separation capability in practical applications.
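The entropy weight method named in this abstract can be illustrated with a minimal Python sketch. It assumes the common textbook form of EWM (min-max scaling per indicator, Shannon entropy of the scaled column, weight proportional to one minus the entropy) and benefit-type indicators only; the abstract does not say which variant the authors used or how the weights were combined with AHP. The function name `entropy_weights` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def entropy_weights(rows):
    """Return one weight per indicator (column) via the entropy weight method.

    rows: list of samples (e.g. cities), each a list of indicator values.
    Assumes every indicator is benefit-type (larger is better).
    """
    n = len(rows)
    m = len(rows[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant indicator carries no information, so it gets zero weight.
            divergences.append(0.0)
            continue
        # Min-max scale the column, then turn it into a probability distribution.
        scaled = [(v - lo) / (hi - lo) for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Shannon entropy normalized by ln(n); 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    if norm == 0:
        # Every indicator is constant: fall back to equal weights.
        return [1.0 / m] * m
    return [d / norm for d in divergences]


# Hypothetical toy data: 3 samples x 3 indicators (not the paper's 18 indicators).
toy_rows = [
    [1.0, 10.0, 5.0],
    [2.0, 10.0, 5.1],
    [3.0, 10.0, 9.0],
]
print(entropy_weights(toy_rows))
```

In this toy input the second indicator is constant and receives zero weight, while the third, whose values are concentrated in a single sample, receives the largest weight.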
Affiliation(s)
- Hao Xi
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Zhiheng Li
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Jingyi Han
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Dongsheng Shen
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Na Li
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Yuyang Long
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Zhenlong Chen
- School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Linglin Xu
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Xianghong Zhang
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China
| | - Dongjie Niu
- College of Environmental Science and Engineering, Tongji University, Shanghai 200086, China
| | - Huijun Liu
- School of Environmental Science and Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China.
| |
|
72
|
Interpreting neural networks for biological sequences by learning stochastic masks. Nat Mach Intell 2022; 4:41-54. [DOI: 10.1038/s42256-021-00428-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
|
73
|
Castillo-Hair SM, Seelig G. Machine Learning for Designing Next-Generation mRNA Therapeutics. Acc Chem Res 2022; 55:24-34. [PMID: 34905691 DOI: 10.1021/acs.accounts.1c00621] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Indexed: 12/28/2022]
Abstract
Over just the last 2 years, mRNA therapeutics and vaccines have undergone a rapid transition from an intriguing concept to real-world impact. However, whereas some aspects of mRNA therapeutics, such as the use of chemical modifications to increase stability and reduce immunogenicity, have been extensively optimized for over two decades, other aspects, particularly the selection and design of the noncoding leader and trailer sequences which control translation efficiency and stability, have received comparably less attention. In practice, such 5' and 3' untranslated regions (UTRs) are often borrowed from highly expressed human genes with few or no modifications, as in the case for the Pfizer/BioNTech Covid vaccine. Focusing on the 5'UTR, we here argue that model-driven design is a promising alternative that provides unprecedented control over 5'UTR function. We review recent work that combines synthetic biology with machine learning to build quantitative models that relate ribosome loading, and thus translation efficiency, to the 5'UTR sequence. We first introduce an experimental approach that uses polysome profiling and high-throughput sequencing to quantify ribosome loading for hundreds of thousands of 5'UTRs in parallel. We apply this approach to measure ribosome loading in synthetic RNA libraries with a random sequence inserted into the 5'UTR. We then review Optimus 5-Prime, a convolutional neural network model trained on the experimental data. We highlight that very accurate models of biological regulation can be learned from synthetic data sets with degenerate 5'UTRs. We validate model predictions not only on held-out data sets from our random library but also on a large library of over 30 000 human 5'UTR fragments and using translation reporter data collected independently by other groups. Both the experiment and model are compatible with commonly used chemically modified nucleosides, in particular, pseudouridine (Ψ) and 1-methyl-pseudouridine (m1Ψ). 
We find that, in general, 5'UTRs have very similar impacts when combined with different protein-coding sequences and even in the context of different chemical modifications. We demonstrate that Optimus 5-Prime can be combined with design algorithms to generate de novo sequences with precisely defined translation efficiencies. We emphasize recent developments in design algorithms that rely on activation maximization and generative modeling to improve both the fitness and diversity of designed sequences. Compared with prior approaches such as genetic algorithms, we show that these approaches are not only faster but also less likely to get stuck in local sequence optima. Finally, we discuss how the approach reviewed here can be generalized to other gene regions and applications.
Affiliation(s)
- Sebastian M. Castillo-Hair
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, United States
- eScience Institute, University of Washington, Seattle, Washington 98195, United States
| | - Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington 98195, United States
| |
|
74
|
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. Entropy (Basel, Switzerland) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
- Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland
| |
|
75
|
Linder J, Seelig G. Fast activation maximization for molecular sequence design. BMC Bioinformatics 2021; 22:510. [PMID: 34670493 PMCID: PMC8527647 DOI: 10.1186/s12859-021-04437-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/07/2021] [Accepted: 10/11/2021] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Optimization of DNA and protein sequences based on Machine Learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence. RESULTS Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp's capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor. CONCLUSIONS Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.
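The general idea described in this abstract (relax a one-hot sequence to continuous parameters, then run gradient ascent through a differentiable predictor) can be illustrated with a small Python sketch. This is not Fast SeqProp: it omits the normalization across input parameters that the abstract describes and shows only activation maximization with a straight-through estimator. The predictor here is a hypothetical toy (tanh of a position-specific linear score) chosen so that its gradient can be written by hand; a real application would obtain the gradient from a trained neural network by automatic differentiation. All names (`predict`, `predict_grad`, `design`, `toy_weights`) are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(onehot, weights):
    """Toy differentiable predictor: tanh of a position-specific linear score."""
    score = sum(w * x for wr, xr in zip(weights, onehot) for w, x in zip(wr, xr))
    return math.tanh(score / len(weights))


def predict_grad(onehot, weights):
    """Analytic gradient of predict() with respect to every input entry."""
    y = predict(onehot, weights)
    factor = (1.0 - y * y) / len(weights)
    return [[factor * w for w in wr] for wr in weights]


def design(weights, steps=50, lr=1.0):
    """Gradient ascent on per-position logits with a straight-through estimator."""
    logits = [[0.0] * len(ALPHABET) for _ in weights]
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Forward pass: discretize each position to a one-hot vector (argmax).
        onehot = [
            [1.0 if a == p.index(max(p)) else 0.0 for a in range(len(p))]
            for p in probs
        ]
        grad_x = predict_grad(onehot, weights)
        # Backward pass: treat the one-hot input as if it were the softmax
        # output, so the gradient flows through the softmax Jacobian.
        for i, (p, g) in enumerate(zip(probs, grad_x)):
            mean_g = sum(pb * gb for pb, gb in zip(p, g))
            for a in range(len(p)):
                logits[i][a] += lr * p[a] * (g[a] - mean_g)
    return "".join(ALPHABET[row.index(max(row))] for row in logits)


# Hypothetical per-position scores for A, C, G, T (rows are sequence positions).
toy_weights = [
    [0.1, 0.9, 0.2, 0.3],
    [0.8, 0.1, 0.1, 0.2],
    [0.2, 0.1, 0.3, 0.9],
    [0.1, 0.2, 0.7, 0.3],
]
print(design(toy_weights))  # prints CATG for these toy weights
```

Because the toy predictor is monotonic in a linear score, the optimum is simply the highest-scoring letter at each position, which makes the result easy to check by eye. The vanishing-gradient and convergence problems the abstract mentions arise with deeper, non-monotonic predictors that this sketch does not model.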
Affiliation(s)
- Johannes Linder
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Department of Electrical and Computer Engineering, University of Washington, Seattle, USA
| |
|
76
|
Savinov A, Brandsen BM, Angell BE, Cuperus JT, Fields S. Effects of sequence motifs in the yeast 3' untranslated region determined from massively parallel assays of random sequences. Genome Biol 2021; 22:293. [PMID: 34663436 PMCID: PMC8522215 DOI: 10.1186/s13059-021-02509-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/26/2021] [Accepted: 09/30/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND The 3' untranslated region (UTR) plays critical roles in determining the level of gene expression through effects on activities such as mRNA stability and translation. Functional elements within this region have largely been identified through analyses of native genes, which contain multiple co-evolved sequence features. RESULTS To explore the effects of 3' UTR sequence elements outside of native sequence contexts, we analyze hundreds of thousands of random 50-mers inserted into the 3' UTR of a reporter gene in the yeast Saccharomyces cerevisiae. We determine relative protein expression levels from the fitness of transformants in a growth selection. We find that the consensus 3' UTR efficiency element significantly boosts expression, independent of sequence context; on the other hand, the consensus positioning element has only a small effect on expression. Some sequence motifs that are binding sites for Puf proteins substantially increase expression in the library, despite these proteins generally being associated with post-transcriptional downregulation of native mRNAs. Our measurements also allow a systematic examination of the effects of point mutations within efficiency element motifs across diverse sequence backgrounds. These mutational scans reveal the relative in vivo importance of individual bases in the efficiency element, which likely reflects their roles in binding the Hrp1 protein involved in cleavage and polyadenylation. CONCLUSIONS The regulatory effects of some 3' UTR sequence features, like the efficiency element, are consistent regardless of sequence context. In contrast, the consequences of other 3' UTR features appear to be strongly dependent on their evolved context within native genes.
Affiliation(s)
- Andrew Savinov
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
- Present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
| | - Benjamin M Brandsen
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
- Department of Chemistry and Biochemistry, Creighton University, Omaha, NE, 68178, USA
| | - Brooke E Angell
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
- Present address: Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL, 60208, USA
| | - Josh T Cuperus
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA.
| | - Stanley Fields
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA.
- Department of Medicine, University of Washington, Box 357720, Seattle, WA, 98195, USA.
|
77
|
Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021; 184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]
Abstract
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Affiliation(s)
- Dustin Griesemer
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA.
| | - Steven K Reilly
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
| | - Jacob C Ulirsch
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Kalki Kukreja
- Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
| | - Joe R Davis
- BigHat Biosciences, San Carlos, CA 94070, USA
| | - Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - David K Yang
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
| | - John C Butts
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
| | - Mehmet H Guney
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Jeremy Luban
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Carl D Novina
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
| | - Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
|
78
|
Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learning. PLoS One 2021; 16:e0257343. [PMID: 34555052 PMCID: PMC8459994 DOI: 10.1371/journal.pone.0257343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Accepted: 08/29/2021] [Indexed: 12/25/2022] Open
Abstract
OBJECTIVES Smoking is a significant independent risk factor for postmenopausal osteoporosis, leading to genome variations in postmenopausal smokers. This study investigates potential biomarkers and molecular mechanisms of smoking-related postmenopausal osteoporosis (SRPO). MATERIALS AND METHODS The GSE13850 microarray dataset was downloaded from Gene Expression Omnibus (GEO). Gene modules associated with SRPO were identified using weighted gene co-expression network analysis (WGCNA), protein-protein interaction (PPI) analysis, and pathway and functional enrichment analyses. Feature genes were selected using two machine learning methods: support vector machine-recursive feature elimination (SVM-RFE) and random forest (RF). The diagnostic efficiency of the selected genes was assessed by gene expression analysis and receiver operating characteristic curve. RESULTS Eight highly conserved modules were detected in the WGCNA network, and the genes in the module that was strongly correlated with SRPO were used for constructing the PPI network. A total of 113 hub genes were identified in the core network using topological network analysis. Enrichment analysis results showed that hub genes were closely associated with the regulation of RNA transcription and translation, ATPase activity, and immune-related signaling. Six genes (HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2) were selected as genetic biomarkers for SRPO by integrating the feature selection of SVM-RFE and RF. CONCLUSION The present study identified potential genetic biomarkers and provided a novel insight into the underlying molecular mechanism of SRPO.
|
79
|
Li GW, Nan F, Yuan GH, Liu CX, Liu X, Chen LL, Tian B, Yang L. SCAPTURE: a deep learning-embedded pipeline that captures polyadenylation information from 3' tag-based RNA-seq of single cells. Genome Biol 2021; 22:221. [PMID: 34376223 PMCID: PMC8353616 DOI: 10.1186/s13059-021-02437-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/05/2021] [Accepted: 07/16/2021] [Indexed: 01/16/2023] Open
Abstract
Single-cell RNA-seq (scRNA-seq) profiles gene expression with high resolution. Here, we develop a stepwise computational method called SCAPTURE to identify, evaluate, and quantify cleavage and polyadenylation sites (PASs) from 3' tag-based scRNA-seq. SCAPTURE detects PASs de novo in single cells with high sensitivity and accuracy, enabling detection of previously unannotated PASs. Quantified alternative PAS transcripts refine cell identity analysis beyond gene expression, enriching information extracted from scRNA-seq data. Using SCAPTURE, we show changes of PAS usage in PBMCs from infected versus healthy individuals at single-cell resolution.
Affiliation(s)
- Guo-Wei Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Fang Nan
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Guo-Hua Yuan
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Chu-Xiao Liu
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
| | - Xindong Liu
- Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, 400038, China
| | - Ling-Ling Chen
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
| | - Bin Tian
- Program in Gene Expression and Regulation, and Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, PA, 19104, USA
| | - Li Yang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
|
80
|
Wang X, Dong Y, Zheng Y, Chen Y. Multiomics metabolic and epigenetics regulatory network in cancer: A systems biology perspective. J Genet Genomics 2021; 48:520-530. [PMID: 34362682 DOI: 10.1016/j.jgg.2021.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/18/2021] [Revised: 05/07/2021] [Accepted: 05/11/2021] [Indexed: 12/21/2022]
Abstract
Genetic, epigenetic, and metabolic alterations are all hallmarks of cancer. However, the epigenome and metabolome are both highly complex and dynamic biological networks in vivo. The interplay between the epigenome and metabolome contributes to a biological system that is responsive to the tumor microenvironment and possesses a wealth of unknown biomarkers and targets of cancer therapy. From this perspective, we first review the state of high-throughput biological data acquisition (i.e. multiomics data) and analysis (i.e. computational tools) and then propose a conceptual in silico metabolic and epigenetic regulatory network (MER-Net) that is based on these current high-throughput methods. The conceptual MER-Net is aimed at linking metabolomic and epigenomic networks through observation of biological processes, omics data acquisition, analysis of network information, and integration with validated database knowledge. Thus, MER-Net could be used to reveal new potential biomarkers and therapeutic targets using deep learning models to integrate and analyze large multiomics networks. We propose that MER-Net can serve as a tool to guide integrated metabolomics and epigenomics research or can be modified to answer other complex biological and clinical questions using multiomics data.
Affiliation(s)
- Xuezhu Wang
- The State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
| | - Yucheng Dong
- The State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
| | - Yongchang Zheng
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
| | - Yang Chen
- The State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China.
|
81
|
Li VR, Zhang Z, Troyanskaya OG. CROTON: an automated and variant-aware deep learning framework for predicting CRISPR/Cas9 editing outcomes. Bioinformatics 2021; 37:i342-i348. [PMID: 34252931 PMCID: PMC8275342 DOI: 10.1093/bioinformatics/btab268] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/21/2021] [Accepted: 05/21/2021] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION CRISPR/Cas9 is a revolutionary gene-editing technology that has been widely utilized in biology, biotechnology and medicine. CRISPR/Cas9 editing outcomes depend on local DNA sequences at the target site and are thus predictable. However, existing prediction methods are dependent on both feature and model engineering, which restricts their performance to existing knowledge about CRISPR/Cas9 editing. RESULTS Herein, deep multi-task convolutional neural networks (CNNs) and neural architecture search (NAS) were used to automate both feature and model engineering and create an end-to-end deep-learning framework, CROTON (CRISPR Outcomes Through cONvolutional neural networks). The CROTON model architecture was tuned automatically with NAS on a synthetic large-scale construct-based dataset and then tested on an independent primary T cell genomic editing dataset. CROTON outperformed existing expert-designed models and non-NAS CNNs in predicting 1 base pair insertion and deletion probability as well as deletion and frameshift frequency. Interpretation of CROTON revealed local sequence determinants for diverse editing outcomes. Finally, CROTON was utilized to assess how single nucleotide variants (SNVs) affect the genome editing outcomes of four clinically relevant target genes: the viral receptors ACE2 and CCR5 and the immune checkpoint inhibitors CTLA4 and PDCD1. Large SNV-induced differences in CROTON predictions in these target genes suggest that SNVs should be taken into consideration when designing widely applicable gRNAs. AVAILABILITY AND IMPLEMENTATION https://github.com/vli31/CROTON. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Zijun Zhang
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY 10010, USA
| | - Olga G Troyanskaya
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY 10010, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
|
82
|
Li L, Huang KL, Gao Y, Cui Y, Wang G, Elrod ND, Li Y, Chen YE, Ji P, Peng F, Russell WK, Wagner EJ, Li W. An atlas of alternative polyadenylation quantitative trait loci contributing to complex trait and disease heritability. Nat Genet 2021; 53:994-1005. [PMID: 33986536 DOI: 10.1038/s41588-021-00864-5] [Citation(s) in RCA: 82] [Impact Index Per Article: 27.3] [Received: 01/30/2020] [Accepted: 04/05/2021] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies have identified thousands of noncoding variants associated with human traits and diseases. However, the functional interpretation of these variants is a major challenge. Here, we constructed a multi-tissue atlas of human 3'UTR alternative polyadenylation (APA) quantitative trait loci (3'aQTLs), containing approximately 0.4 million common genetic variants associated with the APA of target genes, identified in 46 tissues isolated from 467 individuals (Genotype-Tissue Expression Project). Mechanistically, 3'aQTLs can alter poly(A) motifs, RNA secondary structure and RNA-binding protein-binding sites, leading to thousands of APA changes. Our CRISPR-based experiments indicate that such 3'aQTLs can alter APA regulation. Furthermore, we demonstrate that mapping 3'aQTLs can identify APA regulators, such as La-related protein 4. Finally, 3'aQTLs are colocalized with approximately 16.1% of trait-associated variants and are largely distinct from other QTLs, such as expression QTLs. Together, our findings show that 3'aQTLs contribute substantially to the molecular mechanisms underlying human complex traits and diseases.
Affiliation(s)
- Lei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
| | - Kai-Lieh Huang
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
| | - Yipeng Gao
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
| | - Ya Cui
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
| | - Gao Wang
- The Gertrude H. Sergievsky Center and Department of Neurology, Columbia University, New York, NY, USA
| | - Nathan D Elrod
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
| | - Yumei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
| | - Yiling Elaine Chen
- Department of Statistics, University of California, Los Angeles, CA, USA
| | - Ping Ji
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
| | - Fanglue Peng
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - William K Russell
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
| | - Eric J Wagner
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA.
| | - Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA.
|
83
|
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Mariia Kokina
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Victor Garcia
- School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
| | - Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Science for Life Laboratory, Stockholm, Sweden
|
84
|
Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. Food Frontiers 2021. [DOI: 10.1002/fft2.78] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022] Open
Affiliation(s)
- Zhuyifan Ye
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Wenmian Yang
- State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China
| | - Yilong Yang
- School of Software, Beihang University, Beijing, China
| | - Defang Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
|
85
|
Lalanne J, Parker DJ, Li G. Spurious regulatory connections dictate the expression-fitness landscape of translation factors. Mol Syst Biol 2021; 17:e10302. [PMID: 33900014 PMCID: PMC8073009 DOI: 10.15252/msb.202110302] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/19/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 12/21/2022] Open
Abstract
During steady-state cell growth, individual enzymatic fluxes can be directly inferred from growth rate by mass conservation, but the inverse problem remains unsolved. Perturbing the flux and expression of a single enzyme could have pleiotropic effects that may or may not dominate the impact on cell fitness. Here, we quantitatively dissect the molecular and global responses to varied expression of translation termination factors (peptide release factors, RFs) in the bacterium Bacillus subtilis. While endogenous RF expression maximizes proliferation, deviations in expression lead to unexpected distal regulatory responses that dictate fitness reduction. Molecularly, RF depletion causes expression imbalance at specific operons, which activates master regulators and detrimentally overrides the transcriptome. Through these spurious connections, RF abundances are thus entrenched by focal points within the regulatory network, in one case located at a single stop codon. Such regulatory entrenchment suggests that predictive bottom-up models of expression-fitness landscapes will require near-exhaustive characterization of parts.
Affiliation(s)
- Jean‐Benoît Lalanne
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Present address: Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Darren J Parker
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Present address: Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Gene‐Wei Li
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
|
86
|
Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence. Nat Commun 2021; 12:1652. [PMID: 33712618 PMCID: PMC7955126 DOI: 10.1038/s41467-021-21894-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/12/2020] [Accepted: 02/18/2021] [Indexed: 02/01/2023] Open
Abstract
Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3'-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi's input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
|
87
|
Wu G, Schmid M, Rib L, Polak P, Meola N, Sandelin A, Jensen TH. A Two-Layered Targeting Mechanism Underlies Nuclear RNA Sorting by the Human Exosome. Cell Rep 2020; 30:2387-2401.e5. [PMID: 32075771 DOI: 10.1016/j.celrep.2020.01.068] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 10/04/2019] [Revised: 12/09/2019] [Accepted: 01/22/2020] [Indexed: 12/14/2022] Open
Abstract
Degradation of transcripts in human nuclei is primarily facilitated by the RNA exosome. To obtain substrate specificity, the exosome is aided by adaptors; in the nucleoplasm, those adaptors are the nuclear exosome-targeting (NEXT) complex and the poly(A) (pA) exosome-targeting (PAXT) connection. How these adaptors guide exosome targeting remains enigmatic. Employing high-resolution 3' end sequencing, we demonstrate that NEXT substrates arise from heterogenous and predominantly pA- 3' ends often covering kilobase-wide genomic regions. In contrast, PAXT targets harbor well-defined pA+ 3' ends defined by canonical pA site use. Irrespective of this clear division, NEXT and PAXT act redundantly in two ways: (1) regional redundancy, where the majority of exosome-targeted transcription units produce NEXT- and PAXT-sensitive RNA isoforms, and (2) isoform redundancy, where the PAXT connection ensures fail-safe decay of post-transcriptionally polyadenylated NEXT targets. In conjunction, this provides a two-layered targeting mechanism for efficient nuclear sorting of the human transcriptome.
Affiliation(s)
- Guifen Wu
- Department of Molecular Biology and Genetics, Aarhus University, C.F. Møllers Allé 3, Building 1130, 8000 Aarhus C, Denmark
| | - Manfred Schmid
- Department of Molecular Biology and Genetics, Aarhus University, C.F. Møllers Allé 3, Building 1130, 8000 Aarhus C, Denmark
| | - Leonor Rib
- The Bioinformatics Centre, Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark
| | - Patrik Polak
- Department of Molecular Biology and Genetics, Aarhus University, C.F. Møllers Allé 3, Building 1130, 8000 Aarhus C, Denmark
| | - Nicola Meola
- Department of Molecular Biology and Genetics, Aarhus University, C.F. Møllers Allé 3, Building 1130, 8000 Aarhus C, Denmark
| | - Albin Sandelin
- The Bioinformatics Centre, Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark
| | - Torben Heick Jensen
- Department of Molecular Biology and Genetics, Aarhus University, C.F. Møllers Allé 3, Building 1130, 8000 Aarhus C, Denmark.
|
88
|
Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, Zeitlinger J. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021; 53:354-366. [PMID: 33603233 PMCID: PMC8812996 DOI: 10.1038/s41588-021-00782-6] [Citation(s) in RCA: 246] [Impact Index Per Article: 82.0] [Received: 07/19/2020] [Accepted: 01/07/2021] [Indexed: 01/30/2023]
Abstract
The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.
Affiliation(s)
- Žiga Avsec
- Department of Informatics, Technical University of Munich, Garching, Germany; Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Munich, Germany; Currently at DeepMind, London, UK
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Amr Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Khyati Dalal
- Stowers Institute for Medical Research, Kansas City, MO, USA; The University of Kansas Medical Center, Kansas City, KS, USA
| | - Robin Fropf
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Charles McAnany
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA; corresponding author
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO, USA; The University of Kansas Medical Center, Kansas City, KS, USA; corresponding author
| |
|
89
|
Koo PK, Ploenzke M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat Mach Intell 2021; 3:258-266. [PMID: 34322657 DOI: 10.1038/s42256-020-00291-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 02/07/2023]
Abstract
Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis on synthetic sequences to investigate the role that CNN activations have on model interpretability. We show that employing an exponential activation to first layer filters consistently leads to interpretable and robust representations of motifs compared to other commonly used activations. Strikingly, we demonstrate that CNNs with better test performance do not necessarily imply more interpretable representations with attribution methods. We find that CNNs with exponential activations significantly improve the efficacy of recovering biologically meaningful representations with attribution methods. We demonstrate these results generalise to real DNA sequences across several in vivo datasets. Together, this work demonstrates how a small modification to existing CNNs, i.e. setting exponential activations in the first layer, can significantly improve the robustness and interpretability of learned representations directly in convolutional filters and indirectly with attribution methods.
Affiliation(s)
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Matt Ploenzke
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
| |
|
90
|
Zhang Y, Liu L, Qiu Q, Zhou Q, Ding J, Lu Y, Liu P. Alternative polyadenylation: methods, mechanism, function, and role in cancer. J Exp Clin Cancer Res 2021; 40:51. [PMID: 33526057 PMCID: PMC7852185 DOI: 10.1186/s13046-021-01852-7] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 10/17/2020] [Accepted: 01/20/2021] [Indexed: 12/12/2022] Open
Abstract
Occurring in over 60% of human genes, alternative polyadenylation (APA) results in numerous transcripts with differing 3' ends, thus greatly expanding the diversity of mRNAs and of proteins derived from a single gene. As a key molecular mechanism, APA is involved in various gene regulation steps including mRNA maturation, mRNA stability, cellular RNA decay, and protein diversification. APA is frequently dysregulated in cancers, leading to changes in the expression of oncogenes and tumor suppressor genes. Recent studies have revealed various APA regulatory mechanisms that promote the development and progression of a number of human diseases, including cancer. Here, we provide an overview of four types of APA and their impacts on gene regulation. We focus particularly on the interaction of APA with microRNAs, RNA binding proteins and other related factors, the core pre-mRNA 3' end processing complex, and 3'UTR length change. We also describe next-generation sequencing methods and computational tools for use in poly(A) signal detection and APA repositories and databases. Finally, we summarize the current understanding of APA in cancer and provide our vision for future APA related research.
Affiliation(s)
- Yi Zhang
- Department of Respiratory Medicine, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310016, Zhejiang, China
| | - Lian Liu
- Department of Respiratory Medicine, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310016, Zhejiang, China
| | - Qiongzi Qiu
- Center for Uterine Cancer Diagnosis & Therapy Research of Zhejiang Province, Women's Reproductive Health Key Laboratory of Zhejiang Province, Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310006, Zhejiang, China
| | - Qing Zhou
- Center for Uterine Cancer Diagnosis & Therapy Research of Zhejiang Province, Women's Reproductive Health Key Laboratory of Zhejiang Province, Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310006, Zhejiang, China
| | - Jinwang Ding
- Department of Head and Neck Surgery, Cancer Hospital of the University of Chinese Academy of Sciences, Zhejiang Cancer Hospital, Key Laboratory of Head & Neck Cancer Translational Research of Zhejiang Province, Hangzhou, 310022, Zhejiang, China.
| | - Yan Lu
- Center for Uterine Cancer Diagnosis & Therapy Research of Zhejiang Province, Women's Reproductive Health Key Laboratory of Zhejiang Province, Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310006, Zhejiang, China.
- Cancer Center, Zhejiang University, Hangzhou, 310029, Zhejiang, China.
| | - Pengyuan Liu
- Department of Respiratory Medicine, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310016, Zhejiang, China.
- Department of Physiology, Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI, 53226, USA.
- Cancer Center, Zhejiang University, Hangzhou, 310029, Zhejiang, China.
| |
|
91
|
Fine gene expression regulation by minor sequence variations downstream of the polyadenylation signal. Mol Biol Rep 2021; 48:1539-1547. [PMID: 33517473 DOI: 10.1007/s11033-021-06160-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2020] [Accepted: 01/12/2021] [Indexed: 12/22/2022]
Abstract
The termination of transcription is a complex process that substantially contributes to gene regulation in eukaryotes. Previously, it was noted that a single cytosine deletion at position +32 bp relative to the single polyadenylation signal AAUAAA (hereafter the dC mutation) causes a 2-fold increase in the transcription level of the upstream eGFP reporter in mouse embryonic stem cells. Here, we analyzed the conservation of this phenomenon in immortalized mouse, human and Drosophila cell lines and the influence of the dC mutation on the choice of the pre-mRNA cleavage sites. We have constructed dual-reporter plasmids to accurately measure the effect of the dC and other nearby mutations on eGFP mRNA level by RT-qPCR. In this way, we found that the dC mutation leads to a 2-fold increase in the expression level of the upstream eGFP reporter gene in cultured mouse and human cells, but not in Drosophila cells. In addition, 3' RACE analysis demonstrated that eGFP pre-mRNAs are cut at multiple positions between +14 and +31, and that the most proximal cleavage site becomes almost exclusively utilized in the presence of the dC mutation. We also identified new short sequence variations located within positions +25 to +40 and +33 to +48 that increase eGFP expression up to ~2-4-fold. Altogether, the positive effect of the dC mutation seems to be conserved in mouse embryonic stem cells, mouse embryonic 3T3 fibroblasts and human HEK293T cells. In the latter cells, the dC mutation appears to be involved in regulating pre-mRNA cleavage site selection. Finally, a multiplexed approach is proposed to identify motifs located downstream of cleavage site(s) that are essential for transcription termination.
|
92
|
Yu L, Liu F, Li Y, Luo J, Jing R. DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol 2021; 12:605782. [PMID: 33552038 PMCID: PMC7858263 DOI: 10.3389/fmicb.2021.605782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/13/2020] [Accepted: 01/04/2021] [Indexed: 01/17/2023] Open
Abstract
Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through the type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) and cause various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel anti-bacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) and type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we develop a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generate an amino-acid character dictionary and sequence-based features extracted from effector proteins and subsequently implement these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training the model, the hybrid neural network classifies secreted effectors into two different classes with an accuracy, F-value, and recall of over 80.0%. Our approach represents the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
| | - Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu, China
| | - Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
| | - Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu, China
| |
|
93
|
Marini F, Scherzinger D, Danckwardt S. TREND-DB-a transcriptome-wide atlas of the dynamic landscape of alternative polyadenylation. Nucleic Acids Res 2021; 49:D243-D253. [PMID: 32976578 PMCID: PMC7778938 DOI: 10.1093/nar/gkaa722] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/03/2020] [Revised: 08/06/2020] [Accepted: 08/25/2020] [Indexed: 12/11/2022] Open
Abstract
Alternative polyadenylation (APA) profoundly expands the transcriptome complexity. Perturbations of APA can disrupt biological processes, ultimately resulting in devastating disorders. A major challenge in identifying mechanisms and consequences of APA (and its perturbations) lies in the complexity of RNA 3′ end processing, involving poorly conserved RNA motifs and multi-component complexes consisting of far more than 50 proteins. This is further complicated in that RNA 3′ end maturation is closely linked to transcription, RNA processing and even epigenetic (histone/DNA/RNA) modifications. Here, we present TREND-DB (http://shiny.imbei.uni-mainz.de:3838/trend-db), a resource cataloging the dynamic landscape of APA after depletion of >170 proteins involved in various facets of transcriptional, co- and post-transcriptional gene regulation, epigenetic modifications and further processes. TREND-DB visualizes the dynamics of transcriptome 3′ end diversification (TREND) in a highly interactive manner; it provides a global APA network map and allows interrogating genes affected by specific APA-regulators and vice versa. It also permits condition-specific functional enrichment analyses of APA-affected genes, which suggest wide biological and clinical relevance across all RNAi conditions. The implementation of the UCSC Genome Browser provides additional customizable layers of gene regulation accounting for individual transcript isoforms (e.g. epigenetics, miRNA-binding sites and RNA-binding proteins). TREND-DB thereby fosters disentangling the role of APA for various biological programs, including potential disease mechanisms, and helps identify their diagnostic and therapeutic potential.
Affiliation(s)
- Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Mainz, 55131 Mainz, Germany; Center for Thrombosis and Hemostasis (CTH), University Medical Center Mainz, 55131 Mainz, Germany
| | - Denise Scherzinger
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Mainz, 55131 Mainz, Germany
| | - Sven Danckwardt
- Center for Thrombosis and Hemostasis (CTH), University Medical Center Mainz, 55131 Mainz, Germany; Posttranscriptional Gene Regulation, Cancer Research and Experimental Hemostasis, University Medical Center Mainz, 55131 Mainz, Germany; Institute for Clinical Chemistry and Laboratory Medicine, University Medical Center Mainz, 55131 Mainz, Germany; German Center for Cardiovascular Research (DZHK), Rhine-Main, 55131 Mainz, Germany
| |
|
94
|
Population-scale genetic control of alternative polyadenylation and its association with human diseases. Quant Biol 2021. [DOI: 10.15302/j-qb-021-0252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
95
|
Real-time neural network based predictor for cov19 virus spread. PLoS One 2020; 15:e0243189. [PMID: 33332363 PMCID: PMC7745974 DOI: 10.1371/journal.pone.0243189] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/04/2020] [Accepted: 11/17/2020] [Indexed: 01/08/2023] Open
Abstract
Since the outbreak in the early months of 2020, the spread of COVID-19 has grown rapidly in most countries and regions across the world. Because of that, the SARS-CoV-2 outbreak was declared a Public Health Emergency of International Concern (PHEIC) on January 30, 2020, by the World Health Organization (WHO). Many scientists are therefore working on new methods to limit further growth in new cases and, through intelligent patient allocation, to reduce the number of patients per doctor, which can lead to more successful treatment. However, properly managing the spread of COVID-19 requires real-time prediction models that can reliably support decisions at both the national and international level. The difficulty in developing such a system is the lack of general knowledge of how the virus spreads and of what the number of cases will be each day. A prediction model must therefore be able to infer the situation from past data so that its results show a future trend and relate as closely as possible to the real numbers. In our opinion, artificial intelligence makes this possible. In this article we present a model that can work as part of an online system as a real-time predictor to help estimate the spread of COVID-19. The prediction model is developed using Artificial Neural Networks (ANN) to estimate the future situation from geo-location and numerical data from the past 2 weeks. The results of our model were confirmed by comparing them with real data; during our research the model correctly predicted the trend and very closely matched the number of new cases each day.
|
96
|
Sanford EM, Emert BL, Coté A, Raj A. Gene regulation gravitates toward either addition or multiplication when combining the effects of two signals. eLife 2020; 9:e59388. [PMID: 33284110 PMCID: PMC7771960 DOI: 10.7554/elife.59388] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/28/2020] [Accepted: 12/04/2020] [Indexed: 01/07/2023] Open
Abstract
Two different cell signals often affect transcription of the same gene. In such cases, it is natural to ask how the combined transcriptional response compares to the individual responses. The most commonly used mechanistic models predict additive or multiplicative combined responses, but a systematic genome-wide evaluation of these predictions is not available. Here, we analyzed the transcriptional response of human MCF-7 cells to retinoic acid and TGF-β, applied individually and in combination. The combined transcriptional responses of induced genes exhibited a range of behaviors, but clearly favored both additive and multiplicative outcomes. We performed paired chromatin accessibility measurements and found that increases in accessibility were largely additive. There was some association between super-additivity of accessibility and multiplicative or super-multiplicative combined transcriptional responses, while sub-additivity of accessibility associated with additive transcriptional responses. Our findings suggest that mechanistic models of combined transcriptional regulation must be able to reproduce a range of behaviors.
Affiliation(s)
- Eric M Sanford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
| | - Benjamin L Emert
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
| | - Allison Coté
- Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
| | - Arjun Raj
- Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
| |
|
97
|
Hie B, Bryson BD, Berger B. Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design. Cell Syst 2020; 11:461-477.e9. [PMID: 33065027 DOI: 10.1016/j.cels.2020.09.007] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 01/16/2020] [Revised: 06/01/2020] [Accepted: 09/23/2020] [Indexed: 12/13/2022]
Abstract
Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis. Uncertainty facilitates a tight iterative loop between computation and experimentation and generalizes across biological domains as diverse as protein engineering and single-cell transcriptomics. More broadly, our work demonstrates that uncertainty should play a key role in the increasing adoption of machine learning algorithms into the experimental lifecycle.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Bryan D Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ragon Institute of Massachusetts General Hospital, MIT, and Harvard, Cambridge, MA 02139, USA.
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
|
98
|
Maslova A, Ramirez RN, Ma K, Schmutz H, Wang C, Fox C, Ng B, Benoist C, Mostafavi S. Deep learning of immune cell differentiation. Proc Natl Acad Sci U S A 2020; 117:25655-25666. [PMID: 32978299 PMCID: PMC7568267 DOI: 10.1073/pnas.2011795117] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Indexed: 12/12/2022] Open
Abstract
Although we know many sequence-specific transcription factors (TFs), how the DNA sequence of cis-regulatory elements is decoded and orchestrated on the genome scale to determine immune cell differentiation is beyond our grasp. Leveraging a granular atlas of chromatin accessibility across 81 immune cell types, we asked if a convolutional neural network (CNN) could learn to infer cell type-specific chromatin accessibility solely from regulatory DNA sequences. With a tailored architecture and an ensemble approach to CNN parameter interpretation, we show that our trained network ("AI-TAC") does so by rediscovering ab initio the binding motifs for known regulators and some unknown ones. Motifs whose importance is learned virtually as functionally important overlap strikingly well with positions determined by chromatin immunoprecipitation for several TFs. AI-TAC establishes a hierarchy of TFs and their interactions that drives lineage specification and also identifies stage-specific interactions, like Pax5/Ebf1 vs. Pax5/Prdm1, or the role of different NF-κB dimers in different cell types. AI-TAC assigns Spi1/Cebp and Pax5/Ebf1 as the drivers necessary for myeloid and B lineage fates, respectively, but no factors seemed as dominantly required for T cell differentiation, which may represent a fall-back pathway. Mouse-trained AI-TAC can parse human DNA, revealing a strikingly similar ranking of influential TFs and providing additional support that AI-TAC is a generalizable regulatory sequence decoder. Thus, deep learning can reveal the regulatory syntax predictive of the full differentiative complexity of the immune system.
Affiliation(s)
- Alexandra Maslova
- Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | | | - Ke Ma
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Hugo Schmutz
- Department of Immunology, Harvard Medical School, Boston, MA 02115
| | - Chendi Wang
- Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Curtis Fox
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Bernard Ng
- Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | | | - Sara Mostafavi
- Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Canadian Institute for Advanced Research, CIFAR AI, Toronto, ON M5G 1M1, Canada
- Vector Institute, Toronto, ON M5G 1M1, Canada
| |
|
99
|
Yu L, Jing R, Liu F, Luo J, Li Y. DeepACP: A Novel Computational Approach for Accurate Identification of Anticancer Peptides by Deep Learning Algorithm. Mol Ther Nucleic Acids 2020; 22:862-870. [PMID: 33230481 PMCID: PMC7658571 DOI: 10.1016/j.omtn.2020.10.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/02/2020] [Accepted: 10/06/2020] [Indexed: 12/24/2022]
Abstract
Cancer is one of the most dangerous diseases to human health. The accurate prediction of anticancer peptides (ACPs) would be valuable for the development and design of novel anticancer agents. Current deep neural network models have obtained state-of-the-art prediction accuracy for the ACP classification task. However, based on existing studies, it remains unclear which deep learning architecture achieves the best performance. Thus, in this study, we first present a systematic exploration of three important deep learning architectures: convolutional, recurrent, and convolutional-recurrent networks for distinguishing ACPs from non-ACPs. We find that the recurrent neural network with bidirectional long short-term memory cells is superior to other architectures. By utilizing the proposed model, we implement a sequence-based deep learning tool (DeepACP) to accurately predict the likelihood of a peptide exhibiting anticancer activity. The results indicate that DeepACP outperforms several existing methods and can be used as an effective tool for the prediction of anticancer peptides. Furthermore, we visualize and understand the deep learning model. We hope that our strategy can be extended to identify other types of peptides and may provide more assistance to the development of proteomics and new drugs.
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China
- Corresponding author: Lezheng Yu, School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China.
| | - Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
| | - Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, Sichuan, China
- Corresponding author: Jiesi Luo, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, Sichuan, China.
| | - Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
| |
|
100
|
Valeri JA, Collins KM, Ramesh P, Alcantar MA, Lepe BA, Lu TK, Camacho DM. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat Commun 2020; 11:5058. [PMID: 33028819 PMCID: PMC7541510 DOI: 10.1038/s41467-020-18676-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 12/03/2019] [Accepted: 09/02/2020] [Indexed: 12/26/2022] Open
Abstract
While synthetic biology has revolutionized our approaches to medicine, agriculture, and energy, the design of completely novel biological circuit components beyond naturally-derived templates remains challenging due to poorly understood design rules. Toehold switches, which are programmable nucleic acid sensors, face an analogous design bottleneck; our limited understanding of how sequence impacts functionality often necessitates expensive, time-consuming screens to identify effective switches. Here, we introduce Sequence-based Toehold Optimization and Redesign Model (STORM) and Nucleic-Acid Speech (NuSpeak), two orthogonal and synergistic deep learning architectures to characterize and optimize toeholds. Applying techniques from computer vision and natural language processing, we 'un-box' our models using convolutional filters, attention maps, and in silico mutagenesis. Through transfer-learning, we redesign sub-optimal toehold sensors, even with sparse training data, experimentally validating their improved performance. This work provides sequence-to-function deep learning frameworks for toehold selection and design, augmenting our ability to construct potent biological circuit components and precision diagnostics.
Affiliation(s)
- Jacqueline A Valeri
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Katherine M Collins
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Pradeep Ramesh
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
| | - Miguel A Alcantar
- Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Bianca A Lepe
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Timothy K Lu
- Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Synthetic Biology Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| | - Diogo M Camacho
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA.
| |
|