1
Liu Q, Cheng X, Liu G, Li B, Liu X. Deep learning improves the ability of sgRNA off-target propensity prediction. BMC Bioinformatics 2020; 21:51. [PMID: 32041517 PMCID: PMC7011380 DOI: 10.1186/s12859-020-3395-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 11/10/2019] [Accepted: 02/04/2020] [Indexed: 12/26/2022]
Abstract
BACKGROUND The CRISPR/Cas9 system, a third-generation genome editing technology, has been widely applied in target gene repair and gene expression regulation. Selecting an appropriate sgRNA can improve the on-target knockout efficacy of the CRISPR/Cas9 system with high sensitivity and specificity. However, when the CRISPR/Cas9 system is operating, unexpected cleavage, known as off-target cleavage, may occur at some sites. A number of methods have been developed to predict the off-target propensity of an sgRNA at specific DNA fragments. Most of them rely on manual feature extraction and machine learning techniques to obtain off-target scores. With the rapid growth of off-target data and the rapid development of deep learning theory, existing prediction methods can no longer meet clinical-level accuracy requirements. RESULTS Here, we propose a prediction method named CnnCrispr to predict the off-target propensity of sgRNA at specific DNA fragments. CnnCrispr uses the GloVe model to automatically learn vector representations of the sequence features of sgRNA-DNA pairs, and embeds the trained word vector matrix into a deep learning model that combines a biLSTM and a CNN with five hidden layers. We verified performance on the data set provided by DeepCrispr and found that the auROC and auPRC in "leave-one-sgRNA-out" cross-validation reached 0.957 and 0.429, respectively (under the same settings, the Pearson and Spearman correlation coefficients reached 0.495 and 0.151, respectively). CONCLUSION Our results show that CnnCrispr has better classification and regression performance than existing state-of-the-art models. The code for CnnCrispr can be freely downloaded from https://github.com/LQYoLH/CnnCrispr.
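To make the "word" view of sgRNA-DNA pairs concrete, here is a minimal Python sketch. It assumes that each aligned position is treated as one word drawn from a 16-word vocabulary of (sgRNA base, DNA base) combinations, and that a GloVe-style model is fitted to distance-weighted co-occurrence counts of those words within a small window (GloVe weights a pair that is d positions apart by 1/d). The names `PAIR_VOCAB`, `encode_pair` and `cooccurrence` are hypothetical; this illustrates the idea and is not code from the CnnCrispr repository.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"
# Hypothetical vocabulary: every (sgRNA base, DNA base) combination is one "word",
# giving 4 x 4 = 16 words, so matches (AA, CC, ...) and mismatches (AC, TA, ...) differ.
PAIR_VOCAB = {a + b: i for i, (a, b) in enumerate(product(BASES, BASES))}


def encode_pair(sgrna: str, dna: str) -> list[int]:
    """Convert an aligned sgRNA/DNA pair into a list of pair-word indices."""
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA must be aligned to the same length")
    return [PAIR_VOCAB[s + d] for s, d in zip(sgrna.upper(), dna.upper())]


def cooccurrence(sequences: list[list[int]], window: int = 2) -> dict[tuple[int, int], float]:
    """Symmetric co-occurrence counts in which words d positions apart add 1/d."""
    counts = defaultdict(float)
    for seq in sequences:
        for i, center in enumerate(seq):
            # Look only to the right and record both orders, so each pair is counted once.
            for d in range(1, window + 1):
                j = i + d
                if j >= len(seq):
                    break
                counts[(center, seq[j])] += 1.0 / d
                counts[(seq[j], center)] += 1.0 / d
    return dict(counts)


tokens = encode_pair("ACGT", "ACGA")  # the last position is a T:A mismatch
print(tokens)  # [0, 5, 10, 12]
print(cooccurrence([tokens], window=2)[(0, 10)])  # 0.5
```

The index sequences are what an embedding layer would look up, and the co-occurrence table is the statistic a GloVe-style model factorizes. Indels, PAM handling and the window size actually used by CnnCrispr are not modelled here.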
Affiliation(s)
- Qiaoyue Liu
- Department of information and computing science, University of Science and Technology Beijing, Beijing, 100083, China
- Xiang Cheng
- Department of information and computing science, University of Science and Technology Beijing, Beijing, 100083, China
- Gan Liu
- Department of information and computing science, University of Science and Technology Beijing, Beijing, 100083, China
- Bohao Li
- Department of information and computing science, University of Science and Technology Beijing, Beijing, 100083, China
- Xiuqin Liu
- Department of information and computing science, University of Science and Technology Beijing, Beijing, 100083, China
2
Song J, Zheng Y, Huang M, Wu L, Wang W, Zhu Z, Song Y, Yang C. A Sequential Multidimensional Analysis Algorithm for Aptamer Identification based on Structure Analysis and Machine Learning. Anal Chem 2020; 92:3307-3314. [PMID: 31876151 DOI: 10.1021/acs.analchem.9b05203] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 01/29/2023]
Abstract
Molecular recognition ligands are of great significance in many fields, but our ability to develop new recognition molecules remains limited. Here, we developed a Sequential Multidimensional Analysis algoRiThm for aptamer discovery (SMART-Aptamer) that works on high-throughput sequencing (HTS) data from SELEX libraries and combines multilevel structure analysis with unsupervised machine learning to discover nucleic acid recognition ligands with high accuracy and efficiency. We validated SMART-Aptamer with three sets of HTS data from screening pools against hESCs, EpCAM, and CSV. High-affinity aptamers for all three targets were successfully obtained, and the results showed that SMART-Aptamer can pick out high-affinity aptamers with low false-positive and false-negative rates. With its accuracy, efficiency, and robustness, SMART-Aptamer represents a paradigm-shifting strategy for discovering binding ligands for a variety of biomedical applications.
Affiliation(s)
- Jia Song
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Yuan Zheng
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Mengjiao Huang
- State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, Key Laboratory of Analytical Chemistry, and Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, People's Republic of China
- Lingling Wu
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Wei Wang
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Zhi Zhu
- State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, Key Laboratory of Analytical Chemistry, and Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, People's Republic of China
- Yanling Song
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, Key Laboratory of Analytical Chemistry, and Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, People's Republic of China
- Chaoyong Yang
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- State Key Laboratory for Physical Chemistry of Solid Surfaces, Key Laboratory for Chemical Biology of Fujian Province, Key Laboratory of Analytical Chemistry, and Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, People's Republic of China
3
Sullivan R, Adams MC, Naik RR, Milam VT. Analyzing Secondary Structure Patterns in DNA Aptamers Identified via CompELS. Molecules 2019; 24:molecules24081572. [PMID: 31010064 PMCID: PMC6515186 DOI: 10.3390/molecules24081572] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/28/2019] [Revised: 04/09/2019] [Accepted: 04/15/2019] [Indexed: 12/12/2022]
Abstract
In contrast to the sophisticated high-throughput sequencing tools available for genomic DNA, analytical tools for comparing secondary structure features across multiple single-stranded DNA sequences are less developed. For single-stranded nucleic acid ligands called aptamers, secondary structure is widely thought to play a pivotal role in driving recognition-based binding between an aptamer sequence and its specific target. Here, we employ a competition-based aptamer screening platform called CompELS to identify DNA aptamers for a colloidal target. We then analyze the predicted secondary structures of the aptamers and of a large population of random sequences to identify sequence features and patterns. Our secondary structure analysis identifies patterns ranging from position-dependent score matrices of individual structural elements to position-independent consensus domains obtained by global alignment.
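A position-dependent profile of structural elements, of the kind described above, can be sketched in a few lines of Python. This assumes the secondary structures have already been predicted (for example by a nucleic acid folding program) as pseudoknot-free dot-bracket strings of equal length. The coarse labels S (stem), H (hairpin loop), L (any other loop position: bulge, internal loop or multiloop) and E (external) and all function names are hypothetical simplifications, not the CompELS analysis itself.

```python
def pair_table(db: str) -> list[int]:
    """Partner index for each position of a dot-bracket string (-1 if unpaired)."""
    stack = []
    pairs = [-1] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def element_labels(db: str) -> str:
    """Label each position S (stem), H (hairpin loop), L (other loop) or E (external)."""
    pairs = pair_table(db)
    labels = []
    for i in range(len(db)):
        if pairs[i] >= 0:
            labels.append("S")
            continue
        # The innermost enclosing pair opens at the nearest position k < i
        # whose partner lies to the right of i.
        k = i - 1
        while k >= 0 and pairs[k] <= i:
            k -= 1
        if k < 0:
            labels.append("E")
        elif all(pairs[p] < 0 for p in range(k + 1, pairs[k])):
            labels.append("H")  # nothing paired inside the closing pair
        else:
            labels.append("L")
    return "".join(labels)


def position_profile(structures: list[str], alphabet: str = "SHLE") -> list[dict[str, float]]:
    """Per-position frequency of each element label across equal-length structures."""
    labelled = [element_labels(s) for s in structures]
    if not labelled:
        return []
    length = len(labelled[0])
    if any(len(lab) != length for lab in labelled):
        raise ValueError("structures must have equal length")
    n = len(labelled)
    return [{a: sum(lab[p] == a for lab in labelled) / n for a in alphabet} for p in range(length)]


print(element_labels("((..)).."))  # SSHHSSEE
print(element_labels("(.(..).)"))  # SLSHHSLS
print(position_profile(["((..))..", "(.(..).)"])[1])  # {'S': 0.5, 'H': 0.0, 'L': 0.5, 'E': 0.0}
```

Comparing such a profile for selected aptamers against one built from random sequences is one simple way to see which positions are enriched in a given element. Pseudoknots and G-quadruplexes are outside what dot-bracket labels can express here.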
Affiliation(s)
- Richard Sullivan
- School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Dr. NW, Atlanta, GA 30332-0245, USA.
- Mary Catherine Adams
- School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Dr. NW, Atlanta, GA 30332-0245, USA.
- Rajesh R Naik
- 711 Human Performance Wing, Air Force Research Laboratory, Wright Patterson AFB, OH 45433, USA.
- Valeria T Milam
- School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Dr. NW, Atlanta, GA 30332-0245, USA.
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Dr., Atlanta, GA 30332, USA.
- Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, 315 Ferst Dr., Atlanta, GA 30332-0363, USA.
4
Abstract
The rRNA is the largest and most abundant RNA in bacterial and archaeal cells. It is also one of the best-characterized RNAs in terms of its structural motifs and sequence variation. Production of ribosome components, including more than 50 ribosomal proteins (r-proteins), consumes significant cellular resources. Thus, RNA cis-regulatory structures that interact with r-proteins to repress further r-protein synthesis play an important role in maintaining appropriate stoichiometry between r-proteins and rRNA. Classically, such mRNA structures were thought to directly mimic the rRNA. However, more than 30 years of research have shown that a variety of different recognition and regulatory paradigms exist. This review demonstrates how structural mimicry between the rRNA and mRNA cis-regulatory structures may take many different forms. The mRNA structures that interact with r-proteins to regulate r-protein operons are best characterized in Escherichia coli, but are increasingly found in species from nearly all bacterial phyla and several archaea. Furthermore, they represent a unique opportunity to assess the plasticity of RNA structure in the context of RNA-protein interactions. The binding determinants imposed by r-proteins to allow regulation can be fulfilled in many ways. Some r-protein-interacting mRNAs are immediately recognizable as rRNA mimics from primary sequence similarity, others can be identified only after secondary or tertiary structure determination, and some show no obvious similarity. In addition, a host of different mechanisms of action have been characterized across bacterial species, showing that there is no simple one-size-fits-all solution.
5
Pan X, Rijnbeek P, Yan J, Shen HB. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics 2018; 19:511. [PMID: 29970003 PMCID: PMC6029131 DOI: 10.1186/s12864-018-4889-1] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Received: 09/27/2017] [Accepted: 06/19/2018] [Indexed: 12/21/2022]
Abstract
BACKGROUND RNA regulation depends significantly on binding protein partners, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences of most RBPs are still not well characterized. Interdependencies between sequence and secondary structure specificities make it challenging both to predict RBP binding sites and to detect sequence and structure motifs accurately. RESULTS In this study, we propose a deep learning-based method, iDeepS, that simultaneously identifies binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first perform one-hot encoding of both the sequence and the predicted secondary structure to enable subsequent convolution operations. To reveal the hidden binding knowledge in the observed sequences, the CNNs are applied to learn abstract features. Given the close relationship between sequence and predicted structure, we use the BLSTM to capture possible long-range dependencies between the binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets. The results demonstrate that iDeepS reliably predicts RBP binding sites on RNAs and outperforms state-of-the-art methods. An important advantage over other methods is that iDeepS automatically extracts both binding sequence and structure motifs, which will improve our understanding of the mechanisms underlying RBP binding specificity. CONCLUSION Our study shows that iDeepS identifies sequence and structure motifs to accurately predict RBP binding sites. iDeepS is available at https://github.com/xypan1232/iDeepS.
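The encode-then-convolve step described above can be illustrated with a short, dependency-free Python sketch. It assumes the predicted structure is supplied as a plain dot-bracket string and encodes it with a hypothetical three-symbol alphabet; iDeepS's own structure alphabet and preprocessing may differ. A single convolution filter is then slid along the one-hot sequence matrix, which is what one output channel of a 1D CNN layer computes before its bias and nonlinearity. All names are illustrative.

```python
SEQ_ALPHABET = "ACGU"
STRUCT_ALPHABET = "()."  # hypothetical dot-bracket alphabet, for illustration only


def one_hot(symbols: str, alphabet: str) -> list[list[float]]:
    """One row per position; symbols outside the alphabet (e.g. N) get a uniform row."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in symbols:
        if ch in index:
            row = [0.0] * len(alphabet)
            row[index[ch]] = 1.0
        else:
            row = [1.0 / len(alphabet)] * len(alphabet)
        rows.append(row)
    return rows


def encode_rna(seq: str, structure: str) -> tuple[list[list[float]], list[list[float]]]:
    """Encode sequence (T read as U) and structure as two separate one-hot matrices."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return one_hot(seq, SEQ_ALPHABET), one_hot(structure, STRUCT_ALPHABET)


def conv1d_valid(x: list[list[float]], kernel: list[list[float]]) -> list[float]:
    """Slide one filter over x without padding: score[i] = sum_k sum_c kernel[k][c] * x[i+k][c]."""
    width = len(kernel)
    return [
        sum(kernel[k][c] * x[i + k][c] for k in range(width) for c in range(len(kernel[k])))
        for i in range(len(x) - width + 1)
    ]


seq_mat, struct_mat = encode_rna("AUGGACU", "((...))")
motif_filter = one_hot("GAC", SEQ_ALPHABET)  # a filter that scores exact matches to GAC
print(conv1d_valid(seq_mat, motif_filter))  # [0.0, 0.0, 1.0, 3.0, 0.0]
```

In a trained network the filter weights are learned rather than hand-set, and the structure matrix would feed its own convolution branch; here the peak score of 3.0 simply marks where the GAC motif sits.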
Affiliation(s)
- Xiaoyong Pan
- Department of Medical Informatics, Erasmus Medical Center, Rotterdam, The Netherlands
- Peter Rijnbeek
- Department of Medical Informatics, Erasmus Medical Center, Rotterdam, The Netherlands
- Junchi Yan
- Institute of Software Engineering, East China Normal University, Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
6
Hamada M. In silico approaches to RNA aptamer design. Biochimie 2017; 145:8-14. [PMID: 29032056 DOI: 10.1016/j.biochi.2017.10.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/11/2017] [Accepted: 10/09/2017] [Indexed: 10/18/2022]
Abstract
RNA aptamers are ribonucleic acids that bind to specific target molecules. An RNA aptamer against a disease-related protein has great potential for development into a new drug. However, developing an RNA aptamer into a pharmaceutical requires substantial investments of time and money. Recently, SELEX combined with high-throughput sequencing (HT-SELEX) has been widely used to select candidate RNA aptamers that bind to a target protein with high affinity and specificity. After candidate selection, further optimizations such as shortening and modifying candidate sequences are performed. In these steps, in silico approaches are expected to reduce the time and cost of aptamer drug development. In this article, we review existing in silico approaches to RNA aptamer development, including a method for ranking RNA aptamer candidates from HT-SELEX data, clustering large numbers of aptamer sequences, and finding motifs in a set of significant RNA aptamers. Further studies building on these methods are expected to advance in silico RNA aptamer design, so that sophisticated computational methods can minimize the number of experiments that must be performed.
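The two simplest steps mentioned in the review, ranking HT-SELEX candidates and clustering many sequences, can be sketched as follows. This assumes reads are already trimmed to the randomized region, ranks unique sequences by raw read count only, and clusters greedily: each sequence joins the first more abundant seed within a chosen Levenshtein distance, otherwise it starts a new cluster. This abundance-seeded heuristic and the names `levenshtein` and `rank_and_cluster` are illustrative assumptions; the ranking and clustering methods covered in the review may use richer information, such as enrichment across rounds or secondary structure.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def rank_and_cluster(reads: list[str], max_dist: int = 2):
    """Rank unique sequences by read count, then cluster them around abundant seeds."""
    counts = Counter(reads)
    # Most abundant first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    clusters = []  # list of (seed, [(sequence, count), ...])
    for seq, n in ranked:
        for seed, members in clusters:
            if levenshtein(seq, seed) <= max_dist:
                members.append((seq, n))
                break
        else:  # no seed close enough: this sequence seeds a new cluster
            clusters.append((seq, [(seq, n)]))
    return ranked, clusters


reads = ["ACGTAC"] * 5 + ["ACGTAA"] * 3 + ["TTTTTT"] * 2
ranked, clusters = rank_and_cluster(reads, max_dist=1)
print(ranked)         # [('ACGTAC', 5), ('ACGTAA', 3), ('TTTTTT', 2)]
print(len(clusters))  # 2
```

Seeding clusters in order of abundance means each cluster is named after its most frequent member, which is usually the candidate one would test first; the quadratic number of distance computations is acceptable for a sketch but not for full HT-SELEX libraries.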
Affiliation(s)
- Michiaki Hamada
- Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 63-520, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Institute for Medical-oriented Structural Biology, Waseda University, 2-2, Wakamatsu-cho Shinjuku-ku, Tokyo 162-8480, Japan
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26, Aomi, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan.
7
Catuogno S, Esposito CL. Aptamer Cell-Based Selection: Overview and Advances. Biomedicines 2017; 5:biomedicines5030049. [PMID: 28805744 PMCID: PMC5618307 DOI: 10.3390/biomedicines5030049] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 07/12/2017] [Revised: 08/03/2017] [Accepted: 08/08/2017] [Indexed: 02/07/2023]
Abstract
Aptamers are high-affinity single-stranded DNA/RNA molecules, produced by a combinatorial procedure named SELEX (Systematic Evolution of Ligands by Exponential enrichment), that are emerging as promising diagnostic and therapeutic tools. Among selection strategies, procedures that use living cells as complex targets (referred to as "cell-SELEX") have been developed as an effective means of generating aptamers for heavily modified cell surface proteins, ensuring that the target is bound in its native conformation. Here we give an up-to-date overview of cell-SELEX technology, discussing the most recent advances with a particular focus on cancer cell targeting. Examples of different protocol applications and post-SELEX strategies are briefly outlined.
Affiliation(s)
- Silvia Catuogno
- Istituto di Endocrinologia ed Oncologia Sperimentale "G. Salvatore", CNR, Naples 80100, Italy.
- Carla Lucia Esposito
- Istituto di Endocrinologia ed Oncologia Sperimentale "G. Salvatore", CNR, Naples 80100, Italy.