1
Xu M, Pang J, Ye Y, Zhang Z. Integrating Traditional Machine Learning and Deep Learning for Precision Screening of Anticancer Peptides: A Novel Approach for Efficient Drug Discovery. ACS Omega 2024; 9:16820-16831. [PMID: 38617603 PMCID: PMC11007766 DOI: 10.1021/acsomega.4c01374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 03/03/2024] [Accepted: 03/22/2024] [Indexed: 04/16/2024]
Abstract
The rapid and effective identification of anticancer peptides (ACPs) by computer technology provides a new perspective for cancer treatment. In the identification process of ACPs, accurate sequence encoding and effective classification models are crucial for predicting their biological activity. Traditional machine learning methods have been widely applied in sequence analysis, but deep learning provides a new approach to capture sequence complexity. In this study, a two-stage ACPs classification model was innovatively proposed. Three novel coding strategies were explored; two mainstream Natural Language Processing (NLP) models and 11 machine learning models were fused to identify ACPs, which significantly improved the prediction accuracy of ACPs. We analyzed the correlation between peptide chain amino acids and evaluated the relevant performance of the model by the ROC curve and t-SNE dimensionality reduction technique. The results indicated that the deep learning and machine learning fusion models of M3E-base and KNeighborsDist models, especially when considering the semantic information on amino acid sequences, achieved the highest average accuracy (AvgAcc) of 0.939, with an AUC value as high as 0.97. Then, in vitro cell experiments were used to verify that the two ACPs predicted by the model had antitumor efficacy. This study provides a convenient and effective method for screening ACPs. With further optimization and testing, these strategies have the potential to play an important role in drug discovery and design.
Affiliation(s)
- Meiqi Xu
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Jiefu Pang
  - School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- Yangyang Ye
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
- Ziyi Zhang
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou 310015, Zhejiang, China
2
Salimi A, Jang JH, Lee JY. Leveraging attention-enhanced variational autoencoders: Novel approach for investigating latent space of aptamer sequences. Int J Biol Macromol 2024; 255:127884. [PMID: 37926303 DOI: 10.1016/j.ijbiomac.2023.127884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/27/2023] [Accepted: 11/02/2023] [Indexed: 11/07/2023]
Abstract
Aptamers are increasingly recognized as potent alternatives to antibodies for diagnostic and therapeutic applications. The application of deep learning, particularly attention-based models, for aptamer (DNA/RNA) sequences is an innovative field. The ongoing advancements in aptamer sequencing technologies coupled with machine learning algorithms have resulted in novel developments. Further research is required to investigate the full potential of deep learning models and address the challenges associated with the generation of sequences, like the large search space of possible sequences. In this study, we propose a workflow that integrates an attention mechanism within a framework of a generative variational autoencoder, to generate novel sequences by expanding latent memory. They show 100 % novelty compared with the dataset, and approximately 88 % of them show negative values for the minimum free energy, which may indicate the likelihood of an RNA sequence folding into a functional structure. Because the field of aptamer discovery is affected by data scarcity, advanced strategies that facilitate the generation of diverse and superior sequences are necessitated. The utilization of our workflow can result in novel aptamers. Thus, investigations such as the present study can address the abovementioned challenge. Our research is anticipated to facilitate further discoveries and advancements in aptamer fields.
Affiliation(s)
- Abbas Salimi
  - Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Jee Hwan Jang
  - School of Materials Science and Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
  - Ucaretron Inc., No. 3508, 40, Simin-daero 365 beon-gil, Dongan-gu, Anyang-si, Gyeonggi-do, Republic of Korea
- Jin Yong Lee
  - Department of Chemistry, Sungkyunkwan University, Suwon 16419, Republic of Korea
3
Shin I, Kang K, Kim J, Sel S, Choi J, Lee JW, Kang HY, Song G. AptaTrans: a deep neural network for predicting aptamer-protein interaction using pretrained encoders. BMC Bioinformatics 2023; 24:447. [PMID: 38012571 PMCID: PMC10680337 DOI: 10.1186/s12859-023-05577-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 11/21/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND: Aptamers, which are biomaterials comprised of single-stranded DNA/RNA that form tertiary structures, have significant potential as next-generation materials, particularly for drug discovery. The systematic evolution of ligands by exponential enrichment (SELEX) method is a critical in vitro technique employed to identify aptamers that bind specifically to target proteins. While advanced SELEX-based methods such as Cell- and HT-SELEX are available, they often encounter issues such as extended time consumption and suboptimal accuracy. Several in silico aptamer discovery methods have been proposed to address these challenges. These methods are specifically designed to predict aptamer-protein interaction (API) using benchmark datasets. However, these methods often fail to consider the physicochemical interactions between aptamers and proteins within tertiary structures. RESULTS: In this study, we propose AptaTrans, a pipeline for predicting API using deep learning techniques. AptaTrans uses transformer-based encoders to handle aptamer and protein sequences at the monomer level. Furthermore, pretrained encoders are utilized for the structural representation. After validation with a benchmark dataset, AptaTrans has been integrated into a comprehensive toolset. This pipeline is combined with Apta-MCTS, a generative algorithm for recommending aptamer candidates. CONCLUSION: The results show that AptaTrans outperforms existing models for predicting API, and the efficacy of the AptaTrans pipeline has been confirmed through various experimental tools. We expect AptaTrans will enhance the cost-effectiveness and efficiency of SELEX in drug discovery. The source code and benchmark dataset for AptaTrans are available at https://github.com/pnumlb/AptaTrans.
Affiliation(s)
- Incheol Shin
  - Division of Artificial Intelligence, Pusan National University, Busan, Republic of Korea
- Keumseok Kang
  - Division of Artificial Intelligence, Pusan National University, Busan, Republic of Korea
- Juseong Kim
  - Division of Artificial Intelligence, Pusan National University, Busan, Republic of Korea
- Sanghun Sel
  - Division of Artificial Intelligence, Pusan National University, Busan, Republic of Korea
- Jeonghoon Choi
  - Division of Artificial Intelligence, Pusan National University, Busan, Republic of Korea
- Jae-Wook Lee
  - Research & Development, NuclixBio, Seoul, Republic of Korea
- Ho Young Kang
  - Research & Development, NuclixBio, Seoul, Republic of Korea
- Giltae Song
  - Division of Artificial Intelligence, Pusan National University, Busan, Republic of Korea
  - School of Computer Science and Engineering, Pusan National University, Busan, Republic of Korea
  - Center for Artificial Intelligence Research, Pusan National University, Busan, Republic of Korea
4
Andress C, Kappel K, Villena ME, Cuperlovic-Culf M, Yan H, Li Y. DAPTEV: Deep aptamer evolutionary modelling for COVID-19 drug design. PLoS Comput Biol 2023; 19:e1010774. [PMID: 37406007 DOI: 10.1371/journal.pcbi.1010774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 06/13/2023] [Indexed: 07/07/2023] Open
Abstract
Typical drug discovery and development processes are costly, time consuming and often biased by expert opinion. Aptamers are short, single-stranded oligonucleotides (RNA/DNA) that bind to target proteins and other types of biomolecules. Compared with small-molecule drugs, aptamers can bind to their targets with high affinity (binding strength) and specificity (uniquely interacting with the target only). The conventional development process for aptamers utilizes a manual process known as Systematic Evolution of Ligands by Exponential Enrichment (SELEX), which is costly, slow, dependent on library choice and often produces aptamers that are not optimized. To address these challenges, in this research, we create an intelligent approach, named DAPTEV, for generating and evolving aptamer sequences to support aptamer-based drug discovery and development. Using the COVID-19 spike protein as a target, our computational results suggest that DAPTEV is able to produce structurally complex aptamers with strong binding affinities.
Affiliation(s)
- Cameron Andress
  - Department of Computer Science, Brock University, St. Catharines, Canada
- Kalli Kappel
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Hongbin Yan
  - Department of Chemistry, Brock University, St. Catharines, Canada
- Yifeng Li
  - Department of Computer Science, Brock University, St. Catharines, Canada
  - Department of Biological Sciences, Brock University, St. Catharines, Canada
5
Sun D, Sun M, Zhang J, Lin X, Zhang Y, Lin F, Zhang P, Yang C, Song J. Computational tools for aptamer identification and optimization. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
6
Uwiragiye E, Rhinehardt KL. TFIDF-Random Forest: Prediction of Aptamer-Protein Interacting Pairs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3032-3037. [PMID: 34310317 DOI: 10.1109/tcbb.2021.3098709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Aptamers are short, single-stranded oligonucleotides or peptides generated from in vitro selection to selectively bind with various molecules. Due to their molecular recognition capability for proteins, aptamers are becoming promising reagents in new drug development. Aptamers can fold into specific spatial configurations that bind to certain targets with extremely high specificity. The ability of aptamers to reversibly bind proteins has generated increasing interest in using them to facilitate controlled release of therapeutic biomolecules. In vitro selection experiments to produce aptamer-protein binding pairs are very complex, and MD/MM in silico experiments can be computationally expensive. In this study, we introduce a natural language processing approach for data-driven computational selection. We compared our method to the sequential model with an embedding layer applied in the literature. We transformed the DNA/RNA and protein sequences into text format using a sliding window approach. This methodology showed notably higher efficiency than that reported in the literature. This indicates that our preliminary model is a marked improvement over previous models, which brings us closer to a data-driven computational selection method.
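As a rough illustration of the sliding-window idea in this abstract, the sketch below turns a sequence into overlapping k-mer "words" and weights them with TF-IDF. The window size of 3, the smoothed IDF formula, and the toy sequences are assumptions made for illustration only; they are not taken from the paper, and the Random Forest step named in the title is not shown.

```python
import math
from collections import Counter


def sliding_window_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers, treated as 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed weighting: term frequency times a smoothed inverse
    document frequency, log((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            tok: (c / total) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, c in counts.items()
        })
    return weights


# Toy RNA strings, not real aptamers.
sequences = ["ACGUACGU", "GGGACGUU"]
tokens = [sliding_window_tokens(s) for s in sequences]
vectors = tfidf(tokens)
```

The resulting dictionaries could be aligned on a shared vocabulary to give fixed-length feature vectors for a classifier.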
7
Hao S, Hu X, Feng Z, Sun K, You X, Wang Z, Yang C. Prediction of metal ion ligand binding residues by adding disorder value and propensity factors based on deep learning algorithm. Front Genet 2022; 13:969412. [PMID: 36035120 PMCID: PMC9402973 DOI: 10.3389/fgene.2022.969412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022] Open
Abstract
Proteins need to interact with different ligands to perform their functions. Among the ligands, the metal ion is a major ligand. At present, the prediction of protein metal ion ligand binding residues is a challenge. In this study, we selected Zn2+, Cu2+, Fe2+, Fe3+, Co2+, Mn2+, Ca2+ and Mg2+ metal ion ligands from the BioLip database as the research objects. Based on the amino acids, the physicochemical properties and predicted structural information, we introduced the disorder value as the feature parameter. In addition, based on the component information, position weight matrix and information entropy, we introduced the propensity factor as a prediction parameter. Then, we used the deep neural network algorithm for the prediction. Furthermore, we optimized the hyper-parameters of the deep learning algorithm and obtained better results than the previous IonSeq method.
Affiliation(s)
- Sixi Hao
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
  - Correspondence: Xiuzhen Hu; Zhenxing Feng
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
  - Correspondence: Xiuzhen Hu; Zhenxing Feng
- Kai Sun
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Xiaoxiao You
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Ziyang Wang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Caiyun Yang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
8
Nosrati M, Amani J. In silico screening of ssDNA aptamer against Escherichia coli O157:H7: A machine learning and the Pseudo K-tuple nucleotide composition based approach. Comput Biol Chem 2021; 95:107568. [PMID: 34543910 DOI: 10.1016/j.compbiolchem.2021.107568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/09/2020] [Revised: 08/02/2021] [Accepted: 08/24/2021] [Indexed: 02/07/2023]
Abstract
This study aimed to screen in silico for ssDNA aptamers against Escherichia coli O157:H7 by combining machine learning with the PseKNC approach. First, a total of 47 validated ssDNA aptamers and 498 random DNA sequences were used as positive and negative training data, respectively. The sequences were then converted to numerical vectors using the PseKNC method through the Pse-in-one 2.0 web server. After that, the numerical vectors were classified by the SVM, ANN and RF algorithms available in Orange 3.2.0 software. The performance of the tested models was evaluated using cross-validation, random sampling and ROC curve analyses. The primary results demonstrated that the ANN and RF algorithms performed adequately for the data classification. To improve the performance of these classifiers, the positive training data were triplicated and the models were retrained. The results confirmed that increasing the data size had a significant effect on classification accuracy, especially for the RF model. Subsequently, the RF algorithm, with an accuracy of 98%, was selected for aptamer screening. The thermodynamic details of the folding process and the secondary structures of the screened aptamers were also considered as final evaluations. The results confirmed that the aptamers selected by the proposed method had appropriate structural properties and that there is no thermodynamic limit on aptamer folding.
Affiliation(s)
- Mokhtar Nosrati
  - Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Jafar Amani
  - Applied Microbiology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
9
Zhou H, Wekesa JS, Luan Y, Meng J. PRPI-SC: an ensemble deep learning model for predicting plant lncRNA-protein interactions. BMC Bioinformatics 2021; 22:415. [PMID: 34429059 PMCID: PMC8385908 DOI: 10.1186/s12859-021-04328-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/01/2020] [Accepted: 11/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes, mainly through interactions with RNA-binding proteins (RBPs). To understand the function of lncRNAs, a fundamental method is to identify which types of proteins interact with the lncRNAs. However, the models or rules of interactions are a major challenge when calculating and estimating the types of RBP. RESULTS: In this study, we propose an ensemble deep learning model, named PRPI-SC, to predict plant lncRNA-protein interactions using a stacked denoising autoencoder and a convolutional neural network based on sequence and structural information. PRPI-SC predicts interactions between lncRNAs and proteins based on the k-mer features of RNAs and proteins. Experiments showed good results on Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133). The accuracy rates on the ATH948 and ZEA22133 datasets were 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA-protein interaction datasets. CONCLUSIONS: PRPI-SC accurately predicts the interaction between plant lncRNA and protein, which plays a guiding role in studying the function and expression of plant lncRNA. At the same time, PRPI-SC has a strong generalization ability and good prediction performance on non-plant data.
Affiliation(s)
- Haoran Zhou
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
- Jael Sanyanda Wekesa
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, 116024 Liaoning China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
10
Predicting aptamer sequences that interact with target proteins using an aptamer-protein interaction classifier and a Monte Carlo tree search approach. PLoS One 2021; 16:e0253760. [PMID: 34170922 PMCID: PMC8232527 DOI: 10.1371/journal.pone.0253760] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/29/2020] [Accepted: 06/14/2021] [Indexed: 11/19/2022] Open
Abstract
Oligonucleotide-based aptamers, which have a three-dimensional structure with a single-stranded fragment, feature various characteristics with respect to size, toxicity, and permeability. Accordingly, aptamers are advantageous in terms of diagnosis and treatment and are materials that can be produced through relatively simple experiments. Systematic evolution of ligands by exponential enrichment (SELEX) is one of the most widely used experimental methods for generating aptamers; however, it is highly expensive and time-consuming. To reduce the related costs, recent studies have used in silico approaches, such as aptamer-protein interaction (API) classifiers that use sequence patterns to determine the binding affinity between RNA aptamers and proteins. Some of these methods generate candidate RNA aptamer sequences that bind to a target protein, but they are limited to producing candidates of a specific size. In this study, we present a machine learning approach for selecting candidate sequences of various sizes that have a high binding affinity for a specific sequence of a target protein. We applied the Monte Carlo tree search (MCTS) algorithm for generating the candidate sequences using a score function based on an API classifier. The tree structure that we designed with MCTS enables nucleotide sequence sampling, and the obtained sequences are potential aptamer candidates. We performed a quality assessment using the scores of docking simulations. Our validation datasets revealed that our model showed similar or better docking scores in ZDOCK docking simulations than the known aptamers. We expect that our method, which is size-independent and easy to use, can provide insights into searching for an appropriate aptamer sequence for a target protein during the simulation step of SELEX.
11
Saldías MP, Maureira D, Orellana-Serradell O, Silva I, Lavanderos B, Cruz P, Torres C, Cáceres M, Cerda O. TRP Channels Interactome as a Novel Therapeutic Target in Breast Cancer. Front Oncol 2021; 11:621614. [PMID: 34178620 PMCID: PMC8222984 DOI: 10.3389/fonc.2021.621614] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/26/2020] [Accepted: 03/31/2021] [Indexed: 12/14/2022] Open
Abstract
Breast cancer is one of the most frequent cancer types worldwide and the first cause of cancer-related deaths in women. Although significant therapeutic advances have been achieved with drugs such as tamoxifen and trastuzumab, breast cancer still caused 627,000 deaths in 2018. Since cancer is a multifactorial disease, it has become necessary to develop new molecular therapies that can target several relevant cellular processes at once. Ion channels are versatile regulators of several physiological- and pathophysiological-related mechanisms, including cancer-relevant processes such as tumor progression, apoptosis inhibition, proliferation, migration, invasion, and chemoresistance. Ion channels are the main regulators of cellular functions, conducting ions selectively through a pore-forming structure located in the plasma membrane, with protein–protein interactions being one of their main regulatory mechanisms. Among the different ion channel families, the Transient Receptor Potential (TRP) family stands out in the context of breast cancer since several members have been proposed as prognostic markers in this pathology. However, only a few approaches exist to block their specific activity during tumor progression. In this article, we describe several TRP channels that have been implicated in breast cancer progression, with a particular focus on their binding partners that have also been described as drivers of breast cancer progression. Here, we propose disrupting these interactions as an attractive potential therapeutic strategy for treating this neoplastic disease.
Affiliation(s)
- María Paz Saldías
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Diego Maureira
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Octavio Orellana-Serradell
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Ian Silva
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Boris Lavanderos
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Pablo Cruz
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Camila Torres
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
- Mónica Cáceres
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
  - The Wound Repair, Treatment, and Health (WoRTH) Initiative, Santiago, Chile
- Oscar Cerda
  - Program of Cellular and Molecular Biology, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile
  - Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Santiago, Chile
  - The Wound Repair, Treatment, and Health (WoRTH) Initiative, Santiago, Chile
12
Emami N, Ferdousi R. AptaNet as a deep learning approach for aptamer-protein interaction prediction. Sci Rep 2021; 11:6074. [PMID: 33727685 PMCID: PMC7971039 DOI: 10.1038/s41598-021-85629-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/26/2020] [Accepted: 03/03/2021] [Indexed: 02/08/2023] Open
Abstract
Aptamers are short oligonucleotides (DNA/RNA) or peptide molecules that can selectively bind to their specific targets with high specificity and affinity. As a powerful new class of amino acid ligands, aptamers have high potentials in biosensing, therapeutic, and diagnostic fields. Here, we present AptaNet-a new deep neural network-to predict the aptamer-protein interaction pairs by integrating features derived from both aptamers and the target proteins. Aptamers were encoded by using two different strategies, including k-mer and reverse complement k-mer frequency. Amino acid composition (AAC) and pseudo amino acid composition (PseAAC) were applied to represent target information using 24 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied a neighborhood cleaning algorithm. The predictor was constructed based on a deep neural network, and optimal features were selected using the random forest algorithm. As a result, 99.79% accuracy was achieved for the training dataset, and 91.38% accuracy was obtained for the testing dataset. AptaNet achieved high performance on our constructed aptamer-protein benchmark dataset. The results indicate that AptaNet can help identify novel aptamer-protein interacting pairs and build more-efficient insights into the relationship between aptamers and proteins. Our benchmark dataset and the source codes for AptaNet are available in: https://github.com/nedaemami/AptaNet .
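The sequence encodings named in this abstract (k-mer frequency, reverse complement k-mer frequency, and amino acid composition) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the AptaNet implementation: it assumes a DNA alphabet, k = 2, input sequences longer than k, and merges each k-mer with its reverse complement under the lexicographically smaller of the two. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible k-mer over ACGT."""
    total = len(seq) - k + 1  # assumes len(seq) >= k
    counts = Counter(seq[i:i + k] for i in range(total))
    return {
        "".join(p): counts["".join(p)] / total
        for p in product("ACGT", repeat=k)
    }


def rc_kmer_frequencies(seq, k=2):
    """Merge each k-mer with its reverse complement into one feature."""
    merged = {}
    for kmer, freq in kmer_frequencies(seq, k).items():
        canonical = min(kmer, reverse_complement(kmer))
        merged[canonical] = merged.get(canonical, 0.0) + freq
    return merged


def amino_acid_composition(protein):
    """Fraction of each of the 20 standard amino acids in a protein."""
    counts = Counter(protein)
    return {aa: counts[aa] / len(protein) for aa in AMINO_ACIDS}
```

For k = 2 the plain encoding gives 16 features, while the reverse complement encoding gives 10, because the 12 non-palindromic dinucleotides collapse into 6 pairs and the 4 palindromic ones stay as they are.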
Affiliation(s)
- Neda Emami
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
  - Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
13
Torkamanian-Afshar M, Nematzadeh S, Tabarzad M, Najafi A, Lanjanian H, Masoudi-Nejad A. In silico design of novel aptamers utilizing a hybrid method of machine learning and genetic algorithm. Mol Divers 2021; 25:1395-1407. [PMID: 33554306 DOI: 10.1007/s11030-021-10192-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/11/2020] [Accepted: 01/28/2021] [Indexed: 12/29/2022]
Abstract
Aptamers can be regarded as efficient substitutes for monoclonal antibodies in many diagnostic and therapeutic applications. Due to the tedious and prohibitive nature of SELEX (systematic evolution of ligands by exponential enrichment), in silico methods have been developed to speed up the enrichment process. However, the majority of these methods made no attempt to design novel aptamers. Moreover, some target proteins may not have any binding RNA candidates in nature, and a reductive mechanism is needed to generate novel aptamer pools from the enormous number of possible nucleotide combinations to be examined in vitro. We have applied a genetic algorithm (GA) with an embedded binding predictor fitness function to the in silico design of RNA aptamers. As a case study, all steps were carried out to generate an aptamer pool against the aminopeptidase N (CD13) biomarker. First, the model was developed based on sequential and structural features of known RNA-protein complexes. Then, using RNA sequences involved in complexes with positive prediction results as the first generation, novel aptamers were designed and top-ranked sequences were selected. A 76-mer aptamer was identified with the highest fitness value, with a score 3 to 6 times higher than that of the parent oligonucleotides. The reliability of the obtained sequences was confirmed using docking and molecular dynamics simulation. The proposed method provides an important, simplified contribution to the oligonucleotide-aptamer design process. It can also serve as a basis for designing novel aptamers against a wide range of biomarkers.
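The general shape of a genetic algorithm with a pluggable fitness function, as described in this abstract, might look like the sketch below. It is a toy under loudly stated assumptions: `toy_fitness` (GC fraction) is a hypothetical placeholder for the trained binding predictor, and the selection scheme, mutation rate, elitism, and equal-length sequences are illustrative choices, not the authors' settings.

```python
import random

ALPHABET = "ACGU"


def toy_fitness(seq):
    """Placeholder score (GC fraction); stands in for a binding predictor."""
    return sum(ch in "GC" for ch in seq) / len(seq)


def mutate(seq, rate, rng):
    """Replace each base with a random base with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in seq
    )


def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(population, fitness, generations=20, elite=2, rate=0.05, seed=0):
    """Evolve a pool of equal-length sequences; return the fittest one."""
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        # Elitism: the top sequences survive unchanged.
        children = ranked[:elite]
        while len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = children
    return max(pop, key=fitness)


# Toy first generation, not real aptamers.
first_generation = ["ACGUACGUAC", "UUUUAAAAUU", "GGAUCCAUGA", "AUAUAUAUAU"]
best = evolve(first_generation, toy_fitness)
```

Because the elite sequences are copied forward unchanged, the best score in the pool can never drop from one generation to the next; swapping `toy_fitness` for a real predictor would leave the loop itself unchanged.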
Affiliation(s)
- Mahsa Torkamanian-Afshar
- Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran; Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran; Department of Computer Technologies, Beykent University, Istanbul, Turkey
- Sajjad Nematzadeh
- Department of Computer Technologies, Beykent University, Istanbul, Turkey
- Maryam Tabarzad
- Protein Technology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Najafi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Tehran, Iran
- Hossein Lanjanian
- Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Masoudi-Nejad
- Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran; Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
|
14
|
Saito S. SELEX-based DNA Aptamer Selection: A Perspective from the Advancement of Separation Techniques. Anal Sci 2021; 37:17-26. [PMID: 33132238 DOI: 10.2116/analsci.20sar18] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/27/2020] [Accepted: 10/22/2020] [Indexed: 11/23/2022]
Abstract
DNA aptamers, which are short, single-stranded DNA sequences that selectively bind to target substances (proteins, cells, small molecules, metal ions), can be acquired by means of the systematic evolution of ligands by exponential enrichment (SELEX) methodology. In the SELEX procedure, one of the keys for the effective acquisition of high-affinity and functional aptamer sequences is the separation stage to isolate target-bound DNA from unbound DNA in a randomized DNA library. In this review, various remarkable advancements in separation techniques for SELEX-based aptamer selection developed in this decade, are described and discussed, including CE-, microfluidic chip-, solid phase-, and FACS-based SELEX along with other methods.
Affiliation(s)
- Shingo Saito
- Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo Sakura, Saitama, 338-8570, Japan.
|
15
|
Lee W, Han K. Constructive Prediction of Potential RNA Aptamers for a Protein Target. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1476-1482. [PMID: 31689200 DOI: 10.1109/tcbb.2019.2951114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
Aptamers are short single-stranded nucleic acids that bind to target molecules with high affinity and selectivity. Aptamers are generally identified in vitro by performing SELEX (systematic evolution of ligands by exponential enrichment). Complementing the SELEX process, several computational methods have been proposed in the search for aptamers. However, many of these methods cannot be applied for finding new aptamers, either because they are classifiers for determining whether an RNA and protein interact with each other, or because they are limited to a specific target only. Hence, we developed a new random forest (RF) model for finding potential RNA aptamers for a protein target. From an extensive analysis of protein-RNA complexes including RNA aptamers-protein complexes, we identified key features of interacting RNA and protein molecules, and structural constraints on RNA aptamers. The potential RNA aptamers predicted by our method reveal similar secondary and protein-binding structures as the actual RNA aptamers. The RF model showed a reliable performance in both cross validations and independent testing. The key features of interacting RNA and protein molecules and the structural constraints identified in our study were effective in finding potential aptamers for a protein target. Although preliminary, our results are promising, and we believe this approach will be useful in reducing time and money spent on in vitro experiments by substantially limiting the size of the initial pool of nucleic acid sequences.
|
16
|
Volk MJ, Lourentzou I, Mishra S, Vo LT, Zhai C, Zhao H. Biosystems Design by Machine Learning. ACS Synth Biol 2020; 9:1514-1533. [PMID: 32485108 DOI: 10.1021/acssynbio.0c00129] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Indexed: 12/16/2022]
Abstract
Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems pose a major hurdle in designing biosystems with desirable features. As -omics and other high-throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss some applications of these models that have already shown success in biotechnological applications. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.
|
17
|
Li J, Ma X, Li X, Gu J. PPAI: a web server for predicting protein-aptamer interactions. BMC Bioinformatics 2020; 21:236. [PMID: 32517696 PMCID: PMC7285591 DOI: 10.1186/s12859-020-03574-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2019] [Accepted: 05/28/2020] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND The interactions between proteins and aptamers are prevalent in organisms and play an important role in various life activities. Thanks to the rapid accumulation of protein-aptamer interaction data, it is both necessary and feasible to construct an accurate and effective computational model that predicts aptamers binding to proteins of interest and predicts protein-aptamer interactions, which is beneficial for understanding the mechanisms of protein-aptamer interactions and improving aptamer-based therapies. RESULTS In this study, a novel web server named PPAI is developed to predict aptamers and protein-aptamer interactions using key sequence features of proteins and aptamers and a machine learning framework integrating AdaBoost and random forest. A new method for extracting several key sequence features of both proteins and aptamers is presented: the protein features are extracted from amino acid composition, pseudo-amino acid composition, grouped amino acid composition, C/T/D composition and sequence-order-coupling number, while the aptamer features are extracted from nucleotide composition, pseudo-nucleotide composition (PseKNC) and the normalized Moreau-Broto autocorrelation coefficient. Using these feature sets, with the samples balanced by the SMOTE algorithm, we validate the performance of PPAI on an independent test set. The results demonstrate that the area under the curve (AUC) is 0.907 for aptamer prediction and 0.871 for prediction of protein-aptamer interactions. CONCLUSION These results indicate that PPAI can query aptamers and proteins, predict aptamers, and predict protein-aptamer interactions in batch mode precisely and efficiently, making it a novel bioinformatics tool for research on protein-aptamer interactions. The PPAI web server is freely available at http://39.96.85.9/PPAI.
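As a rough illustration of the simplest aptamer feature named above, nucleotide composition, the sketch below turns a sequence into a fixed-length vector of normalized k-mer frequencies. It is not PPAI's code: the function name `kmer_composition`, the default RNA alphabet, and the choice to skip windows containing other characters are assumptions, and PseKNC and the Moreau-Broto autocorrelation are not covered.

```python
from itertools import product


def kmer_composition(seq, k=1, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with characters outside the alphabet
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


mono = kmer_composition("GGAUCC", k=1)  # 4 values, ordered A, C, G, U
di = kmer_composition("GGAUCC", k=2)    # 16 dinucleotide frequencies
```

Vectors like these have the same length for every sequence, which is what lets them be concatenated with protein features and passed to a classifier. For DNA aptamers, `alphabet="ACGT"` would be used instead.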
Affiliation(s)
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; Tianjin Key Laboratory of Bioelectromagnetic Technology and Intelligent Health, Hebei University of Technology, Tianjin, China.
- Xiaoyu Ma
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Xichuan Li
- Tianjin Key Laboratory of Animal and Plant Resistance, College of Life Sciences, Tianjin Normal University, Tianjin, China
- Junhua Gu
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
|
18
|
Wekesa JS, Meng J, Luan Y. Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction. Genomics 2020; 112:2928-2936. [PMID: 32437848 DOI: 10.1016/j.ygeno.2020.05.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/17/2019] [Revised: 04/22/2020] [Accepted: 05/05/2020] [Indexed: 12/28/2022]
Abstract
Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms, including binding to RNA-binding proteins. The majority of plant lncRNAs are functionally uncharacterized; thus, accurate prediction of plant lncRNA-protein interaction is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI. Its uniqueness is that it predicts by multi-feature fusion. Structural features and four groups of sequence features are used, including tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana obtained an area under the precision-recall curve (AUPRC) of 0.9820 and 0.9652, respectively. The proposed method shows significant enhancement in prediction performance compared with existing state-of-the-art methods.
Affiliation(s)
- Jael Sanyanda Wekesa
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China; School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China.
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116023, China
|
19
|
Emami N, Pakchin PS, Ferdousi R. Computational predictive approaches for interaction and structure of aptamers. J Theor Biol 2020; 497:110268. [PMID: 32311376 DOI: 10.1016/j.jtbi.2020.110268] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/10/2020] [Revised: 03/27/2020] [Accepted: 04/02/2020] [Indexed: 02/07/2023]
Abstract
Aptamers are short single-stranded sequences that can bind to their specific targets with high affinity and specificity. Usually, aptamers are selected experimentally via systematic evolution of ligands by exponential enrichment (SELEX), an evolutionary process that consists of multiple cycles of selection and amplification. The SELEX process is expensive and time-consuming, and its success rates are relatively low. To overcome these difficulties, several computational techniques that bring together different disciplines and branches of technology have been developed in aptamer science in recent years. This paper presents a complementary review of computational predictive approaches for aptamers. Generally, these approaches fall into two main categories: interaction-based prediction and structure-based prediction. Furthermore, the available software packages and toolkits in this scope are reviewed. The aim of describing computational methods and tools in aptamer science is to help aptamer scientists take advantage of these techniques to develop more accurate and more sensitive aptamers.
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Parvin Samadi Pakchin
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran; Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
|
20
|
Yang Q, Jia C, Li T. Prediction of aptamer-protein interacting pairs based on sparse autoencoder feature extraction and an ensemble classifier. Math Biosci 2019; 311:103-108. [PMID: 30880100 DOI: 10.1016/j.mbs.2019.01.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/10/2018] [Revised: 01/29/2019] [Accepted: 01/29/2019] [Indexed: 10/27/2022]
Abstract
Aptamer-protein interacting pairs play important roles in physiological functions and structural characterization. Identifying aptamer-protein interacting pairs remains challenging and limited, despite the tremendous applications of aptamers. Therefore, it is vital to construct a high-performance prediction model for identifying aptamer-target interacting pairs. In this study, a novel ensemble method is presented to predict aptamer-protein interacting pairs by integrating sequence characteristics derived from aptamers and the target proteins. The features extracted for aptamers were the compositions of amino acids and pseudo K-tuple nucleotides. In addition, a sparse autoencoder was used to characterize features for the target protein sequences. To remove redundant features, gradient boosting decision tree (GBDT) and incremental feature selection (IFS) methods were used to obtain the optimum combination of sequence characteristics. Based on 616 selected features, an ensemble of three support vector machine (SVM) sub-classifiers was used to construct our prediction model. Evaluated on an independent dataset, our predictor obtained an accuracy of 75.7%, a Matthews correlation coefficient of 0.478, and a Youden's index of 0.538, which were superior to the values reached by other existing predictors. The results show that our model can be used to distinguish novel aptamer-protein interacting pairs and to reveal the interrelation between aptamers and proteins.
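For readers less familiar with the three evaluation measures quoted above, the following sketch computes accuracy, the Matthews correlation coefficient, and Youden's index from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts in the example are hypothetical; they are not taken from the paper's independent dataset.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, MCC, and Youden's index from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    youden = sensitivity + specificity - 1
    return {"accuracy": accuracy, "mcc": mcc, "youden": youden}


# Hypothetical counts for a balanced test set of 100 pairs.
metrics = binary_metrics(tp=40, fp=15, tn=35, fn=10)
```

Unlike accuracy, both MCC and Youden's index stay near zero for a classifier that guesses at random, which is one reason they are often reported alongside accuracy for imbalanced interaction data.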
Affiliation(s)
- Qing Yang
- Institute of Environmental Systems Biology, College of Environmental and Engineering, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
- Cangzhi Jia
- School of Science, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
- Taoying Li
- Department of Maritime Economics and Management, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
|
21
|
Wang L, Yan X, Liu ML, Song KJ, Sun XF, Pan WW. Prediction of RNA-protein interactions by combining deep convolutional neural network with feature selection ensemble method. J Theor Biol 2018; 461:230-238. [PMID: 30321541 DOI: 10.1016/j.jtbi.2018.10.029] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/01/2017] [Revised: 03/22/2018] [Accepted: 10/11/2018] [Indexed: 01/01/2023]
Abstract
RNA-protein interaction (RPI) plays an important role in the basic cellular processes of organisms. Unfortunately, due to time and cost constraints, it is difficult to determine the relationships between RNAs and proteins on a large scale through biological experiments. There is therefore an urgent need for reliable computational methods that predict RNA-protein interaction quickly and accurately. In this study, we propose a novel computational method, RPIFSE (predicting RPI with Feature Selection Ensemble method), which predicts RPI from RNA and protein sequence information. First, RPIFSE perturbs the features extracted by a convolutional neural network (CNN) and generates multiple data sets according to the feature weights; it then uses an extreme learning machine (ELM) classifier to classify these data sets. Finally, the results of the classifiers are combined, and the highest score is chosen as the final prediction by a weighted voting method. In 5-fold cross-validation experiments, RPIFSE achieved 91.87%, 89.74%, 97.76% and 98.98% accuracy on the RPI369, RPI2241, RPI488 and RPI1807 data sets, respectively. To further evaluate the performance of RPIFSE, we compare it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on those data sets. Furthermore, we also predicted RPI on the independent data set NPInter2.0 and drew the network graph based on the prediction results. These promising comparison results demonstrate the effectiveness of RPIFSE and indicate that it could be a useful tool for predicting RPI.
Affiliation(s)
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China.
- Xin Yan
- School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong 277100, China.
- Meng-Lin Liu
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Ke-Jian Song
- School of Information Engineering, JiangXi University of Science and Technology, Ganzhou, Jiangxi 341000, China.
- Xiao-Fei Sun
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China.
- Wen-Wen Pan
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China.
|
22
|
A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers. Genes (Basel) 2018; 9:genes9080394. [PMID: 30071697 PMCID: PMC6116045 DOI: 10.3390/genes9080394] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2018] [Revised: 07/24/2018] [Accepted: 07/24/2018] [Indexed: 11/29/2022] Open
Abstract
Various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most trivial tasks. Disclosing the significance and contributions of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for practical clues to the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and auto-cross covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely coupled classifiers such as SVM and random forest. Then, a logistic regression model is applied to evaluate the contributions of these individual classifiers and to make the final prediction. When evaluated on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various prediction models flexibly with good performance.
|
23
|
Wang S, Wang D, Li J, Huang T, Cai YD. Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods. Mol Omics 2018; 14:64-73. [DOI: 10.1039/c7mo00030h] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 02/06/2023]
Abstract
Several machine learning algorithms were adopted to investigate cleavage sites in a signal peptide. An optimal dagging-based classifier was constructed, and 870 features were deemed important for this classifier.
Affiliation(s)
- ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
- Deling Wang
- Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou
- JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
|
24
|
Sharma TK, Bruno JG, Dhiman A. ABCs of DNA aptamer and related assay development. Biotechnol Adv 2017; 35:275-301. [PMID: 28108354 DOI: 10.1016/j.biotechadv.2017.01.003] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 08/23/2016] [Revised: 11/19/2016] [Accepted: 01/17/2017] [Indexed: 12/14/2022]
Abstract
This review is intended to guide the novice in aptamer research and development to understand virtually all of the aptamer development options and currently available assay modalities. Aptamer development topics range from discussions of basic and advanced versions of Systematic Evolution of Ligands by EXponential Enrichment (SELEX) and SELEX variations involving incorporation of exotic unnatural nucleotides to expand library diversity for even greater aptamer affinity and specificity to improved next generation methods of DNA sequencing, screening and tracking aptamer development throughout the SELEX process and characterization of lead aptamer candidates. Aptamer assay development topics include descriptions of various colorimetric and fluorescent assays in microplates or on membranes including homogeneous beacon and multiplexed Fluorescence Resonance Energy Transfer (FRET) assays. Finally, a discussion of the potential for marketing successful aptamer-based assays or test kits is included.
Affiliation(s)
- Tarun Kumar Sharma
- Center for Biodesign and Diagnostics, Translational Health Science and Technology Institute, Faridabad, Haryana 121001, India; AptaBharat Innovation Private Limited, Translational Health Science and Technology Institute Incubator, Haryana 121001, India.
- John G Bruno
- Operational Technologies Corporation, 4100 NW Loop 410, Suite 230, San Antonio, TX 78229, USA.
- Abhijeet Dhiman
- Department of Biotechnology, All India Institute of Medical Sciences, New Delhi 110029, India; Faculty of Pharmacy, Uttarakhand Technical University, Dehradun 248007, Uttarakhand, India
|