1
|
Nawaz MS, Fournier-Viger P, Nawaz S, Zhu H, Yun U. SPM4GAC: SPM based approach for genome analysis and classification of macromolecules. Int J Biol Macromol 2024; 266:130984. [PMID: 38513910 DOI: 10.1016/j.ijbiomac.2024.130984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Accepted: 03/16/2024] [Indexed: 03/23/2024]
Abstract
Genome sequence analysis and classification play critical roles in properly understanding an organism's main characteristics, functionalities, and changing (evolving) nature. However, the rapid expansion of genomic data makes genome sequence analysis and classification a challenging task due to the high computational requirements, proper management, and understanding of genomic data. Recently proposed models yielded promising results for the task of genome sequence classification. Nevertheless, these models often ignore the sequential nature of nucleotides, which is crucial for revealing their underlying structure and function. To address this limitation, we present SPM4GAC, a sequential pattern mining (SPM)-based framework to analyze and classify the macromolecule genome sequences of viruses. First, a large dataset containing the genome sequences of various RNA viruses is developed and transformed into a suitable format. On the transformed dataset, algorithms for SPM are used to identify frequent sequential patterns of nucleotide bases. The obtained frequent sequential patterns of bases are then used as features to classify different viruses. Ten classifiers are employed, and their performance is assessed by using several evaluation measures. Finally, a performance comparison of SPM4GAC with state-of-the-art methods for genome sequence classification/detection reveals that SPM4GAC performs better than those methods.
Collapse
Affiliation(s)
- M Saqib Nawaz
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
| | | | - Shoaib Nawaz
- Department of Pharmacy, The University of Lahore, Sargodha Campus, Pakistan.
| | - Haowei Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
| | - Unil Yun
- Sejong University, Seoul, Republic of Korea.
| |
Collapse
|
2
|
Zhang Y, Yu L, Yang M, Han B, Luo J, Jing R. Model fusion for predicting unconventional proteins secreted by exosomes using deep learning. Proteomics 2024:e2300184. [PMID: 38643383 DOI: 10.1002/pmic.202300184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/22/2024]
Abstract
Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although computational methods based on amino acid sequence prediction exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in terms of algorithmic accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach utilizes a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns.By combining six distinct deep learning models, we have created a superior framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.
Collapse
Affiliation(s)
- Yonglin Zhang
- Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
| | - Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, Guizhou, China
| | - Ming Yang
- Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
| | - Bin Han
- GCP Center/Institute of Drug Clinical Trials, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou, Sichuan, China
| | - Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China
| |
Collapse
|
3
|
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Collapse
Affiliation(s)
- Hanyu Xiao
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| | - Yijin Zou
- College of Veterinary Medicine, China Agricultural University, Beijing 100193, China;
| | - Jieqiong Wang
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| | - Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA;
| |
Collapse
|
4
|
Yi T, Luo J, Liao R, Wang L, Wu A, Li Y, Zhou L, Ni C, Wang K, Tang X, Zou W, Wu J. An Innovative Inducer of Platelet Production, Isochlorogenic Acid A, Is Uncovered through the Application of Deep Neural Networks. Biomolecules 2024; 14:267. [PMID: 38540688 PMCID: PMC10968240 DOI: 10.3390/biom14030267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Revised: 02/16/2024] [Accepted: 02/20/2024] [Indexed: 06/28/2024] Open
Abstract
(1) Background: Radiation-induced thrombocytopenia (RIT) often occurs in cancer patients undergoing radiation therapy, which can result in morbidity and even death. However, a notable deficiency exists in the availability of specific drugs designed for the treatment of RIT. (2) Methods: In our pursuit of new drugs for RIT treatment, we employed three deep learning (DL) algorithms: convolutional neural network (CNN), deep neural network (DNN), and a hybrid neural network that combines the computational characteristics of the two. These algorithms construct computational models that can screen compounds for drug activity by utilizing the distinct physicochemical properties of the molecules. The best model underwent testing using a set of 10 drugs endorsed by the US Food and Drug Administration (FDA) specifically for the treatment of thrombocytopenia. (3) Results: The Hybrid CNN+DNN (HCD) model demonstrated the most effective predictive performance on the test dataset, achieving an accuracy of 98.3% and a precision of 97.0%. Both metrics surpassed the performance of the other models, and the model predicted that seven FDA drugs would exhibit activity. Isochlorogenic acid A, identified through screening the Chinese Pharmacopoeia Natural Product Library, was subsequently subjected to experimental verification. The results indicated a substantial enhancement in the differentiation and maturation of megakaryocytes (MKs), along with a notable increase in platelet production. (4) Conclusions: This underscores the potential therapeutic efficacy of isochlorogenic acid A in addressing RIT.
Collapse
Affiliation(s)
- Taian Yi
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; (T.Y.); (Y.L.)
| | - Jiesi Luo
- Department of Chemistry, School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China;
| | - Ruixue Liao
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Long Wang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Anguo Wu
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Yueyue Li
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; (T.Y.); (Y.L.)
| | - Ling Zhou
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Chengyang Ni
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Kai Wang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Xiaoqin Tang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
| | - Wenjun Zou
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; (T.Y.); (Y.L.)
| | - Jianming Wu
- Department of Chemistry, School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China;
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China (L.W.); (A.W.); (L.Z.); (C.N.); (K.W.); (X.T.)
- The Institute of Cardiovascular Research, Key Laboratory of Medical Electrophysiology of Ministry of Education, Luzhou 646000, China
| |
Collapse
|
5
|
Avila Santos AP, de Almeida BLS, Bonidia RP, Stadler PF, Stefanic P, Mandic-Mulec I, Rocha U, Sanches DS, de Carvalho ACPLF. BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification. RNA Biol 2024; 21:1-12. [PMID: 38528797 DOI: 10.1080/15476286.2024.2329451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/23/2024] [Indexed: 03/27/2024] Open
Abstract
The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.
Collapse
Affiliation(s)
- Anderson P Avila Santos
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony, Germany
| | - Breno L S de Almeida
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
| | - Robson P Bonidia
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Brazil
| | - Peter F Stadler
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony, Germany
| | - Polonca Stefanic
- Department of Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
| | - Ines Mandic-Mulec
- Department of Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
| | - Ulisses Rocha
- Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony, Germany
| | - Danilo S Sanches
- Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Brazil
| | | |
Collapse
|
6
|
Yu L, Zhang Y, Xue L, Liu F, Jing R, Luo J. EnsembleDL-ATG: Identifying autophagy proteins by integrating their sequence and evolutionary information using an ensemble deep learning framework. Comput Struct Biotechnol J 2023; 21:4836-4848. [PMID: 37854634 PMCID: PMC10579870 DOI: 10.1016/j.csbj.2023.09.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/20/2023] Open
Abstract
Autophagy is a primary mechanism for maintaining cellular homeostasis. The synergistic actions of autophagy-related (ATG) proteins strictly regulate the whole autophagic process. Therefore, accurate identification of ATGs is a first and critical step to reveal the molecular mechanism underlying the regulation of autophagy. Current computational methods can predict ATGs from primary protein sequences, but owing to the limitations of algorithms, significant room for improvement still exists. In this research, we propose EnsembleDL-ATG, an ensemble deep learning framework that aggregates multiple deep learning models to predict ATGs from protein sequence and evolutionary information. We first evaluated the performance of individual networks for various feature descriptors to identify the most promising models. Then, we explored all possible combinations of independent models to select the most effective ensemble architecture. The final framework was built and maintained by an organization of four different deep learning models. Experimental results show that our proposed method achieves a prediction accuracy of 94.5 % and MCC of 0.890, which are nearly 4 % and 0.08 higher than ATGPred-FL, respectively. Overall, EnsembleDL-ATG is the first ATG machine learning predictor based on ensemble deep learning. The benchmark data and code utilized in this study can be accessed for free at https://github.com/jingry/autoBioSeqpy/tree/2.0/examples/EnsembleDL-ATG.
Collapse
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, Guizhou, China
- Basic Medical College, Southwest Medical University, Luzhou 646000, Sichuan, China
| | - Yonglin Zhang
- Department of Pharmacy, The Affiliated Hospital of North Sichuan Medical College, Nanchong 637000, Sichuan, China
| | - Li Xue
- School of Public Health, Southwest Medical University, Luzhou 646000, Sichuan, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, Guizhou, China
| | - Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, Sichuan, China
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou 646000, Sichuan, China
- Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Southwest Medical University, Luzhou 646000, Sichuan, China
| |
Collapse
|
7
|
Zhang Y, Yu L, Jing R, Han B, Luo J. Fast and Efficient Design of Deep Neural Networks for Predicting N 7-Methylguanosine Sites Using autoBioSeqpy. ACS OMEGA 2023; 8:19728-19740. [PMID: 37305295 PMCID: PMC10249100 DOI: 10.1021/acsomega.3c01371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 06/13/2023]
Abstract
N7-Methylguanosine (m7G) is a crucial post-transcriptional RNA modification that plays a pivotal role in regulating gene expression. Accurately identifying m7G sites is a fundamental step in understanding the biological functions and regulatory mechanisms associated with this modification. While whole-genome sequencing is the gold standard for RNA modification site detection, it is a time-consuming, expensive, and intricate process. Recently, computational approaches, especially deep learning (DL) techniques, have gained popularity in achieving this objective. Convolutional neural networks and recurrent neural networks are examples of DL algorithms that have emerged as versatile tools for modeling biological sequence data. However, developing an efficient network architecture with superior performance remains a challenging task, requiring significant expertise, time, and effort. To address this, we previously introduced a tool called autoBioSeqpy, which streamlines the design and implementation of DL networks for biological sequence classification. In this study, we utilized autoBioSeqpy to develop, train, evaluate, and fine-tune sequence-level DL models for predicting m7G sites. We provided detailed descriptions of these models, along with a step-by-step guide on their execution. The same methodology can be applied to other systems dealing with similar biological questions. The benchmark data and code utilized in this study can be accessed for free at http://github.com/jingry/autoBioSeeqpy/tree/2.0/examples/m7G.
Collapse
Affiliation(s)
- Yonglin Zhang
- Department
of Pharmacy, Affiliated Hospital of North
Sichuan Medical College, Nanchong 637000, China
| | - Lezheng Yu
- School
of Chemistry and Materials Science, Guizhou
Education University, Guiyang 550024, China
| | - Runyu Jing
- School
of Cyber Science and Engineering, Sichuan
University, Chengdu 610017, China
| | - Bin Han
- GCP
Center/Institute of Drug Clinical Trials, Affiliated Hospital of North Sichuan Medical College, Nanchong 637503, China
| | - Jiesi Luo
- Basic
Medical College, Southwest Medical University, Luzhou 646099, Sichuan, China
- Key
Medical
Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou
Key Laboratory of Activity Screening and Druggability Evaluation for
Chinese Materia Medica, Southwest Medical
University, Luzhou 646099, China
| |
Collapse
|
8
|
Yu L, Zhang Y, Xue L, Liu F, Jing R, Luo J. Evaluation and development of deep neural networks for RNA 5-Methyluridine classifications using autoBioSeqpy. Front Microbiol 2023; 14:1175925. [PMID: 37275146 PMCID: PMC10232852 DOI: 10.3389/fmicb.2023.1175925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Accepted: 04/27/2023] [Indexed: 06/07/2023] Open
Abstract
Post-transcriptionally RNA modifications, also known as the epitranscriptome, play crucial roles in the regulation of gene expression during development. Recently, deep learning (DL) has been employed for RNA modification site prediction and has shown promising results. However, due to the lack of relevant studies, it is unclear which DL architecture is best suited for some pyrimidine modifications, such as 5-methyluridine (m5U). To fill this knowledge gap, we first performed a comparative evaluation of various commonly used DL models for epigenetic studies with the help of autoBioSeqpy. We identified optimal architectural variations for m5U site classification, optimizing the layer depth and neuron width. Second, we used this knowledge to develop Deepm5U, an improved convolutional-recurrent neural network that accurately predicts m5U sites from RNA sequences. We successfully applied Deepm5U to transcriptomewide m5U profiling data across different sequencing technologies and cell types. Third, we showed that the techniques for interpreting deep neural networks, including LayerUMAP and DeepSHAP, can provide important insights into the internal operation and behavior of models. Overall, we offered practical guidance for the development, benchmark, and analysis of deep learning models when designing new algorithms for RNA modifications.
Collapse
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
| | - Yonglin Zhang
- Department of Pharmacy, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
| | - Li Xue
- School of Public Health, Southwest Medical University, Luzhou, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
| | - Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu, China
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou, China
- Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Southwest Medical University, Luzhou, China
| |
Collapse
|
9
|
Mo Q, Zhang T, Wu J, Wang L, Luo J. Identification of thrombopoiesis inducer based on a hybrid deep neural network model. Thromb Res 2023; 226:36-50. [PMID: 37119555 DOI: 10.1016/j.thromres.2023.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2022] [Revised: 03/13/2023] [Accepted: 04/11/2023] [Indexed: 05/01/2023]
Abstract
Thrombocytopenia is a common haematological problem worldwide. Currently, there are no relatively safe and effective agents for the treatment of thrombocytopenia. To address this challenge, we propose a computational method that enables the discovery of novel drug candidates with haematopoietic activities. Based on different types of molecular representations, three deep learning (DL) algorithms, namely recurrent neural networks (RNNs), deep neural networks (DNNs), and hybrid neural networks (RNNs+DNNs), were used to develop classification models to distinguish between active and inactive compounds. The evaluation results illustrated that the hybrid DL model exhibited the best prediction performance, with an accuracy of 97.8 % and Matthews correlation coefficient of 0.958 on the test dataset. Subsequently, we performed drug discovery screening based on the hybrid DL model and identified a compound from the FDA-approved drug library that was structurally divergent from conventional drugs and showed a potential therapeutic action against thrombocytopenia. The novel drug candidate wedelolactone significantly promoted megakaryocyte differentiation in vitro and increased platelet levels and megakaryocyte differentiation in irradiated mice with no systemic toxicity. Overall, our work demonstrates how artificial intelligence can be used to discover novel drugs against thrombocytopenia.
Collapse
Affiliation(s)
- Qi Mo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Ting Zhang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Jianming Wu
- Basic Medical College, Southwest Medical University, Luzhou 646000, China.
| | - Long Wang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China.
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou 646000, China; State Key Laboratory of Southwestern Chinese Medicine Resources, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, China.
| |
Collapse
|
10
|
Liao M, Zhao JP, Tian J, Zheng CH. iEnhancer-DCLA: using the original sequence to identify enhancers and their strength based on a deep learning framework. BMC Bioinformatics 2022; 23:480. [PMCID: PMC9664816 DOI: 10.1186/s12859-022-05033-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Accepted: 11/02/2022] [Indexed: 11/16/2022] Open
Abstract
AbstractEnhancers are small regions of DNA that bind to proteins, which enhance the transcription of genes. The enhancer may be located upstream or downstream of the gene. It is not necessarily close to the gene to be acted on, because the entanglement structure of chromatin allows the positions far apart in the sequence to have the opportunity to contact each other. Therefore, identifying enhancers and their strength is a complex and challenging task. In this article, a new prediction method based on deep learning is proposed to identify enhancers and enhancer strength, called iEnhancer-DCLA. Firstly, we use word2vec to convert k-mers into number vectors to construct an input matrix. Secondly, we use convolutional neural network and bidirectional long short-term memory network to extract sequence features, and finally use the attention mechanism to extract relatively important features. In the task of predicting enhancers and their strengths, this method has improved to a certain extent in most evaluation indexes. In summary, we believe that this method provides new ideas in the analysis of enhancers.
Collapse
|
11
|
DeeProPre: A promoter predictor based on deep learning. Comput Biol Chem 2022; 101:107770. [PMID: 36116322 DOI: 10.1016/j.compbiolchem.2022.107770] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Revised: 08/06/2022] [Accepted: 09/11/2022] [Indexed: 11/21/2022]
Abstract
The promoter is a DNA sequence recognized, bound and transcribed by RNA polymerase. It is usually located at the upstream or 5'end of the transcription start site (TSS). Studies have shown that the structure of the promoter affects its affinity for RNA polymerase, thus affecting the level of gene expression. Therefore, the correct identification of core promoter and common structural gene is of great significance in the field of biomedicine. At present, many methods have been proposed to improve the accuracy of promoter recognition, but the performances still need to be further improved. In this study, a deep learning algorithm (DeeProPre) based on bidirectional long short-term memory (BiLSTM) and convolutional neural network (CNN) was proposed. Firstly, the supervised embedding layer was applied to map the sequence to a high-dimensional space. Secondly, two 1D convolutional layers, BiLSTM and attentional mechanism layer were used for extracting features. Finally, the full connection layer activated by Sigmoid function was used to obtain the probability of classification into target categories. This model can identify the promoter region of eukaryotes with high accuracy, providing an analytical basis for further understanding of promoter physiological functions and studies of gene transcription mechanisms. The source code of DeeProPre is freely available at https://github.com/zzwwmmm/DeeProPre/tree/master.
Collapse
|
12
|
Bonidia RP, Santos APA, de Almeida BLS, Stadler PF, da Rocha UN, Sanches DS, de Carvalho ACPLF. BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria. Brief Bioinform 2022; 23:6618238. [PMID: 35753697 PMCID: PMC9294424 DOI: 10.1093/bib/bbac218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2022] [Revised: 05/06/2022] [Accepted: 05/09/2022] [Indexed: 01/19/2023] Open
Abstract
Recent technological advances have led to an exponential expansion of biological sequence data and extraction of meaningful information through Machine Learning (ML) algorithms. This knowledge has improved the understanding of mechanisms related to several fatal diseases, e.g. Cancer and coronavirus disease 2019, helping to develop innovative solutions, such as CRISPR-based gene editing, coronavirus vaccine and precision medicine. These advances benefit our society and economy, directly impacting people’s lives in various areas, such as health care, drug discovery, forensic analysis and food processing. Nevertheless, ML-based approaches to biological data require representative, quantitative and informative features. Many ML algorithms can handle only numerical data, and therefore sequences need to be translated into a numerical feature vector. This process, known as feature extraction, is a fundamental step for developing high-quality ML-based models in bioinformatics, by allowing the feature engineering stage, with design and selection of suitable features. Feature engineering, ML algorithm selection and hyperparameter tuning are often manual and time-consuming processes, requiring extensive domain knowledge. To deal with this problem, we present a new package: BioAutoML. BioAutoML automatically runs an end-to-end ML pipeline, extracting numerical and informative features from biological sequence databases, using the MathFeature package, and automating the feature selection, ML algorithm(s) recommendation and tuning of the selected algorithm(s) hyperparameters, using Automated ML (AutoML). BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) Metalearning (algorithm recommendation and hyper-parameter tuning modules). We experimentally evaluate BioAutoML in two different scenarios: (i) prediction of the three main classes of noncoding RNAs (ncRNAs) and (ii) prediction of the eight categories of ncRNAs in bacteria, including housekeeping and regulatory types. To assess BioAutoML predictive performance, it is experimentally compared with two other AutoML tools (RECIPE and TPOT). According to the experimental results, BioAutoML can accelerate new studies, reducing the cost of feature engineering processing and either keeping or improving predictive performance. BioAutoML is freely available at https://github.com/Bonidia/BioAutoML.
Collapse
Affiliation(s)
- Robson P Bonidia
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
| | - Anderson P Avila Santos
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil.,Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
| | - Breno L S de Almeida
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
| | - Peter F Stadler
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony, Germany
| | - Ulisses N da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
| | - Danilo S Sanches
- Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio 86300-000, Brazil
| | - André C P L F de Carvalho
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
| |
Collapse
|
13
|
Yu L, Zhang Y, Xue L, Liu F, Chen Q, Luo J, Jing R. Systematic Analysis and Accurate Identification of DNA N4-Methylcytosine Sites by Deep Learning. Front Microbiol 2022; 13:843425. [PMID: 35401453 PMCID: PMC8989013 DOI: 10.3389/fmicb.2022.843425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2021] [Accepted: 02/21/2022] [Indexed: 11/13/2022] Open
Abstract
DNA N4-methylcytosine (4mC) is a pivotal epigenetic modification that plays an essential role in DNA replication, repair, expression and differentiation. To gain insight into the biological functions of 4mC, it is critical to identify their modification sites in the genomics. Recently, deep learning has become increasingly popular in recent years and frequently employed for the 4mC site identification. However, a systematic analysis of how to build predictive models using deep learning techniques is still lacking. In this work, we first summarized all existing deep learning-based predictors and systematically analyzed their models, features and datasets, etc. Then, using a typical standard dataset with three species (A. thaliana, C. elegans, and D. melanogaster), we assessed the contribution of different model architectures, encoding methods and the attention mechanism in establishing a deep learning-based model for the 4mC site prediction. After a series of optimizations, convolutional-recurrent neural network architecture using the one-hot encoding and attention mechanism achieved the best overall prediction performance. Extensive comparison experiments were conducted based on the same dataset. This work will be helpful for researchers who would like to build the 4mC prediction models using deep learning in the future.
Collapse
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
| | - Yonglin Zhang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
| | - Li Xue
- School of Public Health, Southwest Medical University, Luzhou, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
| | - Qi Chen
- Department of Endocrinology and Metabolism, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China.,Department of Pharmacy, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu, China
| |
Collapse
|
14
|
A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022] Open
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, a bacterial immune response system, is widely adopted for RNA-driven targeted genome editing. The systematic mapping study presented in this paper examines the literature on machine learning (ML) techniques employed in the prediction of CRISPR/Cas9 sgRNA on/off-target cleavage, focusing on improving support in sgRNA design activities and identifying areas currently being researched. This area of research has greatly expanded recently, and we found it appropriate to work on a Systematic Mapping Study (SMS), an investigation that has proven to be an effective secondary study method. Unlike a classic review, in an SMS, no comparison of methods or results is made, while this task can instead be the subject of a systematic literature review that chooses one theme among those highlighted in this SMS. The study is illustrated in this paper. To the best of the authors' knowledge, no other SMS studies have been published on this topic. Fifty-seven papers published in the period 2017–2022 (April, 30) were analyzed. This study reveals that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while the use of other models is marginal. Other interesting information has emerged, such as the wide availability of both open code and platforms dedicated to supporting the activity of researchers or the fact that there is a clear prevalence of public funds that finance research on this topic.
Collapse
|
15
|
Yu L, Xue L, Liu F, Li Y, Jing R, Luo J. The applications of deep learning algorithms on in silico druggable proteins identification. J Adv Res 2022; 41:219-231. [PMID: 36328750 PMCID: PMC9637576 DOI: 10.1016/j.jare.2022.01.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Revised: 12/21/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022] Open
Abstract
We developed the first deep learning-based druggable protein classifier for fast and accurate identification of potential druggable proteins. Experimental results on a standard dataset demonstrate that the prediction performance of deep learning model is comparable to those of existing methods. We visualized the representations of druggable proteins learned by deep learning models, which helps us understand how they work. Our analysis reconfirms that the attention mechanism is especially useful for explaining deep learning models.
Introduction The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery. Objectives This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets. Methods Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization. Results Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which is comparable to those of existing methods. Conclusion We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.
Collapse
|
16
|
Yu L, Liu F, Li Y, Luo J, Jing R. DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol 2021; 12:605782. [PMID: 33552038 PMCID: PMC7858263 DOI: 10.3389/fmicb.2021.605782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2020] [Accepted: 01/04/2021] [Indexed: 01/17/2023] Open
Abstract
Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) and cause various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel anti-bacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) and type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionary conserved profiles and sequence motifs. To address this challenge, we develop a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generate amino-acid character dictionary and sequence-based features extracted from effector proteins and subsequently implement these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training the model, the hybrid neural network classifies secreted effectors into two different classes with an accuracy, F-value, and recall of over 80.0%. Our approach stands for the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.
Collapse
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
| | - Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu, China
| | - Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
| | - Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu, China
| |
Collapse
|