1
Hamouda E, Tarek M. A hybrid approach of ensemble learning and grey wolf optimizer for DNA splice junction prediction. PLoS One 2024; 19:e0310698. [PMID: 39312561 PMCID: PMC11419377 DOI: 10.1371/journal.pone.0310698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 09/05/2024] [Indexed: 09/25/2024] Open
Abstract
DNA splice junction classification is a crucial job in computational biology. The challenge is to predict the junction type (IE, EI, or N) from a given DNA sequence. Predicting junction type is crucial for understanding gene expression patterns, disease causes, splicing regulation, and gene structure. The location of the regions where exons are joined, and introns are removed during RNA splicing is very difficult to determine because no universal rule guides this process. This study presents a two-layer hybrid approach inspired by ensemble learning to overcome this challenge. The first layer applies the grey wolf optimizer (GWO) for feature selection. GWO's exploration ability allows it to efficiently search a vast feature space, while its exploitation ability refines promising areas, thus leading to a more reliable feature selection. The selected features are then fed into the second layer, which employs a classification model trained on the retrieved features. Using cross-validation, the proposed method divides the DNA splice junction dataset into training and test sets, allowing for a thorough examination of the classifier's generalization ability. The ensemble model is trained on various partitions of the training set and tested on the remaining held-out fold. This process is performed for each fold, comprehensively evaluating the classifier's performance. We tested our method using the StatLog DNA dataset. Compared to various machine learning models for DNA splice junction prediction, the proposed GWO+SVM ensemble method achieved an accuracy of 96%. This finding suggests that the proposed ensemble hybrid approach is promising for DNA splice junction classification. The implementation code for the proposed approach is available at https://github.com/EFHamouda/DNA-splice-junction-prediction.
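The fold-wise evaluation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `k_fold_indices`, the contiguous (unshuffled) fold layout, and the choice of `k` are assumptions, since the abstract does not state how the folds are built.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Hypothetical helper: every sample lands in exactly one held-out fold,
    and the remaining samples form the training partition for that fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(n_samples=10, k=3):
        print(len(train_idx), "training samples; held-out fold:", test_idx)
```

In a full pipeline, feature selection and classifier training would be run on each training partition only, so that the held-out fold stays unseen until scoring.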
Affiliation(s)
- Eslam Hamouda
- Computer Science Department, Faculty of Computers & Information, Mansoura University, Mansoura, Egypt
- Computer Science Department, Faculty of Computers & Information, Jouf University, Jouf, Saudi Arabia
- Mayada Tarek
- Computer Science Department, Faculty of Computers & Information, Mansoura University, Mansoura, Egypt
- Computer Science Department, Faculty of Computers & Information, Jouf University, Jouf, Saudi Arabia
2
Rips J, Halstuk O, Fuchs A, Lang Z, Sido T, Gershon-Naamat S, Abu-Libdeh B, Edvardson S, Salah S, Breuer O, Hadhud M, Eden S, Simon I, Slae M, Damseh NS, Abu-Libdeh A, Eskin-Schwartz M, Birk OS, Varga J, Schueler-Furman O, Rosenbluh C, Elpeleg O, Yanovsky-Dagan S, Mor-Shaked H, Harel T. Unbiased phenotype and genotype matching maximizes gene discovery and diagnostic yield. Genet Med 2024; 26:101068. [PMID: 38193396 DOI: 10.1016/j.gim.2024.101068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 12/31/2023] [Accepted: 01/04/2024] [Indexed: 01/10/2024] Open
Abstract
PURPOSE: Widespread application of next-generation sequencing, combined with data exchange platforms, has provided molecular diagnoses for countless families. To maximize diagnostic yield, we implemented an unbiased semi-automated gene-matching algorithm based on genotype and phenotype matching. METHODS: Rare homozygous variants identified in 2 or more affected individuals, but not in healthy individuals, were extracted from our local database of ∼12,000 exomes. Phenotype similarity scores (PSS), based on human phenotype ontology terms, were assigned to each pair of individuals matched at the genotype level using HPOsim. RESULTS: 33,792 genotype-matched pairs were discovered, representing variants in 7567 unique genes. There was an enrichment of PSS ≥0.1 among pathogenic/likely pathogenic variant-level pairs (94.3% in pathogenic/likely pathogenic variant-level matches vs 34.75% in all matches). We highlighted founder or region-specific variants as an internal positive control and proceeded to identify candidate disease genes. Variant-level matches were particularly helpful in cases involving inframe indels and splice region variants beyond the canonical splice sites, which may otherwise have been disregarded, allowing for detection of candidate disease genes, such as KAT2A, RPAIN, and LAMP3. CONCLUSION: Semi-automated genotype matching combined with PSS is a powerful tool to resolve variants of uncertain significance and to identify candidate disease genes.
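The idea of scoring phenotype overlap for a genotype-matched pair can be illustrated with a deliberately simplified stand-in. The study computes its scores with HPOsim; the sketch below does not reproduce that method and instead uses plain Jaccard overlap between two sets of term identifiers. The function names and the example term sets are hypothetical, and a Jaccard value is not on the same scale as the study's PSS, so the 0.1 cutoff is borrowed from the abstract purely for illustration.

```python
from typing import AbstractSet


def jaccard_similarity(terms_a: AbstractSet[str], terms_b: AbstractSet[str]) -> float:
    """Share of phenotype terms common to both individuals (0.0 to 1.0).

    Simplified stand-in for a phenotype similarity score; NOT the HPOsim measure.
    """
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def passes_threshold(score: float, threshold: float = 0.1) -> bool:
    """Flag a genotype-matched pair whose score reaches the cutoff."""
    return score >= threshold


if __name__ == "__main__":
    # Made-up term sets for two individuals sharing a variant (illustrative only).
    individual_1 = {"HP:0001250", "HP:0001263", "HP:0000252"}
    individual_2 = {"HP:0001263", "HP:0000252", "HP:0001249"}
    score = jaccard_similarity(individual_1, individual_2)
    print(score, passes_threshold(score))
```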
Affiliation(s)
- Jonathan Rips
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel
- Orli Halstuk
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel; Faculty of Medicine, Hebrew University of Jerusalem, Israel
- Adina Fuchs
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel; Faculty of Medicine, Hebrew University of Jerusalem, Israel
- Ziv Lang
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel
- Tal Sido
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel
- Bassam Abu-Libdeh
- Department of Pediatrics & Genetics, Makassed Hospital & Al-Quds Medical School, E. Jerusalem, Palestine
- Simon Edvardson
- Faculty of Medicine, Hebrew University of Jerusalem, Israel; Pediatric Neurology Unit, Hadassah Medical Center, Jerusalem, Israel
- Somaya Salah
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel
- Oded Breuer
- Faculty of Medicine, Hebrew University of Jerusalem, Israel; Pediatric Pulmonology and CF Unit, Department of Pediatrics, Hadassah Medical Center, Jerusalem, Israel
- Mohamad Hadhud
- Faculty of Medicine, Hebrew University of Jerusalem, Israel; Pediatric Pulmonology and CF Unit, Department of Pediatrics, Hadassah Medical Center, Jerusalem, Israel
- Sharon Eden
- Institute of Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- Itamar Simon
- Institute of Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- Mordechai Slae
- Pediatric Gastroenterology Unit, Department of Pediatrics, Hadassah Medical Center, Jerusalem, Israel
- Nadirah S Damseh
- Department of Pediatrics & Genetics, Makassed Hospital & Al-Quds Medical School, E. Jerusalem, Palestine
- Abdulsalam Abu-Libdeh
- Department of Pediatrics & Genetics, Makassed Hospital & Al-Quds Medical School, E. Jerusalem, Palestine; Division of Pediatric Endocrinology, Hadassah Hebrew University Medical Center, Jerusalem, Israel
- Marina Eskin-Schwartz
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel; Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel
- Ohad S Birk
- The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel; Genetics Institute, Soroka University Medical Center, Beer-Sheva, Israel
- Julia Varga
- Microbiology and Molecular Genetics, Institute for Biomedical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Ora Schueler-Furman
- Microbiology and Molecular Genetics, Institute for Biomedical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Orly Elpeleg
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel; Faculty of Medicine, Hebrew University of Jerusalem, Israel
- Hagar Mor-Shaked
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel; Faculty of Medicine, Hebrew University of Jerusalem, Israel
- Tamar Harel
- Department of Genetics, Hadassah Medical Center, Jerusalem, Israel; Faculty of Medicine, Hebrew University of Jerusalem, Israel.
3
Liu X, Zhang H, Zeng Y, Zhu X, Zhu L, Fu J. DRANetSplicer: A Splice Site Prediction Model Based on Deep Residual Attention Networks. Genes (Basel) 2024; 15:404. [PMID: 38674339 PMCID: PMC11048956 DOI: 10.3390/genes15040404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 04/28/2024] Open
Abstract
The precise identification of splice sites is essential for unraveling the structure and function of genes, constituting a pivotal step in the gene annotation process. In this study, we developed a novel deep learning model, DRANetSplicer, that integrates residual learning and attention mechanisms for enhanced accuracy in capturing the intricate features of splice sites. We constructed multiple datasets using the most recent versions of genomic data from three different organisms, Oryza sativa japonica, Arabidopsis thaliana and Homo sapiens. This approach allows us to train models with a richer set of high-quality data. DRANetSplicer outperformed benchmark methods on donor and acceptor splice site datasets, achieving an average accuracy of (96.57%, 95.82%) across the three organisms. Comparative analyses with benchmark methods, including SpliceFinder, Splice2Deep, Deep Splicer, EnsembleSplice, and DNABERT, revealed DRANetSplicer's superior predictive performance, resulting in at least a (4.2%, 11.6%) relative reduction in average error rate. We utilized the DRANetSplicer model trained on O. sativa japonica data to predict splice sites in A. thaliana, achieving accuracies for donor and acceptor sites of (94.89%, 94.25%). These results indicate that DRANetSplicer possesses excellent cross-organism predictive capabilities, with its performance in cross-organism predictions even surpassing that of benchmark methods in non-cross-organism predictions. Cross-organism validation showcased DRANetSplicer's excellence in predicting splice sites across similar organisms, supporting its applicability in gene annotation for understudied organisms. We employed multiple methods to visualize the decision-making process of the model. The visualization results indicate that DRANetSplicer can learn and interpret well-known biological features, further validating its overall performance. 
Our study systematically examined and confirmed the predictive ability of DRANetSplicer from various levels and perspectives, indicating that its practical application in gene annotation is justified.
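The "relative reduction in average error rate" quoted in the abstract is a ratio of error rates rather than a difference of accuracies. A minimal sketch of that arithmetic follows; the function name and the baseline accuracy used in the demo are hypothetical and are not taken from the paper, while 96.57% is the donor-site average reported above.

```python
def relative_error_reduction(acc_new: float, acc_baseline: float) -> float:
    """Fraction by which the error rate shrinks when moving from baseline to new.

    Both accuracies are given as fractions in [0, 1]; error rate = 1 - accuracy.
    """
    err_new = 1.0 - acc_new
    err_baseline = 1.0 - acc_baseline
    if err_baseline <= 0.0:
        raise ValueError("baseline error rate must be positive")
    return (err_baseline - err_new) / err_baseline


if __name__ == "__main__":
    reported_donor_accuracy = 0.9657    # from the abstract
    hypothetical_baseline = 0.9600      # assumption, for illustration only
    print(relative_error_reduction(reported_donor_accuracy, hypothetical_baseline))
```

Because the denominator is the baseline error, a small gain in accuracy near the top of the scale can correspond to a sizeable relative reduction.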
Affiliation(s)
- Xueyan Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Hongyan Zhang
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Ying Zeng
- School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, China
- Xinghui Zhu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Lei Zhu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Jiahui Fu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
4
Dutta S, Zunjare RU, Sil A, Mishra DC, Arora A, Gain N, Chand G, Chhabra R, Muthusamy V, Hossain F. Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition. Amino Acids 2024; 56:20. [PMID: 38460024 PMCID: PMC11470854 DOI: 10.1007/s00726-023-03368-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 12/05/2023] [Indexed: 03/11/2024]
Abstract
The mutant matrilineal (mtl) gene encoding patatin-like phospholipase activity is involved in in-vivo maternal haploid induction in maize. Doubling of chromosomes in haploids by colchicine treatment leads to complete fixation of inbreds in just one generation compared to 6-7 generations of selfing. Thus, knowledge of patatin-like proteins in other crops assumes great significance for in-vivo haploid induction. So far, no online tool is available that can classify unknown proteins as patatin-like proteins. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four different kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used for building support vector machine (SVM) classifiers using six different sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences were analyzed, comprising patatin-like proteins (585 sequences) from various monocots, dicots, and microbes, and non-patatin-like proteins (585 sequences) from different subspecies of Zea mays. RBF and polynomial kernels were quite promising in the prediction of patatin-like proteins. Among the six sequence-based compositional features, di-peptide composition attained > 90% prediction accuracies using RBF and polynomial kernels. Using mutual information, the dipeptides that contributed most to the prediction were identified. The knowledge generated in this study can be utilized in other crops prior to the initiation of any experiment. The developed SVM model opens a new paradigm for scientists working on in-vivo haploid induction in commercial crops. This is the first report of machine learning for the identification of proteins with patatin-like activity.
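Di-peptide composition (DPC), the feature set that performed best in this abstract, maps a protein of any length to a fixed 400-dimensional vector of adjacent-residue pair frequencies. The sketch below follows the commonly used definition and is not the authors' code; the function name and the handling of non-standard residues (pairs containing them are skipped, and frequencies are normalized over the remaining pairs) are assumptions.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400-dimensional vector of adjacent-residue pair frequencies."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs with ambiguous or non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[pair] / total for pair in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("ACAC")  # toy sequence, not from the study
    print(len(vector), vector[DIPEPTIDES.index("AC")])
```

Vectors of this kind can then be passed to any classifier that expects fixed-length numeric input, such as an SVM with an RBF or polynomial kernel.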
Affiliation(s)
- Suman Dutta
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Anirban Sil
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Alka Arora
- ICAR-Indian Agricultural Statistical Research Institute, New Delhi, India
- Nisrita Gain
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Gulab Chand
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Rashmi Chhabra
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Firoz Hossain
- ICAR-Indian Agricultural Research Institute, New Delhi, India.
5
Kumari A, Singh M, Sharma R, Kumar T, Jindal N, Maan S, Joshi VG. Apoptin NLS2 homodimerization strategy for improved antibacterial activity and bio-stability. Amino Acids 2023; 55:1405-1416. [PMID: 37725185 DOI: 10.1007/s00726-023-03321-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 08/21/2023] [Indexed: 09/21/2023]
Abstract
The emergence of antibiotic resistance prompts exploration of viable antimicrobial peptides (AMPs) designs. The present study explores the antimicrobial prospects of Apoptin nuclear localization sequence (NLS2)-derived peptide ANLP (PRPRTAKRRIRL). Further, we examined the utility of the NLS dimerization strategy for improvement in antimicrobial activity and sustained bio-stability of AMPs. Initially, the antimicrobial potential of ANLP using antimicrobial peptide databases was analyzed. Then, ANLP along with its two homodimer variants namely ANLP-K1 and ANLP-K2 were synthesized and evaluated for antimicrobial activity against Escherichia coli and Salmonella. Among three AMPs, ANLP-K2 showed efficient antibacterial activity with 12 µM minimum inhibitory concentration (MIC). Slow degradation of ANLP-K1 (26.48%) and ANLP-K2 (13.21%) compared with linear ANLP (52.33%) at 480 min in serum stability assay indicates improved bio-stability of dimeric peptides. The AMPs presented no cytotoxicity in Vero cells. Dye penetration assays confirmed the membrane interacting nature of AMPs. The zeta potential analysis reveals effective charge neutralization of both lipopolysaccharide (LPS) and bacterial cells by dimeric AMPs. The dimeric AMPs on scanning electron microscopy studies showed multiple pore formations on the bacterial surface. Collectively, proposed Lysine scaffold dimerization of Apoptin NLS2 strategy resulted in enhancing antibacterial activity, bio-stability, and could be effective in neutralizing the off-target effect of LPS. In conclusion, these results suggest that nuclear localization sequence with a modified dimeric approach could represent a rich source of template for designing future antimicrobial peptides.
Affiliation(s)
- Anu Kumari
- Department of Animal Biotechnology, College of Veterinary Sciences, Lala Lajpat Rai University of Veterinary and Animal Sciences (LUVAS), Hisar, Haryana, 125004, India
- Mahavir Singh
- College Central Laboratory, College of Veterinary Sciences, LUVAS, Hisar, Haryana, 125004, India
- Ruchi Sharma
- Department of Animal Biotechnology, College of Veterinary Sciences, Lala Lajpat Rai University of Veterinary and Animal Sciences (LUVAS), Hisar, Haryana, 125004, India
- Tarun Kumar
- Veterinary Clinical Complex, College of Veterinary Sciences, LUVAS, Hisar, Haryana, 125004, India
- Naresh Jindal
- Department of Veterinary Public Health and Epidemiology, College of Veterinary Sciences, LUVAS, Hisar, Haryana, 125004, India
- Sushila Maan
- Department of Animal Biotechnology, College of Veterinary Sciences, Lala Lajpat Rai University of Veterinary and Animal Sciences (LUVAS), Hisar, Haryana, 125004, India
- Vinay G Joshi
- Department of Animal Biotechnology, College of Veterinary Sciences, Lala Lajpat Rai University of Veterinary and Animal Sciences (LUVAS), Hisar, Haryana, 125004, India.
6
Zabardast A, Tamer EG, Son YA, Yılmaz A. An automated framework for evaluation of deep learning models for splice site predictions. Sci Rep 2023; 13:10221. [PMID: 37353532 PMCID: PMC10290104 DOI: 10.1038/s41598-023-34795-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 05/08/2023] [Indexed: 06/25/2023] Open
Abstract
A novel framework for the automated evaluation of various deep learning-based splice site detectors is presented. The framework eliminates time-consuming development and experimenting activities for different codebases, architectures, and configurations to obtain the best models for a given RNA splice site dataset. RNA splicing is a cellular process in which pre-mRNAs are processed into mature mRNAs and used to produce multiple mRNA transcripts from a single gene sequence. Since the advancement of sequencing technologies, many splice site variants have been identified and associated with the diseases. So, RNA splice site prediction is essential for gene finding, genome annotation, disease-causing variants, and identification of potential biomarkers. Recently, deep learning models performed highly accurately for classifying genomic signals. Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and its bidirectional version (BLSTM), Gated Recurrent Unit (GRU), and its bidirectional version (BGRU) are promising models. During genomic data analysis, CNN's locality feature helps where each nucleotide correlates with other bases in its vicinity. In contrast, BLSTM can be trained bidirectionally, allowing sequential data to be processed from forward and reverse directions. Therefore, it can process 1-D encoded genomic data effectively. Even though both methods have been used in the literature, a performance comparison was missing. To compare selected models under similar conditions, we have created a blueprint for a series of networks with five different levels. As a case study, we compared CNN and BLSTM models' learning capabilities as building blocks for RNA splice site prediction in two different datasets. 
Overall, CNN performed better with [Formula: see text] accuracy ([Formula: see text] improvement), [Formula: see text] F1 score ([Formula: see text] improvement), and [Formula: see text] AUC-PR ([Formula: see text] improvement) in human splice site prediction. Likewise, an outperforming performance with [Formula: see text] accuracy ([Formula: see text] improvement), [Formula: see text] F1 score ([Formula: see text] improvement), and [Formula: see text] AUC-PR ([Formula: see text] improvement) is achieved in C. elegans splice site prediction. Overall, our results showed that CNN learns faster than BLSTM and BGRU. Moreover, CNN performs better at extracting sequence patterns than BLSTM and BGRU. To our knowledge, no other framework is developed explicitly for evaluating splice detection models to decide the best possible model in an automated manner. So, the proposed framework and the blueprint would help selecting different deep learning models, such as CNN vs. BLSTM and BGRU, for splice site analysis or similar classification tasks and in different problems.
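The abstract refers to "1-D encoded genomic data" as the input to the CNN and BLSTM models without naming the encoding. A common choice for such models is one-hot encoding, sketched below as an assumption rather than as the framework's actual preprocessing; the function name and the all-zero treatment of ambiguous bases are illustrative choices.

```python
from typing import List

# Channel order A, C, G, T is an arbitrary but fixed convention for this sketch.
_ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA string as a (length x 4) matrix, one row per nucleotide.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [list(_ONE_HOT.get(base, (0, 0, 0, 0))) for base in sequence.upper()]


if __name__ == "__main__":
    for row in one_hot_encode("GTAAGN"):  # toy window, not taken from either dataset
        print(row)
```

A convolutional layer slides filters along the rows of this matrix, which is how neighbouring nucleotides end up being scored together, while a bidirectional recurrent layer reads the same rows in both directions.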
Affiliation(s)
- Amin Zabardast
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Elif Güney Tamer
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Yeşim Aydın Son
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Arif Yılmaz
- Institute of Data Science, Maastricht University, Maastricht, The Netherlands.
7
Iram D, Kindarle UA, Sansi MS, Meena S, Puniya AK, Vij S. Peptidomics-based identification of an antimicrobial peptide derived from goat milk fermented by Lactobacillus rhamnosus (C25). J Food Biochem 2022; 46:e14450. [PMID: 36226982 DOI: 10.1111/jfbc.14450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 09/03/2022] [Accepted: 09/20/2022] [Indexed: 01/14/2023]
Abstract
Antimicrobial peptides (AMPs) are emerging as promising novel drug applicants. In the present study, goat milk was fermented using Lactobacillus rhamnosus C25 to generate bioactive peptides (BAPs). The peptide fractions generated were separated using ultrafiltration membranes with molecular weight cut-offs of 3, 5, and 10 kDa, and their antimicrobial activity toward Gram-positive and Gram-negative bacteria was investigated. Isolated AMPs were characterized using RP-HPLC and identified by LC-MS/MS. A total of 569 sequences of peptides were identified by mass spectrometry. Out of the 569, 36 were predicted as AMPs, 21 were predicted as cationic, and out of 21, 6 AMPs were helical peptides. In silico analysis indicated that the majority of peptides were antimicrobial and cationic in nature, an important factor for peptide interaction with the negative charge membrane of bacteria. The results showed that the peptides of <5 kDa exhibited maximum antibacterial activity against E. faecalis, E. coli, and S. typhi. Further, molecular docking was used to evaluate the potent MurD ligase inhibitors. On the basis of ligand binding energy, six predicted AMPs were selected and then analyzed by AutoDock tools. Among the six AMPs, peptides IGHFKLIFSLLRV (-7.5 kcal/mol) and KSFCPAPVAPPPPT (-7.6 kcal/mol), were predicted as a high-potent antimicrobial. Based on these findings, in silico investigations reveal that proteins of goat milk are a potential source of AMPs. This is for the first time that the antimicrobial peptides produced by Lactobacillus rhamnosus (C25) fermentation of goat milk have been identified via LC-MS/MS and predicted as AMPs, cationic charges, helical structure in nature, and potent MurD ligase inhibitors. These peptides can be synthesized and improved for use as antimicrobial agents. PRACTICAL APPLICATIONS: Goat milk is considered a high-quality source of milk protein. 
According to this study, goat milk protein is a potential source of AMPs. Fermentation can yield goat milk-derived peptides with a broad antibacterial activity spectrum at a low cost. The approach described here could be beneficial in that the significant AMPs can be synthesized and used in the pharmaceutical and food industries.
Affiliation(s)
- Daraksha Iram
- Antimicrobial Peptides, Biofunctional Probiotics & Peptidomics Laboratory, Dairy Microbiology Division, ICAR-National Dairy Research Institute, Karnal, India
- Uday Arun Kindarle
- Antimicrobial Peptides, Biofunctional Probiotics & Peptidomics Laboratory, Dairy Microbiology Division, ICAR-National Dairy Research Institute, Karnal, India
- Manish Singh Sansi
- Biofunctional Peptidomics & Metabolic Syndrome Laboratory, Animal Biochemistry Division, ICAR-National Dairy Research Institute, Karnal, India
- Sunita Meena
- Biofunctional Peptidomics & Metabolic Syndrome Laboratory, Animal Biochemistry Division, ICAR-National Dairy Research Institute, Karnal, India
- Anil Kumar Puniya
- Anaerobic Microbial Fermentation Laboratory, Dairy Microbiology Division, ICAR-National Dairy Research Institute, Karnal, India
- Shilpa Vij
- Antimicrobial Peptides, Biofunctional Probiotics & Peptidomics Laboratory, Dairy Microbiology Division, ICAR-National Dairy Research Institute, Karnal, India
8
Parra ALC, Bezerra LP, Shawar DE, Neto NAS, Mesquita FP, da Silva GO, Souza PFN. Synthetic antiviral peptides: a new way to develop targeted antiviral drugs. Future Virol 2022. [DOI: 10.2217/fvl-2021-0308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
The global concern over emerging and re-emerging viral infections has spurred the search for novel antiviral agents. Peptides with antiviral activity stand out because their biocompatibility, specificity and effectiveness overcome limitations of the drugs currently in use. Synthetic peptides have been shown to be viable alternatives to natural peptides because of several difficulties in using the latter in clinical trials. Various platforms have been utilized by researchers to predict the most effective peptide sequences against HIV, influenza, dengue, MERS and SARS. Synthetic peptides are already employed in the treatment of HIV infection. The novelty of this study is to discuss, for the first time, the potential of synthetic peptides as antiviral molecules. We conclude that synthetic peptides can act as new weapons against viral threats to humans.
Affiliation(s)
- Aura LC Parra
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Leandro P Bezerra
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Dur E Shawar
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Nilton AS Neto
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Felipe P Mesquita
- Drug Research & Development Center (NPDM), Federal University of Ceará, Cel. Nunes de Melo, Rodolfo Teófilo, 1000, Fortaleza, Brazil
- Gabrielly O da Silva
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Pedro FN Souza
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Drug Research & Development Center (NPDM), Federal University of Ceará, Cel. Nunes de Melo, Rodolfo Teófilo, 1000, Fortaleza, Brazil
9
Scalzitti N, Kress A, Orhand R, Weber T, Moulinier L, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Spliceator: multi-species splice site prediction using convolutional neural networks. BMC Bioinformatics 2021; 22:561. [PMID: 34814826 PMCID: PMC8609763 DOI: 10.1186/s12859-021-04471-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/04/2021] [Accepted: 11/09/2021] [Indexed: 12/14/2022] Open
Abstract
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89–92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. Conclusions: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04471-3.
Affiliation(s)
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Arnaud Kress
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France; BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Romain Orhand
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Thomas Weber
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Luc Moulinier
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France; BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Anne Jeannin-Girardon
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Pierre Collet
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Julie D Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.
| |
Collapse
|
10
|
Meher PK, Satpathy S. Improved recognition of splice sites in A. thaliana by incorporating secondary structure information into sequence-derived features: a computational study. 3 Biotech 2021; 11:484. [PMID: 34790508 DOI: 10.1007/s13205-021-03036-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Accepted: 10/18/2021] [Indexed: 10/19/2022] Open
Abstract
Identification of splice sites is an important aspect with regard to the prediction of gene structure. In most of the existing splice site prediction studies, machine learning algorithms coupled with sequence-derived features have been successfully employed for splice site recognition. However, the splice site identification by incorporating the secondary structure information is lacking, particularly in plant species. Thus, we made an attempt in this study to evaluate the performance of structural features on the splice site prediction accuracy in Arabidopsis thaliana. Prediction accuracies were evaluated with the sequence-derived features alone as well as by incorporating the structural features into the sequence-derived features, where support vector machine (SVM) was employed as prediction algorithm. Both short (40 base pairs) and long (105 base pairs) sequence datasets were considered for evaluation. After incorporating the secondary structure features, improvements in accuracies were observed only for the longer sequence dataset and the improvement was found to be higher with the sequence-derived features that accounted nucleotide dependencies. On the other hand, either a little or no improvement in accuracies was found for the short sequence dataset. The performance of SVM was further compared with that of LogitBoost, Random Forest (RF), AdaBoost and XGBoost machine learning methods. The prediction accuracies of SVM, AdaBoost and XGBoost were observed to be at par and higher than that of RF and LogitBoost algorithms. While prediction was performed by taking all the sequence-derived features along with the structural features, a little improvement in accuracies was found as compared to the combination of individual sequence-based features and structural features. 
To the best of our knowledge, this is the first attempt concerning the computational prediction of splice sites using machine learning methods by incorporating the secondary structure information into the sequence-derived features. All the source codes are available at https://github.com/meher861982/SSFeature. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s13205-021-03036-8.
|
11
|
Das L, Das JK, Mohapatra S, Nanda S. DNA numerical encoding schemes for exon prediction: a recent history. NUCLEOSIDES NUCLEOTIDES & NUCLEIC ACIDS 2021; 40:985-1017. [PMID: 34455915 DOI: 10.1080/15257770.2021.1966797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Bioinformatics in the present day has been firmly established as a regulator in genomics. In recent times, applications of Signal processing in exon prediction have gained a lot of attention. The exons carry protein information. Proteins are composed of connected constituents known as amino acids that characterize the specific function. Conversion of the nucleotide character string into a numerical sequence is the gateway before analyzing it through signal processing methods. This numeric encoding is the mathematical descriptor of nucleotides and is based on some statistical properties of the structure of nucleic acids. Since the type of encoding extremely affects the exon detection accuracy, this paper is devised for the review of existing encoding (mapping) schemes. The comparative analysis is formulated to emphasize the importance of the genetic code setting of amino acids considered for application related to computational elucidation for exon detection. This work covers much helpful information for future applications.
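The conversion step described in this abstract, turning a nucleotide string into numeric sequences before signal processing, can be illustrated with one well-known mapping, the binary indicator (Voss) representation. This is a minimal sketch only: the abstract does not say which schemes the review covers, so the choice of this scheme and the function name `voss_encode` are assumptions made for illustration.

```python
def voss_encode(sequence):
    """Map a DNA string to four binary indicator sequences, one per nucleotide.

    Illustrative helper (hypothetical name): position i of the track for
    base X is 1 if sequence[i] == X and 0 otherwise.
    """
    sequence = sequence.upper()
    return {base: [1 if s == base else 0 for s in sequence] for base in "ACGT"}


# Each of the four tracks can then be passed to a signal-processing routine.
indicators = voss_encode("ATGCA")
```

Each track is an ordinary numeric sequence of the same length as the input, which is the form such methods need.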
Affiliation(s)
- Lopamudra Das: School of Electronics Engineering, KIIT, Bhubaneswar, India
- J K Das: School of Electronics Engineering, KIIT, Bhubaneswar, India
- S Mohapatra: School of Electronics Engineering, KIIT, Bhubaneswar, India
- Sarita Nanda: School of Electronics Engineering, KIIT, Bhubaneswar, India
|
12
|
Meher PK, Rai A, Rao AR. mLoc-mRNA: predicting multiple sub-cellular localization of mRNAs using random forest algorithm coupled with feature selection via elastic net. BMC Bioinformatics 2021; 22:342. [PMID: 34167457 PMCID: PMC8223360 DOI: 10.1186/s12859-021-04264-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/18/2020] [Accepted: 06/11/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Localization of messenger RNAs (mRNAs) plays a crucial role in the growth and development of cells. Particularly, it plays a major role in regulating spatio-temporal gene expression. The in situ hybridization is a promising experimental technique used to determine the localization of mRNAs but it is costly and laborious. It is also a known fact that a single mRNA can be present in more than one location, whereas the existing computational tools are capable of predicting only a single location for such mRNAs. Thus, the development of high-end computational tool is required for reliable and timely prediction of multiple subcellular locations of mRNAs. Hence, we develop the present computational model to predict the multiple localizations of mRNAs. RESULTS The mRNA sequences from 9 different localizations were considered. Each sequence was first transformed to a numeric feature vector of size 5460, based on the k-mer features of sizes 1-6. Out of 5460 k-mer features, 1812 important features were selected by the Elastic Net statistical model. The Random Forest supervised learning algorithm was then employed for predicting the localizations with the selected features. Five-fold cross-validation accuracies of 70.87, 68.32, 68.36, 68.79, 96.46, 73.44, 70.94, 97.42 and 71.77% were obtained for the cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior and ribosome respectively. With an independent test set, accuracies of 65.33, 73.37, 75.86, 72.99, 94.26, 70.91, 65.53, 93.60 and 73.45% were obtained for the respective localizations. The developed approach also achieved higher accuracies than the existing localization prediction tools. CONCLUSIONS This study presents a novel computational tool for predicting the multiple localization of mRNAs. Based on the proposed approach, an online prediction server "mLoc-mRNA" is accessible at http://cabgrid.res.in:8080/mlocmrna/ . 
The developed approach is believed to supplement the existing tools and techniques for the localization prediction of mRNAs.
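The feature vector size quoted in this abstract follows from counting all k-mers of sizes 1 to 6 over a four-letter alphabet: 4 + 16 + 64 + 256 + 1024 + 4096 = 5460. The sketch below reproduces that arithmetic and builds such a vector. It is not the authors' implementation: the use of raw counts rather than normalized frequencies, the DNA-style alphabet `ACGT`, and the function names are all assumptions.

```python
from itertools import product


def kmer_vocabulary(max_k=6, alphabet="ACGT"):
    """List every k-mer for k = 1..max_k in a fixed order."""
    return ["".join(p) for k in range(1, max_k + 1)
            for p in product(alphabet, repeat=k)]


def kmer_features(sequence, vocabulary):
    """Count overlapping occurrences of each vocabulary k-mer in the sequence."""
    sequence = sequence.upper()
    counts = dict.fromkeys(vocabulary, 0)
    max_k = max(len(v) for v in vocabulary)
    for k in range(1, max_k + 1):
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            if kmer in counts:  # skips windows containing other symbols, e.g. N
                counts[kmer] += 1
    return [counts[v] for v in vocabulary]


vocab = kmer_vocabulary()
features = kmer_features("ATGGCC", vocab)
print(len(vocab))  # 5460
```

A feature-selection step such as the elastic net mentioned in the abstract would then operate on vectors of this length.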
Affiliation(s)
- Prabina Kumar Meher: ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
- Anil Rai: ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
|
13
|
Pixel- vs. Object-Based Landsat 8 Data Classification in Google Earth Engine Using Random Forest: The Case Study of Maiella National Park. REMOTE SENSING 2021. [DOI: 10.3390/rs13122299] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 11/16/2022]
Abstract
With the general objective of producing a 2018–2020 Land Use/Land Cover (LULC) map of the Maiella National Park (central Italy), useful for a future long-term LULC change analysis, this research aimed to develop a Landsat 8 (L8) data composition and classification process using Google Earth Engine (GEE). In this process, we compared two pixel-based (PB) and two object-based (OB) approaches, assessing the advantages of integrating the textural information in the PB approach. Moreover, we tested the possibility of using the L8 panchromatic band to improve the segmentation step and the object’s textural analysis of the OB approach and produce a 15-m resolution LULC map. After selecting the best time window of the year to compose the base data cube, we applied a cloud-filtering and a topography-correction process on the 32 available L8 surface reflectance images. On this basis, we calculated five spectral indices, some of them on an interannual basis, to account for vegetation seasonality. We added an elevation, an aspect, a slope layer, and the 2018 CORINE Land Cover classification layer to improve the available information. We applied the Gray-Level Co-Occurrence Matrix (GLCM) algorithm to calculate the image’s textural information and, in the OB approaches, the Simple Non-Iterative Clustering (SNIC) algorithm for the image segmentation step. We performed an initial RF optimization process finding the optimal number of decision trees through out-of-bag error analysis. We randomly distributed 1200 ground truth points and used 70% to train the RF classifier and 30% for the validation phase. This subdivision was randomly and recursively redefined to evaluate the performance of the tested approaches more robustly. The OB approaches performed better than the PB ones when using the 15 m L8 panchromatic band, while the addition of textural information did not improve the PB approach. 
Using the panchromatic band within an OB approach, we produced a detailed, 15-m resolution LULC map of the study area.
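The validation scheme in this abstract, 1200 ground truth points split 70%/30% and randomly redefined several times, can be sketched in plain Python. The number of repeats and the seed are not given in the abstract and are assumptions here, as is the function name; the classifier itself is left out.

```python
import random


def repeated_split(n_points=1200, train_fraction=0.7, repeats=5, seed=0):
    """Return a list of (train_indices, validation_indices) random splits.

    `repeats` and `seed` are illustrative values, not taken from the study.
    """
    rng = random.Random(seed)
    n_train = round(n_points * train_fraction)
    splits = []
    for _ in range(repeats):
        shuffled = list(range(n_points))
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


splits = repeated_split()
train, validation = splits[0]
print(len(train), len(validation))  # 840 360
```

Averaging the accuracy obtained on each validation subset gives a more stable estimate than a single split.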
|
14
|
Moosa S, Amira PA, Boughorbel DS. DASSI: differential architecture search for splice identification from DNA sequences. BioData Min 2021; 14:15. [PMID: 33588916 PMCID: PMC7885202 DOI: 10.1186/s13040-021-00237-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2020] [Accepted: 01/05/2021] [Indexed: 11/28/2022] Open
Abstract
Background The data explosion caused by unprecedented advancements in the field of genomics is constantly challenging the conventional methods used in the interpretation of the human genome. The demand for robust algorithms over recent years has brought huge success in the field of Deep Learning (DL) in solving many difficult tasks in image, speech and natural language processing by automating the manual process of architecture design. This has been fueled through the development of new DL architectures. Yet genomics poses unique challenges that require customization and development of new DL models. Methods We proposed a new model, DASSI, by adapting a differential architecture search method and applying it to the Splice Site (SS) recognition task on DNA sequences to discover new high-performance convolutional architectures in an automated manner. We evaluated the discovered model against state-of-the-art tools to classify true and false SS in Homo sapiens (Human), Arabidopsis thaliana (Plant), Caenorhabditis elegans (Worm) and Drosophila melanogaster (Fly). Results Our experimental evaluation demonstrated that the discovered architecture outperformed baseline models and fixed architectures and showed competitive results against state-of-the-art models used in classification of splice sites. The proposed model, DASSI, has a compact architecture and showed very good results on a transfer learning task. The benchmarking experiments of execution time and precision on the architecture search and evaluation process showed better performance on recently available GPUs, making it feasible to adopt architecture search based methods on large datasets. Conclusions We proposed the use of a differential architecture search method (DASSI) to perform SS classification on raw DNA sequences, and discovered new neural network models with a low number of tunable parameters and competitive performance compared with manually engineered architectures. 
We have extensively benchmarked DASSI model with other state-of-the-art models and assessed its computational efficiency. The results have shown a high potential of using automated architecture search mechanism for solving various problems in the field of genomics.
Affiliation(s)
- Shabir Moosa: Department of Systems Biology, SIDRA Medicine, Doha, 26999, Qatar; Dept. of Computer Science and Engineering, Qatar University, Doha, 2713, Qatar
- Prof Abbes Amira: Dept. of Computer Science and Engineering, Qatar University, Doha, 2713, Qatar
|
15
|
Souza PFN, Amaral JL, Bezerra LP, Lopes FES, Freire VN, Oliveira JTA, Freitas CDT. ACE2-derived peptides interact with the RBD domain of SARS-CoV-2 spike glycoprotein, disrupting the interaction with the human ACE2 receptor. J Biomol Struct Dyn 2021; 40:5493-5506. [PMID: 33427102 PMCID: PMC7876913 DOI: 10.1080/07391102.2020.1871415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/15/2023]
Abstract
Vaccines could be the solution to the current SARS-CoV-2 outbreak. However, some studies have shown that the immunological memory only lasts three months. Thus, it is imperative to develop pharmacological treatments to cope with COVID-19. Here, the in silico approach by molecular docking, dynamic simulations and quantum biochemistry revealed that ACE2-derived peptides strongly interact with the SARS-CoV-2 RBD domain of spike glycoprotein (S-RBD). ACE2-Dev-PepI, ACE2-Dev-PepII, ACE2-Dev-PepIII and ACE2-Dev-PepIV complexed with S-RBD provoked alterations in the 3D structure of S-RBD, leading to disruption of the correct interaction with the ACE2 receptor, a pivotal step for SARS-CoV-2 infection. This wrong interaction between S-RBD and ACE2 could inhibit the entry of SARS-CoV-2 in cells, and thus virus replication and the establishment of COVID-19 disease. Therefore, we suggest that ACE2-derived peptides can interfere with recognition of ACE2 in human cells by SARS-CoV-2 in vivo. Bioinformatic prediction showed that these peptides have no toxicity or allergenic potential. By using ACE2-derived peptides against SARS-CoV-2, this study points to opportunities for further in vivo research on these peptides, seeking to discover new drugs and entirely new perspectives to treat COVID-19. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Pedro F N Souza: Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
- Jackson L Amaral: Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil; Department of Physics, Federal University of Ceará, Fortaleza, Brazil
- Leandro P Bezerra: Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
- Francisco E S Lopes: Center for Permanent Education in Health Care, CEATS/School of Public Health of Ceará-ESP-CE, Fortaleza, Brazil
- Valder N Freire: Department of Physics, Federal University of Ceará, Fortaleza, Brazil
- Jose T A Oliveira: Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
- Cleverson D T Freitas: Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil
|
16
|
Souza PF, Marques LS, Oliveira JT, Lima PG, Dias LP, Neto NA, Lopes FE, Sousa JS, Silva AF, Caneiro RF, Lopes JL, Ramos MV, Freitas CD. Synthetic antimicrobial peptides: From choice of the best sequences to action mechanisms. Biochimie 2020; 175:132-145. [DOI: 10.1016/j.biochi.2020.05.016] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 01/26/2020] [Revised: 05/16/2020] [Accepted: 05/30/2020] [Indexed: 12/28/2022]
|
17
|
Amilpur S, Bhukya R. EDeepSSP: Explainable deep neural networks for exact splice sites prediction. J Bioinform Comput Biol 2020; 18:2050024. [PMID: 32696716 DOI: 10.1142/s0219720020500249] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Splice site prediction is crucial for understanding the underlying gene regulation and gene function, and for better genome annotation. Many computational methods exist for recognizing splice sites. Although most of the methods achieve competent performance, their interpretability remains challenging. Moreover, all traditional machine learning methods manually extract features, which is a tedious job. To address these challenges, we propose a deep learning-based approach (EDeepSSP) that employs a convolutional neural network (CNN) architecture for automatic feature extraction and effectively predicts splice sites. Our model, EDeepSSP, exposes the opaque nature of the CNN by extracting significant motifs and explains why these motifs are vital for predicting splice sites. In this study, experiments have been conducted on six benchmark acceptor and donor datasets of humans, cress, and fly. The results show that EDeepSSP has outperformed many state-of-the-art approaches. EDeepSSP achieves the highest area under the receiver operating characteristic curve (AUC_ROC) and area under the precision-recall curve (AUC_PR) of 99.32% and 99.26% on human donor datasets, respectively. We also analyze various filter activities, feature activations, and extracted significant motifs responsible for the splice site prediction. Further, we validate the learned motifs of our model against known motifs of the JASPAR splice site database.
Affiliation(s)
- Santhosh Amilpur: Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana 506004, India
- Raju Bhukya: Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana 506004, India
|
18
|
Thanapattheerakul T, Engchuan W, Chan JH. Predicting the effect of variants on splicing using Convolutional Neural Networks. PeerJ 2020; 8:e9470. [PMID: 32704450 PMCID: PMC7346860 DOI: 10.7717/peerj.9470] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/25/2019] [Accepted: 06/11/2020] [Indexed: 11/23/2022] Open
Abstract
Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of the splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting the splice sites. However, an insufficient effort has been put into extending the CNN model to predict the effect of the genomic variants on the splicing of mRNAs. This study proposes a framework to elaborate on the utility of CNNs to assess the effect of splice variants on the identification of potential disease-causing variants that disrupt the RNA splicing process. Five models, including three CNN-based and two non-CNN machine learning based, were trained and compared using two existing splice site datasets, Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also used to test on the HSplice tool to evaluate the predictive models. To improve the effectiveness of predictive models, two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model on variant data from the ClinVar database. Based on the estimation, the framework could effectively differentiate pathogenic variants from the benign variants (p = 5.9 × 10⁻⁷). These promising results support that the proposed framework could be applied in future genetic studies to identify disease causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available on the GitHub repository at https://github.com/smiile8888/rna-splice-sites-recognition.
Affiliation(s)
- Worrawat Engchuan: Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada; The Centre for Applied Genomics, The Hospital of Sick Children, Toronto, Ontario, Canada
- Jonathan H Chan: School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand; IC2-DLab, School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
|
19
|
Albaradei S, Magana-Mora A, Thafar M, Uludag M, Bajic VB, Gojobori T, Essack M, Jankovic BR. Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA. Gene 2020; 763S:100035. [PMID: 32550561 PMCID: PMC7285987 DOI: 10.1016/j.gene.2020.100035] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/30/2020] [Accepted: 05/06/2020] [Indexed: 12/21/2022]
Abstract
Background The accurate identification of the exon/intron boundaries is critical for the correct annotation of genes with multiple exons. Donor and acceptor splice sites (SS) demarcate these boundaries. Therefore, deriving accurate computational models to predict the SS are useful for functional annotation of genes and genomes, and for finding alternative SS associated with different diseases. Although various models have been proposed for the in silico prediction of SS, improving their accuracy is required for reliable annotation. Moreover, models are often derived and tested using the same genome, providing no evidence of broad application, i.e. to other poorly studied genomes. Results With this in mind, we developed the Splice2Deep models for SS detection. Each model is an ensemble of deep convolutional neural networks. We evaluated the performance of the models based on the ability to detect SS in Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana, Drosophila melanogaster, and Caenorhabditis elegans. Results demonstrate that the models efficiently detect SS in other organisms not considered during the training of the models. Compared to the state-of-the-art tools, Splice2Deep models achieved significantly reduced average error rates of 41.97% and 28.51% for acceptor and donor SS, respectively. Moreover, the Splice2Deep cross-organism validation demonstrates that models correctly identify conserved genomic elements enabling annotation of SS in new genomes by choosing the taxonomically closest model. Conclusions The results of our study demonstrated that Splice2Deep both achieved a considerably reduced error rate compared to other state-of-the-art models and the ability to accurately recognize SS in other organisms for which the model was not trained, enabling annotation of poorly studied or newly sequenced genomes. 
Splice2Deep models are implemented in Python using Keras API; the models and the data are available at https://github.com/SomayahAlbaradei/Splice_Deep.git.
Key Words
- AUC, area under curve
- AcSS, acceptor splice site
- Acc, accuracy
- Bioinformatics
- CNN, convolutional neural network
- CONV, convolutional layers
- DL, deep learning
- DNA, deoxyribonucleic acid
- DT, decision trees
- Deep-learning
- DoSS, donor splice site
- FC, fully connected layer
- ML, machine learning
- NB, naive Bayes
- NN, neural network
- POOL, pooling layer
- Prediction
- RF, random forest
- RNA, ribonucleic acid
- ReLU, rectified linear unit layer
- SS, splice site
- SVM, support vector machine
- Sn, sensitivity
- Sp, specificity
- Splice sites
- Splicing
Affiliation(s)
- Somayah Albaradei: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
- Arturo Magana-Mora: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Saudi Aramco, EXPEC-ARC, Drilling Technology Team, Dhahran 31311, Saudi Arabia
- Maha Thafar: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Faculty of Computers and Information Systems, Taif University, Saudi Arabia
- Mahmut Uludag: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Vladimir B Bajic: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Takashi Gojobori: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Magbubah Essack: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Boris R Jankovic: Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
|
20
|
Meher PK, Sahu TK, Gahoi S, Satpathy S, Rao AR. Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition. Gene 2019; 705:113-126. [PMID: 31009682 DOI: 10.1016/j.gene.2019.04.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/20/2018] [Revised: 03/27/2019] [Accepted: 04/17/2019] [Indexed: 02/02/2023]
Abstract
Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than the rule-based methods for identification of splice sites. However, the strings of alphabets should be transformed into numeric features through sequence encoding before using them as input in MLAs. In this study, we evaluated the performances of 8 different sequence encoding schemes i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of both 1st and 2nd order Markov model (MM1 + MM2) and 2nd order Markov model (MM2) in respect of predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and then performances were validated with another four species i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver-operating-characteristics) and PR (precision-recall) curves, FDTF encoding approach achieved higher accuracy followed by either MM2 or FDDM. Further, SVM was found to achieve higher accuracy (in terms of ROC and PR curves) followed by RF across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination was optimum than other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were observed higher for the species with low intron density. To our limited knowledge, this is the first attempt as far as comprehensive evaluation of sequence encoding schemes for prediction of splice sites is concerned. 
We have also developed an R-package EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html) for encoding of splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is believed to be useful for the computational biologists for predicting different functional elements on the genomic DNA.
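Of the encoding schemes listed in this abstract, the 1st order Markov model (MM1) is the easiest to sketch. The version below is one common reading of such an encoding, not the paper's definition and not the EncDNA API: position-specific transition probabilities P(x_i | x_{i-1}) are estimated from aligned true-site sequences, and a sequence is then encoded as the vector of those probabilities. The add-one pseudocount, the function names, and the toy sequences are assumptions.

```python
from collections import defaultdict


def fit_mm1(aligned_sequences, pseudocount=1.0, alphabet="ACGT"):
    """Estimate position-specific first-order transition probabilities."""
    length = len(aligned_sequences[0])
    counts = [defaultdict(float) for _ in range(length - 1)]
    for seq in aligned_sequences:
        for i in range(1, length):
            counts[i - 1][(seq[i - 1], seq[i])] += 1.0
    tables = []
    for pos_counts in counts:
        table = {}
        for prev in alphabet:
            total = sum(pos_counts[(prev, cur)] for cur in alphabet)
            total += pseudocount * len(alphabet)
            for cur in alphabet:
                table[(prev, cur)] = (pos_counts[(prev, cur)] + pseudocount) / total
        tables.append(table)
    return tables


def encode_mm1(sequence, tables):
    """Encode a sequence as its per-position transition probabilities."""
    return [tables[i - 1][(sequence[i - 1], sequence[i])]
            for i in range(1, len(sequence))]


# Toy aligned sequences, invented for illustration only.
true_sites = ["AGGTAAGT", "CAGGTGAG", "AGGTAAGA"]
model = fit_mm1(true_sites)
vector = encode_mm1("AGGTAAGT", model)
```

The resulting numeric vector, one value per adjacent nucleotide pair, is the kind of input a supervised learner such as an SVM or RF can consume.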
Affiliation(s)
- Prabina Kumar Meher: ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
- Tanmaya Kumar Sahu: ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
- Shachi Gahoi: ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
- Subhrajit Satpathy: ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
|
21
|
Zeng Y, Yuan H, Yuan Z, Chen Y. A high-performance approach for predicting donor splice sites based on short window size and imbalanced large samples. Biol Direct 2019; 14:6. [PMID: 30975175 PMCID: PMC6460831 DOI: 10.1186/s13062-019-0236-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/15/2018] [Accepted: 03/18/2019] [Indexed: 11/10/2022] Open
Abstract
Background Splice site prediction has been a long-standing problem in bioinformatics. Although many computational approaches developed for splice site prediction have achieved satisfactory accuracy, further improvement in predictive accuracy is significant because it contributes to predicting gene structure more accurately. Determining a proper window size before prediction is necessary. An overly long window size may introduce irrelevant features, which would reduce predictive accuracy, while the use of a short window size with maximum information may perform better in terms of predictive accuracy and time cost. Furthermore, the number of false splice sites following the GT–AG rule far exceeds that of true splice sites, so accurate and rapid prediction of splice sites using imbalanced large samples has always been a challenge. Therefore, based on the short window size and imbalanced large samples, we developed a new computational method named chi-square decision table (χ2-DT) for donor splice site prediction. Results Using a short window size of 11 bp, χ2-DT extracts the improved positional features and compositional features based on the chi-square test, then introduces features one by one based on information gain, and constructs a balanced decision table aimed at implementing imbalanced pattern classification. With a 2000:271,132 (true sites:false sites) training set, χ2-DT achieves the highest independent test accuracy (93.34%) when compared with three classifiers (random forest, artificial neural network, and relaxed variable kernel density estimator) and takes a short computation time (89 s). χ2-DT also exhibits good independent test accuracy (92.40%) when validated with BG-570 mutated sequences with frameshift errors (nucleotide insertions and deletions). Moreover, χ2-DT is compared with the long-window size-based methods and the short-window size-based methods, and is found to perform better than all of them in terms of predictive accuracy. 
Conclusions Based on short window size and imbalanced large samples, the proposed method not only achieves higher predictive accuracy than some existing methods, but also has high computational speed and good robustness against nucleotide insertions and deletions. Reviewers This article was reviewed by Ryan McGinty, Ph.D. and Dirk Walther. Electronic supplementary material The online version of this article (10.1186/s13062-019-0236-y) contains supplementary material, which is available to authorized users.
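The abstract says positional features are derived from a chi-square test over an 11 bp window. The sketch below illustrates only that general idea: it scores each window position by a Pearson chi-square statistic on a 2 x 4 table (class by nucleotide) and ranks positions by score. It is not the authors' χ²-DT implementation; the function names `chi_square_at_position` and `rank_positions`, and the assumption that all sequences are equal-length strings over A, C, G, T, are illustrative choices.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def chi_square_at_position(true_seqs, false_seqs, pos):
    """Pearson chi-square statistic for a 2 x 4 table (class x nucleotide) at one position.

    Assumption: every sequence is a string of the same length over A/C/G/T.
    """
    observed = []
    for seqs in (true_seqs, false_seqs):
        counts = Counter(s[pos] for s in seqs)
        observed.append([counts[n] for n in NUCLEOTIDES])

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip nucleotides absent from both classes
                stat += (obs - expected) ** 2 / expected
    return stat


def rank_positions(true_seqs, false_seqs):
    """Return window positions ordered from most to least class-discriminative."""
    width = len(true_seqs[0])
    scores = [chi_square_at_position(true_seqs, false_seqs, p) for p in range(width)]
    return sorted(range(width), key=lambda p: scores[p], reverse=True)
```

A position where true and false sites share the same nucleotide distribution scores zero, so it contributes nothing to the ranking. How the published method turns such scores into a decision table is not described in the abstract and is not modeled here.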
Affiliation(s)
- Ying Zeng
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, 410128, Hunan, China
  - Orient Science & Technology College, Hunan Agricultural University, Changsha, 410128, Hunan, China
- Hongjie Yuan
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, 410128, Hunan, China
- Zheming Yuan
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, 410128, Hunan, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, Hunan, China
- Yuan Chen
  - Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha, 410128, Hunan, China
|
22
|
Meher PK, Sahu TK, Gahoi S, Tomar R, Rao AR. funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genet 2019; 20:2. [PMID: 30616524 PMCID: PMC6323839 DOI: 10.1186/s12863-018-0710-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/13/2018] [Accepted: 12/26/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. Because many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, DNA barcoding can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the large volume of barcode sequences available in the public domain, it remains challenging because of the high degree of variability among ITS regions within species.
RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was obtained when 4 sequences per species in the reference set were used, and accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used to train the model. The proposed model achieved comparable accuracy when evaluated against existing methods through cross-validation. It also outperformed several existing models used for identification of species other than fungi.
CONCLUSIONS: An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ) is available for prediction using high-throughput sequence data. This work is expected to support future efforts in fungal taxonomy assignment based on DNA barcodes.
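The abstract states that sequences were mapped onto numeric features based on gapped base pair compositions. A minimal sketch of one common reading of that idea follows: for each gap size g, count how often nucleotide x is followed by nucleotide y exactly g positions later, then normalise to frequencies. The function name `gapped_pair_composition`, the `max_gap` parameter, and the feature key format are assumptions made for illustration; the funbarRF R package may define its features differently.

```python
from itertools import product


def gapped_pair_composition(seq, max_gap=2):
    """Frequencies of nucleotide pairs separated by 0..max_gap intervening bases.

    Assumption: pairs containing characters other than A/C/G/T (e.g. N) are skipped.
    Returns a dict with 16 features per gap size.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product("ACGT", repeat=2)]
    features = {}
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        n_valid = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                n_valid += 1
        for pair in pairs:
            key = f"{pair[0]}-gap{gap}-{pair[1]}"
            features[key] = counts[pair] / n_valid if n_valid else 0.0
    return features
```

Each sequence yields a fixed-length vector regardless of its length, which is what lets variable-length ITS barcodes be passed to a classifier such as Random Forest.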
Affiliation(s)
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Tanmaya Kumar Sahu
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Shachi Gahoi
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Ruchi Tomar
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
  - Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat, Uttar Pradesh 250611 India
- Atmakuri Ramakrishna Rao
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
|
23
|
Baert A, Machackova E, Coene I, Cremin C, Turner K, Portigal-Todd C, Asrat MJ, Nuk J, Mindlin A, Young S, MacMillan A, Van Maerken T, Trbusek M, McKinnon W, Wood ME, Foulkes WD, Santamariña M, de la Hoya M, Foretova L, Poppe B, Vral A, Rosseel T, De Leeneer K, Vega A, Claes KBM. Thorough in silico and in vitro cDNA analysis of 21 putative BRCA1 and BRCA2 splice variants and a complex tandem duplication in BRCA2 allowing the identification of activated cryptic splice donor sites in BRCA2 exon 11. Hum Mutat 2018; 39:515-526. [PMID: 29280214 DOI: 10.1002/humu.23390] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/19/2017] [Revised: 11/03/2017] [Accepted: 12/17/2017] [Indexed: 12/31/2022]
Abstract
For 21 putative BRCA1 and BRCA2 splice site variants, the concordance between mRNA analysis and predictions by in silico programs was evaluated. Aberrant splicing was confirmed for 12 alterations. In silico prediction tools were helpful in determining for which variants cDNA analysis is warranted; however, predictions for variants in the Cartegni consensus region but outside the canonical sites were less reliable. Learning algorithms such as AdaBoost and Random Forest outperformed the classical tools. Further validation is warranted before these novel tools are implemented in clinical settings. Additionally, we report here for the first time activated cryptic donor sites in the large exon 11 of BRCA2, identified by evaluating the effect at the cDNA level of a novel tandem duplication (5' breakpoint in intron 4; 3' breakpoint in exon 11) and of a variant disrupting the splice donor site of exon 11 (c.6841+1G>C). Additional sites were predicted but not activated. These sites warrant further research to increase our knowledge of the cis- and trans-acting factors involved in the conservation of correct transcription of this large exon. This may contribute to the adequate design of ASOs (antisense oligonucleotides), an emerging therapy to render cancer cells sensitive to PARP inhibitor and platinum therapies.
Affiliation(s)
- Annelot Baert
  - Department of Basic Medical Sciences, Ghent University, Ghent, Belgium
  - Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Eva Machackova
  - Department of Cancer Epidemiology and Genetics, Masaryk Memorial Cancer Institute, Brno, Czech Republic
- Ilse Coene
  - Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Carol Cremin
  - BC Cancer Agency, Vancouver, British Columbia, Canada
- Jennifer Nuk
  - BC Cancer Agency, Vancouver, British Columbia, Canada
- Sean Young
  - BC Cancer Agency, Vancouver, British Columbia, Canada
  - Cancer Genetics and Genomics Laboratory, Department of Pathology and Laboratory Medicine, BC Cancer Agency, Vancouver, British Columbia, Canada
- Andree MacMillan
  - Provincial Medical Genetics Program, Eastern Health, St. John's, Newfoundland and Labrador, Canada
- Tom Van Maerken
  - Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Martin Trbusek
  - Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
- Wendy McKinnon
  - Familial Cancer Program, University of Vermont Medical Center, Burlington, Vermont, United States
- Marie E Wood
  - Familial Cancer Program, University of Vermont Medical Center, Burlington, Vermont, United States
- William D Foulkes
  - Cancer Research Program, Research Institute of the McGill University Health Centre, McGill University, Montreal, Quebec, Canada
- Marta Santamariña
  - Fundación Pública Galega de Medicina Xenómica-SERGAS, Grupo de Medicina Xenómica, CIBERER, IDIS, Santiago de Compostela, Spain
- Miguel de la Hoya
  - Molecular Oncology Laboratory CIBERONC, Hospital Clinico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
- Lenka Foretova
  - Department of Cancer Epidemiology and Genetics, Masaryk Memorial Cancer Institute, Brno, Czech Republic
- Bruce Poppe
  - Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Anne Vral
  - Department of Basic Medical Sciences, Ghent University, Ghent, Belgium
- Toon Rosseel
  - Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Kim De Leeneer
  - Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Ana Vega
  - Fundación Pública Galega de Medicina Xenómica-SERGAS, Grupo de Medicina Xenómica, CIBERER, IDIS, Santiago de Compostela, Spain
|
24
|
Yu CY, Li XX, Yang H, Li YH, Xue WW, Chen YZ, Tao L, Zhu F. Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate. Int J Mol Sci 2018; 19:E183. [PMID: 29316706 PMCID: PMC5796132 DOI: 10.3390/ijms19010183] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 10/22/2017] [Revised: 12/09/2017] [Accepted: 01/04/2018] [Indexed: 12/27/2022] Open
Abstract
The function of a protein is of great interest in cutting-edge research on biological mechanisms, disease development and drug/target discovery. Besides experimental exploration, a variety of computational methods have been designed to predict protein function. Among these in silico methods, BLAST predicts function from protein sequence similarity, whereas machine learning also works from the sequence but does not rely on sequence similarity. This characteristic makes machine learning a good complement to BLAST and many other approaches when predicting the function of remotely relevant proteins and of homologous proteins with distinct functions. However, the identification accuracies of these in silico methods and their false discovery rates have not yet been assessed, which greatly limits the use of these algorithms. Herein, a comprehensive comparison of the performance of four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed by four standard statistical indexes based on independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, substantially higher sensitivity was observed for SVM and BLAST than for PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
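The abstract compares algorithms by sensitivity and false discovery rate but does not list the formulas. As a reminder of the textbook definitions only, the sketch below computes both from confusion-matrix counts. The function names are illustrative, and the paper's own four indexes and its genome-scan procedure for estimating false discoveries may be defined differently.

```python
def sensitivity(tp, fn):
    """Fraction of true family members that were recovered: TP / (TP + FN)."""
    denom = tp + fn
    return tp / denom if denom else 0.0


def false_discovery_rate(tp, fp):
    """Fraction of positive predictions that were wrong: FP / (TP + FP)."""
    denom = tp + fp
    return fp / denom if denom else 0.0
```

The two measures move independently: a method can recover most true members (high sensitivity) while still flagging many non-members (high false discovery rate), which is why the abstract reports them separately.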
Affiliation(s)
- Chun Yan Yu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiao Xu Li
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hong Yang
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ying Hong Li
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Wei Wei Xue
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China
- Yu Zong Chen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543, Singapore
- Lin Tao
  - School of Medicine, Hangzhou Normal University, Hangzhou 310012, China
- Feng Zhu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|