Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Meher PK, Sahu TK, Rao AR, Wahi SD. A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics 2014;15:362. [PMID: 25420551 PMCID: PMC4702320 DOI: 10.1186/s12859-014-0362-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/01/2014] [Accepted: 10/24/2014] [Indexed: 11/17/2022] Open

Cited by Other Article(s)

Brooks-Warburton J, Modos D, Sudhakar P, Madgwick M, Thomas JP, Bohar B, Fazekas D, Zoufir A, Kapuy O, Szalay-Beko M, Verstockt B, Hall LJ, Watson A, Tremelling M, Parkes M, Vermeire S, Bender A, Carding SR, Korcsmaros T. A systems genomics approach to uncover patient-specific pathogenic pathways and proteins in ulcerative colitis. Nat Commun 2022;13:2299. [PMID: 35484353 PMCID: PMC9051123 DOI: 10.1038/s41467-022-29998-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/20/2019] [Accepted: 04/06/2022] [Indexed: 12/11/2022] Open

Affiliation(s)

Johanne Brooks-Warburton Earlham Institute, Norwich Research Park, Norwich, UK; Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; Department of Clinical, Pharmaceutical and Biological Sciences, University of Hertfordshire, Hertford, UK; Gastroenterology Department, Lister Hospital, Stevenage, UK
Dezso Modos Earlham Institute, Norwich Research Park, Norwich, UK; Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
Padhmanand Sudhakar Earlham Institute, Norwich Research Park, Norwich, UK; Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
Matthew Madgwick Earlham Institute, Norwich Research Park, Norwich, UK; Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
John P Thomas Earlham Institute, Norwich Research Park, Norwich, UK; Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
Balazs Bohar Earlham Institute, Norwich Research Park, Norwich, UK; Department of Genetics, Eötvös Loránd University, Budapest, Hungary
David Fazekas Earlham Institute, Norwich Research Park, Norwich, UK; Department of Genetics, Eötvös Loránd University, Budapest, Hungary
Azedine Zoufir Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
Orsolya Kapuy Department of Molecular Biology, Semmelweis University, Budapest, Hungary
Mate Szalay-Beko Earlham Institute, Norwich Research Park, Norwich, UK
Bram Verstockt KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium; University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
Lindsay J Hall Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; Norwich Medical School, University of East Anglia, Norwich, UK; School of Life Sciences, ZIEL - Institute for Food & Health, Technical University of Munich, 80333, Freising, Germany
Alastair Watson Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK; Norwich Medical School, University of East Anglia, Norwich, UK
Mark Tremelling Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
Miles Parkes Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
Severine Vermeire KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium; University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
Andreas Bender Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
Simon R Carding Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; Norwich Medical School, University of East Anglia, Norwich, UK
Tamas Korcsmaros Earlham Institute, Norwich Research Park, Norwich, UK; Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK

Amilpur S, Bhukya R. EDeepSSP: Explainable deep neural networks for exact splice sites prediction. J Bioinform Comput Biol 2020;18:2050024. [PMID: 32696716 DOI: 10.1142/s0219720020500249] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]

Albaradei S, Magana-Mora A, Thafar M, Uludag M, Bajic VB, Gojobori T, Essack M, Jankovic BR. Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA. Gene 2020;763S:100035. [PMID: 32550561 PMCID: PMC7285987 DOI: 10.1016/j.gene.2020.100035] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/30/2020] [Accepted: 05/06/2020] [Indexed: 12/21/2022]

Abstract

Background

The accurate identification of exon/intron boundaries is critical for the correct annotation of genes with multiple exons. Donor and acceptor splice sites (SS) demarcate these boundaries. Therefore, accurate computational models for predicting SS are useful for functional annotation of genes and genomes, and for finding alternative SS associated with different diseases. Although various models have been proposed for the in silico prediction of SS, their accuracy must improve for annotation to be reliable. Moreover, models are often derived and tested using the same genome, providing no evidence of broad applicability, i.e., to other poorly studied genomes.

Results

With this in mind, we developed the Splice2Deep models for SS detection. Each model is an ensemble of deep convolutional neural networks. We evaluated the performance of the models based on the ability to detect SS in Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana, Drosophila melanogaster, and Caenorhabditis elegans. Results demonstrate that the models efficiently detect SS in other organisms not considered during the training of the models. Compared to the state-of-the-art tools, Splice2Deep models achieved significantly reduced average error rates of 41.97% and 28.51% for acceptor and donor SS, respectively. Moreover, the Splice2Deep cross-organism validation demonstrates that models correctly identify conserved genomic elements enabling annotation of SS in new genomes by choosing the taxonomically closest model.

Conclusions

The results of our study demonstrated that Splice2Deep both achieved a considerably reduced error rate compared to other state-of-the-art models and accurately recognized SS in organisms for which the model was not trained, enabling annotation of poorly studied or newly sequenced genomes. Splice2Deep models are implemented in Python using the Keras API; the models and the data are available at https://github.com/SomayahAlbaradei/Splice_Deep.git.

Affiliation(s)

Somayah Albaradei Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
Arturo Magana-Mora Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Saudi Aramco, EXPEC-ARC, Drilling Technology Team, Dhahran 31311, Saudi Arabia
Maha Thafar Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Faculty of Computers and Information Systems, Taif University, Saudi Arabia
Mahmut Uludag Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Vladimir B Bajic Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Takashi Gojobori Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Magbubah Essack Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Boris R Jankovic Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia

Splice sites detection using chaos game representation and neural network. Genomics 2019;112:1847-1852. [PMID: 31704313 DOI: 10.1016/j.ygeno.2019.10.018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Revised: 03/18/2019] [Accepted: 10/29/2019] [Indexed: 11/23/2022]

Meher PK, Sahu TK, Gahoi S, Satpathy S, Rao AR. Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition. Gene 2019;705:113-126. [PMID: 31009682 DOI: 10.1016/j.gene.2019.04.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Revised: 03/27/2019] [Accepted: 04/17/2019] [Indexed: 02/02/2023]

Abstract

Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identification of splice sites. However, nucleotide strings must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 sequence encoding schemes, namely Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of 1st and 2nd order Markov models (MM1 + MM2) and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, A. thaliana, C. elegans, D. melanogaster and H. sapiens, and the performances were then validated with another 4 species, Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver operating characteristic) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM achieved the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination performed better than the other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were higher for species with low intron density. To our limited knowledge, this is the first comprehensive evaluation of sequence encoding schemes for prediction of splice sites. We have also developed an R package, EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html), for encoding splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is expected to be useful to computational biologists for predicting different functional elements on genomic DNA.
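
To make the idea of sequence encoding concrete, the sketch below illustrates two of the simpler schemes named in the abstract: sparse (one-hot) encoding and a position-specific first-order Markov model. It is not the EncDNA implementation. The function names, the pseudocount, and the toy sequences are assumptions made for illustration, and input sequences are assumed to be equal-length, uppercase, and restricted to A, C, G, T.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def sparse_encode(seq):
    """Sparse (one-hot) encoding: each nucleotide becomes four 0/1 values."""
    return [1 if base == n else 0 for base in seq for n in NUCLEOTIDES]


def fit_mm1(seqs, pseudocount=1.0):
    """Estimate position-specific P(next base | current base) from aligned sequences.

    Returns one dict per adjacent position pair, keyed by (current, next).
    The pseudocount is a hypothetical smoothing choice that avoids zero probabilities.
    """
    length = len(seqs[0])
    counts = [Counter() for _ in range(length - 1)]
    for s in seqs:
        for i in range(length - 1):
            counts[i][(s[i], s[i + 1])] += 1

    model = []
    for c in counts:
        probs = {}
        for a in NUCLEOTIDES:
            total = sum(c[(a, b)] for b in NUCLEOTIDES) + 4 * pseudocount
            for b in NUCLEOTIDES:
                probs[(a, b)] = (c[(a, b)] + pseudocount) / total
        model.append(probs)
    return model


def mm1_features(seq, model):
    """Encode a sequence as its per-position transition probabilities."""
    return [model[i][(seq[i], seq[i + 1])] for i in range(len(seq) - 1)]


# Toy example: three aligned "true site" motifs (made up for illustration).
true_sites = ["AGGT", "AGGT", "CGGT"]
model = fit_mm1(true_sites)
features = mm1_features("AGGT", model)
```

Either numeric vector could then be passed to a classifier; the sparse vector grows with the alphabet size, while the Markov features stay at one value per adjacent position pair.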

Zeng Y, Yuan H, Yuan Z, Chen Y. A high-performance approach for predicting donor splice sites based on short window size and imbalanced large samples. Biol Direct 2019;14:6. [PMID: 30975175 PMCID: PMC6460831 DOI: 10.1186/s13062-019-0236-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/15/2018] [Accepted: 03/18/2019] [Indexed: 11/10/2022] Open

Abstract

Background

Splice site prediction has been a long-standing problem in bioinformatics. Although many computational approaches developed for splice site prediction have achieved satisfactory accuracy, further improvement in predictive accuracy is important because it contributes to more accurate prediction of gene structure. Determining a proper window size before prediction is necessary. An overly long window may introduce irrelevant features, which would reduce predictive accuracy, while a short window with maximum information may perform better in terms of predictive accuracy and time cost. Furthermore, because the number of false splice sites following the GT–AG rule far exceeds that of true splice sites, accurate and rapid prediction of splice sites using imbalanced large samples has always been a challenge. Therefore, based on a short window size and imbalanced large samples, we developed a new computational method named chi-square decision table (χ²-DT) for donor splice site prediction.
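
The class imbalance mentioned in this background follows from how candidates are generated: every GT dinucleotide in a sequence is a potential donor site, and most of them are false. A minimal Python sketch of candidate extraction is given below. The window layout (3 bases upstream of the GT, 11 nucleotides in total), the function name, and the toy sequence are assumptions for illustration; the layout actually used by the authors is not stated in this abstract.

```python
def candidate_donor_sites(seq, upstream=3, downstream=8):
    """Return (position, window) for every GT with enough flanking sequence.

    The window spans `upstream` bases before the G and `downstream` bases
    starting at the G, so its length is upstream + downstream (11 by default).
    """
    seq = seq.upper()
    sites = []
    for i in range(upstream, len(seq) - downstream + 1):
        if seq[i:i + 2] == "GT":
            sites.append((i, seq[i - upstream:i + downstream]))
    return sites


# Toy sequence (made up): it contains three GT dinucleotides, but the last
# one lacks enough downstream sequence to fill a window.
sites = candidate_donor_sites("CAGGTAAGTACCGTTAG")
```

On a real genome, only a small fraction of such windows correspond to annotated donor sites, which is why the training sets described for this kind of method are so heavily skewed toward false sites.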

Results

Using a short window size of 11 bp, χ²-DT extracts improved positional and compositional features based on the chi-square test, then introduces features one by one based on information gain, and constructs a balanced decision table for imbalanced pattern classification. With a 2000:271,132 (true sites:false sites) training set, χ²-DT achieves the highest independent test accuracy (93.34%) when compared with three classifiers (random forest, artificial neural network, and relaxed variable kernel density estimator) and takes a short computation time (89 s). χ²-DT also exhibits good independent test accuracy (92.40%) when validated with BG-570 mutated sequences with frameshift errors (nucleotide insertions and deletions). Moreover, χ²-DT is compared with long-window and short-window methods, and is found to perform better than all of them in terms of predictive accuracy.
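
The chi-square step behind the positional features can be illustrated with a small Python sketch. It does not reproduce χ²-DT: it only computes a Pearson chi-square statistic per window position from a 2 × 4 table (true/false class by nucleotide) and ranks positions by it. The information-gain selection and the decision table are omitted, and the function names and toy windows are assumptions. Windows are assumed to be equal-length and restricted to A, C, G, T.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def chi_square_at_position(true_seqs, false_seqs, pos):
    """Pearson chi-square statistic for the class-by-nucleotide table at one position."""
    observed = [
        Counter(s[pos] for s in true_seqs),
        Counter(s[pos] for s in false_seqs),
    ]
    row_totals = [len(true_seqs), len(false_seqs)]
    grand_total = sum(row_totals)

    stat = 0.0
    for n in NUCLEOTIDES:
        col_total = observed[0][n] + observed[1][n]
        if col_total == 0:
            continue  # nucleotide absent at this position, contributes nothing
        for r in (0, 1):
            expected = row_totals[r] * col_total / grand_total
            stat += (observed[r][n] - expected) ** 2 / expected
    return stat


def rank_positions(true_seqs, false_seqs):
    """Order window positions from most to least discriminative."""
    length = len(true_seqs[0])
    scores = [(chi_square_at_position(true_seqs, false_seqs, p), p) for p in range(length)]
    return [p for _, p in sorted(scores, key=lambda item: (-item[0], item[1]))]


# Toy windows (made up): position 0 separates the classes, position 1 does not.
true_windows = ["AG", "AG", "AC"]
false_windows = ["CG", "CG", "CC"]
ranking = rank_positions(true_windows, false_windows)
```

A position where true and false sites have the same nucleotide distribution scores zero, so it would be a poor candidate for a positional feature.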

Conclusions

Based on short window size and imbalanced large samples, the proposed method not only achieves higher predictive accuracy than some existing methods, but also has high computational speed and good robustness against nucleotide insertions and deletions.

Reviewers

This article was reviewed by Ryan McGinty, Ph.D. and Dirk Walther.

Electronic supplementary material

The online version of this article (10.1186/s13062-019-0236-y) contains supplementary material, which is available to authorized users.

Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep 2017;7:42362. [PMID: 28205576 PMCID: PMC5304217 DOI: 10.1038/srep42362] [Citation(s) in RCA: 303] [Impact Index Per Article: 43.3] [Received: 10/24/2016] [Accepted: 01/09/2017] [Indexed: 11/13/2022] Open

Meher PK, Sahu TK, Rao AR, Wahi SD. A computational approach for prediction of donor splice sites with improved accuracy. J Theor Biol 2016;404:285-294. [PMID: 27302911 DOI: 10.1016/j.jtbi.2016.06.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/18/2015] [Revised: 04/18/2016] [Accepted: 06/09/2016] [Indexed: 11/24/2022]

Meher PK, Sahu TK, Rao AR, Wahi SD. Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms Mol Biol 2016;11:16. [PMID: 27252772 PMCID: PMC4888255 DOI: 10.1186/s13015-016-0078-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/28/2015] [Accepted: 05/17/2016] [Indexed: 11/16/2022] Open