1
|
Hasan MM, Murtaz SB, Islam MU, Sadeq MJ, Uddin J. Robust and efficient COVID-19 detection techniques: A machine learning approach. PLoS One 2022; 17:e0274538. [PMID: 36107971 PMCID: PMC9477266 DOI: 10.1371/journal.pone.0274538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Accepted: 08/30/2022] [Indexed: 12/02/2022] Open
Abstract
The devastating impact of the Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) pandemic almost halted the global economy and is responsible for 6 million deaths with infection rates of over 524 million. With significant reservations, initially, the SARS-CoV-2 virus was suspected to be infected by and closely related to Bats. However, over the periods of learning and critical development of experimental evidence, it is found to have some similarities with several gene clusters and virus proteins identified in animal-human transmission. Despite this substantial evidence and learnings, there is limited exploration regarding the SARS-CoV-2 genome to putative microRNAs (miRNAs) in the virus life cycle. In this context, this paper presents a detection method of SARS-CoV-2 precursor-miRNAs (pre-miRNAs) that helps to identify a quick detection of specific ribonucleic acid (RNAs). The approach employs an artificial neural network and proposes a model that estimated accuracy of 98.24%. The sampling technique includes a random selection of highly unbalanced datasets for reducing class imbalance following the application of matriculation artificial neural network that includes accuracy curve, loss curve, and confusion matrix. The classical approach to machine learning is then compared with the model and its performance. The proposed approach would be beneficial in identifying the target regions of RNA and better recognising of SARS-CoV-2 genome sequence to design oligonucleotide-based drugs against the genetic structure of the virus.
Collapse
Affiliation(s)
- Md. Mahadi Hasan
- Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
| | - Saba Binte Murtaz
- Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
| | - Muhammad Usama Islam
- School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
| | - Muhammad Jafar Sadeq
- Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
| | - Jasim Uddin
- Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
- * E-mail:
| |
Collapse
|
2
|
Raad J, Bugnon LA, Milone DH, Stegmayer G. miRe2e: a full end-to-end deep model based on transformers for prediction of pre-miRNAs. Bioinformatics 2022; 38:1191-1197. [PMID: 34875006 DOI: 10.1093/bioinformatics/btab823] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Revised: 10/29/2021] [Accepted: 12/01/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION MicroRNAs (miRNAs) are small RNA sequences with key roles in the regulation of gene expression at post-transcriptional level in different species. Accurate prediction of novel miRNAs is needed due to their importance in many biological processes and their associations with complicated diseases in humans. Many machine learning approaches were proposed in the last decade for this purpose, but requiring handcrafted features extraction to identify possible de novo miRNAs. More recently, the emergence of deep learning (DL) has allowed the automatic feature extraction, learning relevant representations by themselves. However, the state-of-art deep models require complex pre-processing of the input sequences and prediction of their secondary structure to reach an acceptable performance. RESULTS In this work, we present miRe2e, the first full end-to-end DL model for pre-miRNA prediction. This model is based on Transformers, a neural architecture that uses attention mechanisms to infer global dependencies between inputs and outputs. It is capable of receiving the raw genome-wide data as input, without any pre-processing nor feature engineering. After a training stage with known pre-miRNAs, hairpin and non-harpin sequences, it can identify all the pre-miRNA sequences within a genome. The model has been validated through several experimental setups using the human genome, and it was compared with state-of-the-art algorithms obtaining 10 times better performance. AVAILABILITY AND IMPLEMENTATION Webdemo available at https://sinc.unl.edu.ar/web-demo/miRe2e/ and source code available for download at https://github.com/sinc-lab/miRe2e. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jonathan Raad
- Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| | - Leandro A Bugnon
- Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| | - Diego H Milone
- Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| | - Georgina Stegmayer
- Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| |
Collapse
|
3
|
Bugnon LA, Raad J, Merino GA, Yones C, Ariel F, Milone DH, Stegmayer G. Deep Learning for the discovery of new pre-miRNAs: Helping the fight against COVID-19. MACHINE LEARNING WITH APPLICATIONS 2021; 6:100150. [PMID: 34939043 PMCID: PMC8427907 DOI: 10.1016/j.mlwa.2021.100150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 08/18/2021] [Accepted: 08/30/2021] [Indexed: 01/29/2023] Open
Abstract
The Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) has been recently found responsible for the pandemic outbreak of a novel coronavirus disease (COVID-19). In this work, a novel approach based on deep learning is proposed for identifying precursors of small active RNA molecules named microRNA (miRNA) in the genome of the novel coronavirus. Viral miRNA-like molecules have shown to modulate the host transcriptome during the infection progression, thus their identification is crucial for helping the diagnosis or medical treatment of the disease. The existence of the mature miRNAs derived from computationally predicted miRNA precursors (pre-miRNAs) in the novel coronavirus was validated with small RNA-seq data from SARS-CoV-2-infected human cells. The results demonstrate that computational models can provide accurate and useful predictions of pre-miRNAs in the SARS-CoV-2 genome, underscoring the relevance of machine learning in the response to a global sanitary emergency. Moreover, the interpretability of our model shed light on the molecular mechanisms underlying the viral infection, thus contributing to the fight against the COVID-19 pandemic and the fast development of new treatments. Our study shows how recent advances in machine learning can be used, effectively, in response to public health emergencies. The approach developed in this work could be of great help in future similar emergencies to accelerate the understanding of the singularities of any viral agent and for the development of novel therapies. Data and source code available at: https://sourceforge.net/projects/sourcesinc/files/aicovid/.
Collapse
Affiliation(s)
- L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
| | - J Raad
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
| | - G A Merino
- Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Ruta 11 km 10.5, Oro Verde, Argentina
| | - C Yones
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
| | - F Ariel
- Instituto de Agrobiotecnologia del Litoral (IAL), CONICET, FBCB, Universidad Nacional del Litoral, Colectora Ruta Nacional 168 km 0, Santa Fe, Argentina
| | - D H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
| | - G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
| |
Collapse
|
4
|
Yones C, Raad J, Bugnon LA, Milone DH, Stegmayer G. High precision in microRNA prediction: A novel genome-wide approach with convolutional deep residual networks. Comput Biol Med 2021; 134:104448. [PMID: 33979731 DOI: 10.1016/j.compbiomed.2021.104448] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/30/2022]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have a key role in the regulation of gene expression. The importance of miRNAs is widely acknowledged by the community nowadays and computational methods are needed for the precise prediction of novel candidates to miRNA. This task can be done by searching homologous with sequence alignment tools, but results are restricted to sequences that are very similar to the known miRNA precursors (pre-miRNAs). Besides, a very important property of pre-miRNAs, their secondary structure, is not taken into account by these methods. To fill this gap, many machine learning approaches were proposed in the last years. However, the methods are generally tested in very controlled conditions. If these methods were used under real conditions, the false positives increase and the precisions fall quite below those published. This work provides a novel approach for dealing with the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). This model was tested with several genomes of animals and plants, the full-genomes, achieving a precision up to 5 times larger than other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when using mirDNN in novel species. To provide fast an easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores and generates the nucleotide importance plots. FULL SOURCE CODE: http://sourceforge.net/projects/sourcesinc/files/mirdnn and https://github.com/cyones/mirDNN. CONTACT: gstegmayer@sinc.unl.edu.ar.
Collapse
Affiliation(s)
- C Yones
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - J Raad
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - D H Milone
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina.
| |
Collapse
|
5
|
Bugnon LA, Yones C, Milone DH, Stegmayer G. Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning. Brief Bioinform 2020; 22:5894456. [PMID: 34020552 DOI: 10.1093/bib/bbaa184] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2020] [Revised: 07/13/2020] [Accepted: 07/18/2020] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data. RESULTS In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNAs prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives. AVAILABILITY The source code to reproduce these results is in: http://sourceforge.net/projects/sourcesinc/files/gwmirna Additionally, the datasets are freely available in: https://sourceforge.net/projects/sourcesinc/files/mirdata.
Collapse
Affiliation(s)
- Leandro A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| | - Cristian Yones
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| | - Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| | - Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| |
Collapse
|