1. Venkatesan A, Basak J, Bahadur RP. pmiRScan: a LightGBM based method for prediction of animal pre-miRNAs. Funct Integr Genomics 2025; 25:9. [PMID: 39786653 DOI: 10.1007/s10142-025-01527-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 12/03/2024] [Accepted: 01/01/2025] [Indexed: 01/12/2025]
Abstract
MicroRNAs (miRNAs) are short endogenous non-coding RNAs that play a significant role in post-transcriptional gene regulation. Identifying new animal precursor miRNAs (pre-miRNAs) and miRNAs is crucial to understanding the role of miRNAs in various biological processes, including the development of diseases. The present study focuses on the development of a Light Gradient Boost (LGB) based method for the classification of animal pre-miRNAs using various sequence and secondary structural features. In various pre-miRNA families, distinct k-mer repeat signatures with a length of three nucleotides have been identified. Of the nine classifiers trained and tested in the present study, LGB has better overall performance, with an AUROC of 0.959. In comparison with existing methods, our method 'pmiRScan' has better overall performance, with an accuracy of 0.93, sensitivity of 0.86, specificity of 0.95 and F-score of 0.82. Moreover, pmiRScan effectively classifies pre-miRNAs from four distinct taxonomic groups: mammals, nematodes, molluscs and arthropods. We have used our classifier to predict genome-wide pre-miRNAs in human, finding a total of 313 pre-miRNA candidates. A total of 180 potential mature miRNAs belonging to 60 distinct miRNA families are extracted from the predicted pre-miRNAs, of which 128 are novel and not reported in miRBase. These discoveries may enhance our current understanding of miRNAs and their targets in human. pmiRScan is freely available at http://www.csb.iitkgp.ac.in/applications/pmiRScan/index.php.
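The abstract mentions k-mer signatures of length three as sequence features. As a rough illustration only, the sketch below computes relative 3-mer frequencies for an RNA sequence; the function name `kmer_frequencies`, the normalisation, and the example sequence are assumptions made here, not the feature definition used by pmiRScan.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA-style T is converted to U below


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every possible k-mer in a sequence.

    Hypothetical helper for illustration; not the pmiRScan feature extractor.
    """
    seq = seq.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers so the feature vector has fixed length.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Arbitrary example sequence, used only to show the call.
features = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU")
```

A fixed-length vector like this (64 values for k = 3) is the kind of input a gradient-boosting classifier can consume alongside structural features.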
Affiliation(s)
- Amrit Venkatesan
  - Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
- Jolly Basak
  - Genomics of Plant Stress Biology Lab, Department of Biotechnology, Visva-Bharati, Santiniketan, West Bengal, 731235, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
  - Bioinformatics Centre, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
2. Singh J, Khanna NN, Rout RK, Singh N, Laird JR, Singh IM, Kalra MK, Mantella LE, Johri AM, Isenovic ER, Fouda MM, Saba L, Fatemi M, Suri JS. GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides. Sci Rep 2024; 14:7154. [PMID: 38531923 PMCID: PMC11344070 DOI: 10.1038/s41598-024-56786-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
Due to the intricate relationship between the small non-coding ribonucleic acid (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods are not robust and accurate. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence in ensemble paradigms in machine learning (EML) and convolutional neural network (CNN)-based deep learning (EDL) frameworks. GeneAI 3.0 utilized five conventional (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary (Shannon entropy, Hurst exponent, Fractal dimension) features to generate a composite feature set from given miRNA sequences, which was then passed into our ML and DL classification framework. A set of 11 new classifiers was designed, consisting of 5 EML and 6 EDL, for binary/multiclass classification. They were benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, so that a total of 11 + 27 = 38 models were designed. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of the mean performance using accuracy (ACC)/area-under-the-curve (AUC) of the 24 DL classifiers was: EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that without CNN layers by 0.73%/0.92%. Mean performance of EML models was superior to SML models, with improvements of ACC/AUC by 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized models for classifying miRNA sequences.
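One of the features named above is Shannon entropy computed over purine/pyrimidine patterns. The following is a minimal sketch of that idea, assuming a simple per-symbol entropy over an R/Y recoding of the sequence; the helper names and the exact encoding are hypothetical and are not taken from GeneAI 3.0.

```python
import math
from collections import Counter

PURINES = set("AG")  # A and G are purines; C, U and T are pyrimidines


def purine_pyrimidine_pattern(seq: str) -> str:
    """Recode a nucleotide sequence as 'R' (purine) / 'Y' (pyrimidine).

    Hypothetical encoding for illustration; ambiguous bases are skipped.
    """
    seq = seq.upper()
    return "".join("R" if b in PURINES else "Y" for b in seq if b in "ACGUT")


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i)) over symbol frequencies."""
    if not symbols:
        return 0.0
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


pattern = purine_pyrimidine_pattern("AGCU")
entropy = shannon_entropy(pattern)
```

With only two symbols the entropy ranges from 0 bits (all purines or all pyrimidines) to 1 bit (an even split), so it can serve as one scalar entry in a composite feature vector.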
Affiliation(s)
- Jaskaran Singh
  - Department of Computer Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Narendra N Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
- Ranjeet K Rout
  - Department of Computer Science and Engineering, NIT Srinagar, Hazratbal, Srinagar, India
- Narpinder Singh
  - Department of Food Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, USA
- Inder M Singh
  - Advanced Cardiac and Vascular Institute, Sacramento, CA, USA
- Mannudeep K Kalra
  - Department of Radiology, Massachusetts General Hospital, Boston, MA, 02115, USA
- Laura E Mantella
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Amer M Johri
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Esma R Isenovic
  - Laboratory for Molecular Genetics and Radiobiology, University of Belgrade, Belgrade, Serbia
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, 83209, USA
- Luca Saba
  - Department of Neurology, University of Cagliari, Cagliari, Italy
- Mostafa Fatemi
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, 55905, USA
- Jasjit S Suri
  - Stroke Monitoring and Diagnostic Division, AtheroPoint LLC, Roseville, CA, 95661, USA
3. Zulian V, Fiscon G, Paci P, Garbuglia AR. Hepatitis B Virus and microRNAs: A Bioinformatics Approach. Int J Mol Sci 2023; 24:17224. [PMID: 38139051 PMCID: PMC10743825 DOI: 10.3390/ijms242417224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 11/20/2023] [Accepted: 12/05/2023] [Indexed: 12/24/2023] Open
Abstract
In recent decades, microRNAs (miRNAs) have emerged as key regulators of gene expression, and the identification of viral miRNAs (v-miRNAs) within some viruses, including hepatitis B virus (HBV), has attracted significant attention. HBV infections often progress to chronic states (CHB) and may induce fibrosis/cirrhosis and hepatocellular carcinoma (HCC). The presence of HBV can dysregulate host miRNA expression, influencing several biological pathways, such as apoptosis, innate and immune response, viral replication, and pathogenesis. Consequently, miRNAs are considered promising biomarkers for diagnosis, prognosis, and treatment response. The dynamics of miRNAs during HBV infection are multifaceted, influenced by host variability and miRNA interactions. Given the ability of miRNAs to target multiple messenger RNAs (mRNAs), understanding the viral-host (human) interplay is complex but essential to develop novel clinical applications. Therefore, bioinformatics can help to analyze, identify, and interpret a vast amount of miRNA data. This review explores the bioinformatics tools available for viral and host miRNA research. Moreover, we give a brief overview of the role of miRNAs during HBV infection. In this way, this review aims to support the selection of the most appropriate bioinformatics tools based on requirements and research goals.
Affiliation(s)
- Verdiana Zulian
  - Virology Laboratory, National Institute for Infectious Diseases “Lazzaro Spallanzani” IRCCS, 00149 Rome, Italy
- Giulia Fiscon
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, 00185 Rome, Italy
  - Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, 00185 Rome, Italy
- Paola Paci
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, 00185 Rome, Italy
  - Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, 00185 Rome, Italy
- Anna Rosa Garbuglia
  - Virology Laboratory, National Institute for Infectious Diseases “Lazzaro Spallanzani” IRCCS, 00149 Rome, Italy
4. Dobrzycka M, Sulewska A, Biecek P, Charkiewicz R, Karabowicz P, Charkiewicz A, Golaszewska K, Milewska P, Michalska-Falkowska A, Nowak K, Niklinski J, Konopińska J. miRNA Studies in Glaucoma: A Comprehensive Review of Current Knowledge and Future Perspectives. Int J Mol Sci 2023; 24:14699. [PMID: 37834147 PMCID: PMC10572595 DOI: 10.3390/ijms241914699] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/05/2023] [Revised: 09/25/2023] [Accepted: 09/27/2023] [Indexed: 10/15/2023] Open
Abstract
Glaucoma, a neurodegenerative disorder that leads to irreversible blindness, remains a challenge because of its complex nature. MicroRNAs (miRNAs) are crucial regulators of gene expression and are associated with glaucoma and other diseases. We aimed to review and discuss the advantages and disadvantages of miRNA-focused molecular studies in glaucoma through discussing their potential as biomarkers for early detection and diagnosis; offering insights into molecular pathways and mechanisms; and discussing their potential utility with respect to personalized medicine, their therapeutic potential, and non-invasive monitoring. Limitations, such as variability, small sample sizes, sample specificity, and limited accessibility to ocular tissues, are also addressed, underscoring the need for robust protocols and collaboration. Reproducibility and validation are crucial to establish the credibility of miRNA research findings, and the integration of bioinformatics tools for miRNA database creation is a valuable component of a comprehensive approach to investigate miRNA aberrations in patients with glaucoma. Overall, miRNA research in glaucoma has provided significant insights into the molecular mechanisms of the disease, offering potential biomarkers, diagnostic tools, and therapeutic targets. However, addressing challenges such as variability and limited tissue accessibility is essential, and further investigations and validation will contribute to a deeper understanding of the functional significance of miRNAs in glaucoma.
Affiliation(s)
- Margarita Dobrzycka
  - Department of Ophthalmology, Medical University of Bialystok, 15-276 Bialystok, Poland
- Anetta Sulewska
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Przemyslaw Biecek
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
- Radoslaw Charkiewicz
  - Center of Experimental Medicine, Medical University of Bialystok, 15-369 Bialystok, Poland
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
- Piotr Karabowicz
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
- Angelika Charkiewicz
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Kinga Golaszewska
  - Department of Ophthalmology, Medical University of Bialystok, 15-276 Bialystok, Poland
- Patrycja Milewska
  - Biobank, Medical University of Bialystok, 15-269 Bialystok, Poland
- Karolina Nowak
  - Department of Obstetrics and Gynecology, C.S. Mott Center for Human Growth and Development, School of Medicine, Wayne State University, Detroit, MI 48201, USA
- Jacek Niklinski
  - Department of Clinical Molecular Biology, Medical University of Bialystok, 15-269 Bialystok, Poland
- Joanna Konopińska
  - Department of Ophthalmology, Medical University of Bialystok, 15-276 Bialystok, Poland
5. Umu SU, Paynter VM, Trondsen H, Buschmann T, Rounge TB, Peterson KJ, Fromm B. Accurate microRNA annotation of animal genomes using trained covariance models of curated microRNA complements in MirMachine. CELL GENOMICS 2023; 3:100348. [PMID: 37601971 PMCID: PMC10435380 DOI: 10.1016/j.xgen.2023.100348] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/06/2022] [Revised: 03/15/2023] [Accepted: 05/26/2023] [Indexed: 08/22/2023]
Abstract
The annotation of microRNAs depends on the availability of transcriptomics data and expert knowledge. This has led to a gap between the availability of novel genomes and high-quality microRNA complements. Using >16,000 microRNAs from the manually curated microRNA gene database MirGeneDB, we generated trained covariance models for all conserved microRNA families. These models are available in our tool MirMachine, which annotates conserved microRNAs within genomes. We successfully applied MirMachine to a range of animal species, including those with large genomes and genome duplications and extinct species, where small RNA sequencing is hard to achieve. We further describe a microRNA score of expected microRNAs that can be used to assess the completeness of genome assemblies. MirMachine closes a long-persisting gap in the microRNA field by facilitating automated genome annotation pipelines and deeper studies into the evolution of genome regulation, even in extinct organisms.
Affiliation(s)
- Sinan Uğur Umu
  - Department of Pathology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Vanessa M. Paynter
  - The Arctic University Museum of Norway, UiT - The Arctic University of Norway, Tromsø, Norway
- Håvard Trondsen
  - Department of Pathology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Trine B. Rounge
  - Department of Research, Cancer Registry of Norway, Oslo, Norway
  - Centre for Bioinformatics, Department of Pharmacy, University of Oslo, Oslo, Norway
- Kevin J. Peterson
  - Department of Biological Sciences, Dartmouth College, Hanover, NH, USA
- Bastian Fromm
  - The Arctic University Museum of Norway, UiT - The Arctic University of Norway, Tromsø, Norway
6. Xu P, Chang J, Ma G, Liao F, Xu T, Wu Y, Yin Z. MiR-145 inhibits the differentiation and proliferation of bone marrow stromal mesenchymal stem cells by GABARAPL1 in steroid-induced femoral head necrosis. BMC Musculoskelet Disord 2022; 23:1020. [PMID: 36435763 PMCID: PMC9701430 DOI: 10.1186/s12891-022-05928-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/28/2022] Open
Abstract
Steroid-induced osteonecrosis of the femoral head (SANFH) involves impaired differentiation of bone marrow mesenchymal stem cells (BMSCs), a process regulated by multiple microRNAs. Studies have shown that miR-145 is a key regulatory molecule in BMSCs, but its mechanism in steroid-induced femoral head necrosis remains unclear. The present study explored the specific mechanism by which miR-145 is involved in SANFH. Dexamethasone, a typical glucocorticoid, was used to induce osteogenic differentiation of BMSCs. Western blot, qPCR, CCK8 and flow cytometry were used to investigate the effects of miR-145 on the proliferation and differentiation of BMSCs. The relationship between miR-145 and GABA Type A Receptor Associated Protein Like 1 (GABARAPL1) was identified using dual-luciferase reporter assays, and the effects of the two molecules on BMSCs were investigated in vitro. The results showed that miR-145 was up-regulated in SANFH patients, while GABARAPL1 was down-regulated. Inhibition of miR-145 can improve apoptosis and promote proliferation and activation of BMSCs. GABARAPL1 is a downstream target gene of miR-145 and is negatively regulated by miR-145. In conclusion, miR-145 regulates the proliferation and differentiation of glucocorticoid-induced BMSCs through GABARAPL1, and pharmacological inhibition of miR-145 may provide a new approach for the treatment of SANFH.
7. Hasan MM, Murtaz SB, Islam MU, Sadeq MJ, Uddin J. Robust and efficient COVID-19 detection techniques: A machine learning approach. PLoS One 2022; 17:e0274538. [PMID: 36107971 PMCID: PMC9477266 DOI: 10.1371/journal.pone.0274538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 08/30/2022] [Indexed: 12/02/2022] Open
Abstract
The devastating impact of the Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) pandemic almost halted the global economy and is responsible for 6 million deaths, with over 524 million infections. Initially, and with significant reservations, the SARS-CoV-2 virus was suspected to have been transmitted by, and to be closely related to viruses found in, bats. However, as experimental evidence accumulated, it was found to share some similarities with several gene clusters and virus proteins identified in animal-human transmission. Despite this substantial evidence, there has been limited exploration of putative microRNAs (miRNAs) in the SARS-CoV-2 genome and their role in the virus life cycle. In this context, this paper presents a method for detecting SARS-CoV-2 precursor miRNAs (pre-miRNAs) that enables quick detection of specific ribonucleic acids (RNAs). The approach employs an artificial neural network and proposes a model with an estimated accuracy of 98.24%. The sampling technique applies random selection to highly unbalanced datasets to reduce class imbalance, followed by the application of a matriculation artificial neural network whose evaluation includes an accuracy curve, a loss curve, and a confusion matrix. Classical machine learning approaches are then compared with the model and its performance. The proposed approach would be beneficial in identifying the target regions of RNA and in better recognising the SARS-CoV-2 genome sequence in order to design oligonucleotide-based drugs against the genetic structure of the virus.
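The abstract describes random selection from highly unbalanced data to reduce class imbalance. One common way to do this is random undersampling of the larger classes, sketched below; this is an assumed interpretation, and the function name `undersample` and its parameters are hypothetical rather than the authors' procedure.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    Hypothetical helper illustrating random undersampling; the paper's exact
    sampling scheme is not specified in the abstract.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class keeps as many examples as the rarest class has.
    n = min(len(group) for group in by_class.values())
    balanced = [
        (sample, label)
        for label, group in by_class.items()
        for sample in rng.sample(group, n)
    ]
    rng.shuffle(balanced)  # avoid class-ordered batches downstream
    return balanced


# Toy data: two positives and eight negatives.
balanced = undersample(list(range(10)), [1, 1] + [0] * 8)
```

Undersampling discards data from the majority class, so it trades some information for a balanced training set; oversampling or class weighting are alternatives when negatives are informative.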
Affiliation(s)
- Md. Mahadi Hasan
  - Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
- Saba Binte Murtaz
  - Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
- Muhammad Usama Islam
  - School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Muhammad Jafar Sadeq
  - Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
- Jasim Uddin
  - Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
8. Zhang T, Zhai J, Zhang X, Ling L, Li M, Xie S, Song M, Ma C. Interactive Web-based Annotation of Plant MicroRNAs with iwa-miRNA. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:557-567. [PMID: 34332120 PMCID: PMC9801042 DOI: 10.1016/j.gpb.2021.02.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2020] [Revised: 12/15/2020] [Accepted: 03/06/2021] [Indexed: 01/26/2023]
Abstract
MicroRNAs (miRNAs) are important regulators of gene expression. The large-scale detection and profiling of miRNAs have been accelerated with the development of high-throughput small RNA sequencing (sRNA-Seq) techniques and bioinformatics tools. However, generating high-quality comprehensive miRNA annotations remains challenging due to the intrinsic complexity of sRNA-Seq data and inherent limitations of existing miRNA prediction tools. Here, we present iwa-miRNA, a Galaxy-based framework that can facilitate miRNA annotation in plant species by combining computational analysis and manual curation. iwa-miRNA is specifically designed to generate a comprehensive list of miRNA candidates, bridging the gap between already annotated miRNAs provided by public miRNA databases and new predictions from sRNA-Seq datasets. It can also assist users in selecting promising miRNA candidates in an interactive mode, contributing to the accessibility and reproducibility of genome-wide miRNA annotation. iwa-miRNA is user-friendly and can be easily deployed as a web application for researchers without programming experience. With flexible, interactive, and easy-to-use features, iwa-miRNA is a valuable tool for the annotation of miRNAs in plant species with reference genomes. We also illustrate the application of iwa-miRNA for miRNA annotation using data from plant species with varying genomic complexity. The source codes and web server of iwa-miRNA are freely accessible at http://iwa-miRNA.omicstudio.cloud/.
Affiliation(s)
- Ting Zhang
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture and Rural Affairs, Northwest A&F University, Yangling 712100, China
- Jingjing Zhai
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture and Rural Affairs, Northwest A&F University, Yangling 712100, China
- Xiaorong Zhang
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture and Rural Affairs, Northwest A&F University, Yangling 712100, China
- Lei Ling
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture and Rural Affairs, Northwest A&F University, Yangling 712100, China
- Menghan Li
  - College of Plant Science, Tibet Agricultural and Animal Husbandry University, Linzhi 860006, China
- Shang Xie
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture and Rural Affairs, Northwest A&F University, Yangling 712100, China
- Minggui Song
  - College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Chuang Ma (corresponding author)
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture and Rural Affairs, Northwest A&F University, Yangling 712100, China
9. Luo F, Liu W, Bu H. MicroRNAs in hypertrophic cardiomyopathy: pathogenesis, diagnosis, treatment potential and roles as clinical biomarkers. Heart Fail Rev 2022; 27:2211-2221. [PMID: 35332416 DOI: 10.1007/s10741-022-10231-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 03/15/2022] [Indexed: 12/28/2022]
Abstract
Hypertrophic cardiomyopathy (HCM) is the most common heritable cardiomyopathy and is characterized by increased left ventricular wall thickness, but existing diagnostic and treatment approaches face limitations. MicroRNAs (miRNAs) are a type of noncoding RNA molecule that plays crucial roles in the pathological process of cardiac remodelling. Accordingly, miRNAs related to HCM may represent potential novel therapeutic targets. In this review, we first discuss the different roles of miRNAs in the development of HCM. We then summarize the roles of common miRNAs as diagnostic and clinical biomarkers in HCM. Finally, we outline current and future challenges and potential new directions for miRNA-based therapeutics for HCM.
Affiliation(s)
- Fanyan Luo
  - The Department of Cardiovascular Surgery, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan, 410008, People's Republic of China
  - National Clinical Research Centre for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Wei Liu
  - The Department of Cardiovascular Surgery, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan, 410008, People's Republic of China
  - National Clinical Research Centre for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Haisong Bu
  - The Department of Cardiovascular Surgery, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan, 410008, People's Republic of China
  - National Clinical Research Centre for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
10. Raad J, Bugnon LA, Milone DH, Stegmayer G. miRe2e: a full end-to-end deep model based on transformers for prediction of pre-miRNAs. Bioinformatics 2022; 38:1191-1197. [PMID: 34875006 DOI: 10.1093/bioinformatics/btab823] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/10/2021] [Revised: 10/29/2021] [Accepted: 12/01/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION: MicroRNAs (miRNAs) are small RNA sequences with key roles in the regulation of gene expression at the post-transcriptional level in different species. Accurate prediction of novel miRNAs is needed due to their importance in many biological processes and their associations with complicated diseases in humans. Many machine learning approaches were proposed in the last decade for this purpose, but they require handcrafted feature extraction to identify possible de novo miRNAs. More recently, the emergence of deep learning (DL) has allowed automatic feature extraction, with models learning relevant representations by themselves. However, state-of-the-art deep models require complex pre-processing of the input sequences and prediction of their secondary structure to reach an acceptable performance.
RESULTS: In this work, we present miRe2e, the first full end-to-end DL model for pre-miRNA prediction. This model is based on Transformers, a neural architecture that uses attention mechanisms to infer global dependencies between inputs and outputs. It is capable of receiving the raw genome-wide data as input, without any pre-processing or feature engineering. After a training stage with known pre-miRNAs, hairpin and non-hairpin sequences, it can identify all the pre-miRNA sequences within a genome. The model has been validated through several experimental setups using the human genome, and, compared with state-of-the-art algorithms, it obtained 10 times better performance.
AVAILABILITY AND IMPLEMENTATION: Web demo available at https://sinc.unl.edu.ar/web-demo/miRe2e/ and source code available for download at https://github.com/sinc-lab/miRe2e.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jonathan Raad
  - Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- Leandro A Bugnon
  - Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- Georgina Stegmayer
  - Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
11.
Abstract
In this era of big data, sets of methodologies and strategies are designed to extract knowledge from huge volumes of data. However, the cost of where and how to get this information accurately and quickly is extremely important, given the diversity of genomes and the different ways of representing that information. Among the huge set of information and relationships that the genome carries, there are sequences called miRNAs (microRNAs). These sequences were described in the 1990s and are mainly involved in mechanisms of regulation and gene expression. With this in mind, this chapter focuses on exploring the available literature and providing useful and practical guidance on the topic of miRNA databases and tools. To that end, we organize and present this text in two parts: (a) the updated reviews and articles that best summarize and discuss the theme; and (b) our updated investigation of the miRNA literature and portals about databases and tools. Finally, we present the main challenge and a possible solution to improve resources and tools.
Affiliation(s)
- Tharcísio Soares de Amorim
  - Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
- Daniel Longhi Fernandes Pedro
  - Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
- Alexandre Rossi Paschoal
  - Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
12
|
Lai X, Schmitz U, Vera J. The Role of MicroRNAs in Cancer Biology and Therapy from a Systems Biology Perspective. Advances in Experimental Medicine and Biology 2022; 1385:1-22. [DOI: 10.1007/978-3-031-08356-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
13
|
Bugnon LA, Raad J, Merino GA, Yones C, Ariel F, Milone DH, Stegmayer G. Deep Learning for the discovery of new pre-miRNAs: Helping the fight against COVID-19. Machine Learning with Applications 2021; 6:100150. [PMID: 34939043 PMCID: PMC8427907 DOI: 10.1016/j.mlwa.2021.100150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Revised: 08/18/2021] [Accepted: 08/30/2021] [Indexed: 01/29/2023]
Abstract
The Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) has been recently found responsible for the pandemic outbreak of a novel coronavirus disease (COVID-19). In this work, a novel approach based on deep learning is proposed for identifying precursors of small active RNA molecules named microRNA (miRNA) in the genome of the novel coronavirus. Viral miRNA-like molecules have been shown to modulate the host transcriptome during the progression of infection, so their identification is crucial for helping the diagnosis or medical treatment of the disease. The existence of the mature miRNAs derived from computationally predicted miRNA precursors (pre-miRNAs) in the novel coronavirus was validated with small RNA-seq data from SARS-CoV-2-infected human cells. The results demonstrate that computational models can provide accurate and useful predictions of pre-miRNAs in the SARS-CoV-2 genome, underscoring the relevance of machine learning in the response to a global sanitary emergency. Moreover, the interpretability of our model shed light on the molecular mechanisms underlying the viral infection, thus contributing to the fight against the COVID-19 pandemic and the fast development of new treatments. Our study shows how recent advances in machine learning can be used effectively in response to public health emergencies. The approach developed in this work could be of great help in similar future emergencies to accelerate the understanding of the singularities of any viral agent and the development of novel therapies. Data and source code available at: https://sourceforge.net/projects/sourcesinc/files/aicovid/.
Affiliation(s)
- L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
- J Raad
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
- G A Merino
- Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Ruta 11 km 10.5, Oro Verde, Argentina
- C Yones
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
- F Ariel
- Instituto de Agrobiotecnologia del Litoral (IAL), CONICET, FBCB, Universidad Nacional del Litoral, Colectora Ruta Nacional 168 km 0, Santa Fe, Argentina
- D H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
- G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe, Argentina
|
14
|
Zhou H, Tang W, Yang J, Peng J, Guo J, Fan C. MicroRNA-Related Strategies to Improve Cardiac Function in Heart Failure. Front Cardiovasc Med 2021; 8:773083. [PMID: 34869689 PMCID: PMC8639862 DOI: 10.3389/fcvm.2021.773083] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/09/2021] [Accepted: 10/25/2021] [Indexed: 12/18/2022]
Abstract
Heart failure (HF) describes a group of manifestations caused by the failure of heart function as a pump that supports blood flow through the body. MicroRNAs (miRNAs), as one type of non-coding RNA molecule, have crucial roles in the etiology of HF. Accordingly, miRNAs related to HF may represent potential novel therapeutic targets. In this review, we first discuss the different roles of miRNAs in the development and diseases of the heart. We then outline commonly used miRNA chemical modifications and delivery systems. Further, we summarize the opportunities and challenges for HF-related miRNA therapeutic targets, and discuss the first clinical trial of an antisense drug (CDR132L) in patients with HF. Finally, we outline current and future challenges and potential new directions for miRNA-based therapeutics for HF.
Affiliation(s)
- Huatao Zhou
- Department of Cardiovascular Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
- Weijie Tang
- Department of Cardiovascular Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
- Jinfu Yang
- Department of Cardiovascular Surgery, The Second Xiangya Hospital, Central South University, Changsha, China; Department of Pharmacology, Hunan Provincial Key Laboratory of Cardiovascular Research, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Jun Peng
- Department of Pharmacology, Hunan Provincial Key Laboratory of Cardiovascular Research, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Jianjun Guo
- Hunan Fangsheng Pharmaceutical Co., Ltd., Changsha, China
- Chengming Fan
- Department of Cardiovascular Surgery, The Second Xiangya Hospital, Central South University, Changsha, China; Department of Pharmacology, Hunan Provincial Key Laboratory of Cardiovascular Research, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China; Hunan Fangsheng Pharmaceutical Co., Ltd., Changsha, China
|
15
|
Patil S, Joshi S, Jamla M, Zhou X, Taherzadeh MJ, Suprasanna P, Kumar V. MicroRNA-mediated bioengineering for climate-resilience in crops. Bioengineered 2021; 12:10430-10456. [PMID: 34747296 PMCID: PMC8815627 DOI: 10.1080/21655979.2021.1997244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/11/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/24/2022]
Abstract
Global projections on climate change and dynamic environmental perturbations indicate severe impacts on food security in general, and crop yield, vigor and the quality of produce in particular. Sessile plants respond to environmental challenges such as salt, drought, temperature and heavy metals at transcriptional and/or post-transcriptional levels through the stress-regulated network of pathways including transcription factors, proteins and the small non-coding endogenous RNAs. Among these, the miRNAs have gained unprecedented attention in recent years as key regulators for modulating gene expression in plants under stress. Hence, tailoring of miRNAs and their target pathways presents a promising strategy for developing multiple stress-tolerant crops. Plant stress tolerance has been successfully achieved through the overexpression of microRNAs such as Os-miR408 and Hv-miR82 for drought tolerance; OsmiR535A and artificial DST miRNA for salinity tolerance; and OsmiR535 and miR156 for combined drought and salt stress. Examples of miR408 overexpression also showed improved efficiency of irradiation utilization and carbon dioxide fixation in crop plants. Through this review, we present the current understanding about plant miRNAs, their roles in plant growth and stress responses, and the modern toolbox for identification, characterization and validation of miRNAs and their target genes, including in silico tools, machine learning and artificial intelligence. Various approaches for up-regulation or knock-out of miRNAs are discussed. The main emphasis is on the exploration of miRNAs for the development of bioengineered climate-smart crops that can withstand changing climates and stressful environments, including combinations of stresses, with little or no yield penalty.
Affiliation(s)
- Suraj Patil
- Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Pune, India
- Shrushti Joshi
- Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Pune, India
- Monica Jamla
- Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Pune, India
- Xianrong Zhou
- School of Life Science and Biotechnology, Yangtze Normal University, Chongqing, China
- Penna Suprasanna
- Bhabha Atomic Research Centre, Homi Bhabha National Institute, Mumbai, India
- Vinay Kumar
- Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Pune, India
|
16
|
Su R, Hu J, Zou Q, Manavalan B, Wei L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform 2021; 21:408-420. [PMID: 30649170 DOI: 10.1093/bib/bby124] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 10/17/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 12/16/2022]
Abstract
Cell-penetrating peptides (CPPs) facilitate the delivery of therapeutically relevant molecules, including DNA, proteins and oligonucleotides, into cells both in vitro and in vivo. This unique ability opens up the possibility of using CPPs for therapeutic delivery, with potential applications in clinical therapy. Over the last few decades, a number of machine learning (ML)-based prediction tools have been developed, and some of them are freely available as web portals. However, the predictions produced by various tools are difficult to quantify and compare. In particular, there is no systematic comparison of the performance of web-based prediction tools, especially in practical applications. In this work, we provide a comprehensive review of the biological importance of CPPs, CPP databases and existing ML-based methods for CPP prediction. To evaluate current prediction tools, we conducted a comparative study and analyzed a total of 12 models from 6 publicly available CPP prediction tools on 2 benchmark validation sets of CPPs and non-CPPs. Our benchmarking results demonstrated that a model from KELM-CPPpred, namely KELM-hybrid-AAC, showed a significant improvement in overall performance when compared to the other 11 prediction models. Moreover, through a length-dependency analysis, we find that existing prediction tools tend to predict CPPs and non-CPPs 20-25 residues long more accurately than peptides in other length ranges.
Affiliation(s)
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jie Hu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Leyi Wei
- College of Intelligence and Computing, Tianjin University, Tianjin, China
|
17
|
Merino GA, Raad J, Bugnon LA, Yones C, Kamenetzky L, Claus J, Ariel F, Milone DH, Stegmayer G. Novel SARS-CoV-2 encoded small RNAs in the passage to humans. Bioinformatics 2021; 36:5571-5581. [PMID: 33244583 PMCID: PMC7717134 DOI: 10.1093/bioinformatics/btaa1002] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/11/2020] [Revised: 10/15/2020] [Accepted: 11/18/2020] [Indexed: 12/14/2022]
Abstract
Motivation: The Severe Acute Respiratory Syndrome-Coronavirus 2 (SARS-CoV-2) has recently emerged as responsible for the pandemic outbreak of the coronavirus disease (COVID-19). This virus is closely related to coronaviruses infecting bats and Malayan pangolins, species suspected to be an intermediate host in the passage to humans. Several genomic mutations affecting viral proteins have been identified, contributing to the understanding of the recent animal-to-human transmission. However, the capacity of SARS-CoV-2 to encode functional putative microRNAs (miRNAs) remains largely unexplored. Results: We have used deep learning to discover 12 candidate stem-loop structures hidden in the viral protein-coding genome. Among the precursors, the expression of eight mature miRNA-like sequences was confirmed in small RNA-seq data from SARS-CoV-2 infected human cells. Predicted miRNAs are likely to target a subset of human genes of which 109 are transcriptionally deregulated upon infection. Remarkably, 28 of those genes potentially targeted by SARS-CoV-2 miRNAs are down-regulated in infected human cells. Interestingly, most of them have been related to respiratory diseases and viral infection, including several afflictions previously associated with SARS-CoV-1 and SARS-CoV-2. The comparison of SARS-CoV-2 pre-miRNA sequences with those from bat and pangolin coronaviruses suggests that single nucleotide mutations could have helped its progenitors jump inter-species boundaries, allowing the gain of novel mature miRNAs targeting human mRNAs. Our results suggest that the recent acquisition of novel miRNA-like sequences in the SARS-CoV-2 genome may have contributed to modulating the transcriptional reprogramming of the new host upon infection.
Affiliation(s)
- Gabriela A Merino
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina; Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Entre Ríos 3100, Argentina; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire CB101SD, UK
- Jonathan Raad
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Leandro A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Cristian Yones
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Laura Kamenetzky
- Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPaM), Facultad de Medicina, UBA-CONICET, Ciudad Autónoma de Buenos Aires 1121, Argentina; Laboratorio de Genómica y Bioinformática de Patógenos, iB3, Instituto de Biociencias, Biotecnología y Biología traslacional, Departamento de Fisiología y Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires 1121, Argentina
- Juan Claus
- Laboratorio de Virología, FBCB, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Federico Ariel
- Instituto de Agrobiotecnología del Litoral (IAL), CONICET, FBCB, Universidad Nacional del Litoral, Santa Fe 3000, Argentina
- Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
|
18
|
Zhao Y, Kuang Z, Wang Y, Li L, Yang X. MicroRNA annotation in plants: current status and challenges. Brief Bioinform 2021; 22:6180404. [PMID: 33754625 DOI: 10.1093/bib/bbab075] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/15/2020] [Revised: 02/01/2021] [Accepted: 02/15/2021] [Indexed: 11/14/2022]
Abstract
Over the last two decades, studies on microRNAs (miRNAs) and the number of annotated miRNAs in plants and animals have surged. Herein, we reviewed the current progress and challenges of miRNA annotation in plants. By comparing plant and animal miRNAs, we pinpointed the difficulties in plant miRNA annotation and proposed potential solutions. Recalling the history of methods and criteria in plant miRNA annotation, we detailed how major advances were made and evolved. By collecting and categorizing bioinformatics tools for plant miRNA annotation, we surveyed their advantages and disadvantages, especially for tools that mimic the miRNA biogenesis pathway by parsing deeply sequenced small RNA (sRNA) libraries. In addition, we summarized all available databases hosting plant miRNAs and suggested potential optimizations, such as how to increase the signal-to-noise ratio (SNR) in these databases. Finally, we discussed the challenges and perspectives of plant miRNA annotation and indicated the possibilities offered by an all-in-one tool and platform that integrates artificial intelligence.
Affiliation(s)
- Yongxin Zhao
- Beijing Academy of Agriculture and Forestry Sciences, China
- Zheng Kuang
- Peking University and Beijing Academy of Agriculture and Forestry Sciences, China
- Lei Li
- School of Advanced Agricultural Sciences and School of Life Sciences at the Peking University, China
- Xiaozeng Yang
- Beijing Academy of Agriculture and Forestry Sciences, China
|
19
|
Bonidia RP, Sampaio LDH, Domingues DS, Paschoal AR, Lopes FM, de Carvalho ACPLF, Sanches DS. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief Bioinform 2021; 22:6135010. [PMID: 33585910 DOI: 10.1093/bib/bbab011] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/20/2020] [Revised: 12/13/2020] [Accepted: 01/07/2021] [Indexed: 11/14/2022]
Abstract
As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA sequences. Moreover, we separated this work into three studies. First, we assessed our proposal with the most addressed problem in our review, e.g. lncRNA and mRNA; second, we also validate the mathematical features in different classification problems, to predict the class of lncRNA, e.g. circular RNA sequences; third, we analyze its robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. Availability: https://github.com/Bonidia/FeatureExtraction_BiologicalSequences.
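To make the idea of an entropy-based sequence feature concrete, the following is a minimal sketch that computes the Shannon entropy of the k-mer distribution of a sequence. It is only an illustration under stated assumptions: the function name `kmer_entropy` and the choice of k-mer frequencies as the underlying distribution are hypothetical and are not taken from the authors' pipeline, which is available at the repository named in the abstract.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq.

    Hypothetical helper for illustration; not the published implementation.
    """
    # Slide a window of width k over the sequence to collect overlapping k-mers.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total == 0:
        # Sequence shorter than k: no k-mers, so define the entropy as zero.
        return 0.0
    counts = Counter(kmers)
    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A uniform mix of the four nucleotides has higher entropy than a repeat.
    print(kmer_entropy("ACGUACGUACGU", k=1))
    print(kmer_entropy("AAAAAAAAAAAA", k=1))
```

A single scalar like this would normally be one column among many in a feature vector passed to a classifier; low values flag repetitive sequences and high values flag compositionally diverse ones.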
Affiliation(s)
- Robson P Bonidia
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil; Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Lucas D H Sampaio
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Douglas S Domingues
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil; Department of Botany, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro 13506-900, Brazil
- Alexandre R Paschoal
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Fabrício M Lopes
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- André C P L F de Carvalho
- Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Danilo S Sanches
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
|
20
|
Solomon J, Kern F, Fehlmann T, Meese E, Keller A. HumiR: Web Services, Tools and Databases for Exploring Human microRNA Data. Biomolecules 2020; 10:biom10111576. [PMID: 33233537 PMCID: PMC7699549 DOI: 10.3390/biom10111576] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/19/2020] [Revised: 11/13/2020] [Accepted: 11/17/2020] [Indexed: 12/29/2022]
Abstract
For many research aspects on small non-coding RNAs, especially microRNAs, computational tools and databases are developed. This includes quantification of miRNAs, piRNAs, tRNAs and tRNA fragments, circRNAs and others. Furthermore, the prediction of new miRNAs, isomiRs, arm switch events, target and target pathway prediction and miRNA pathway enrichment are common tasks. Additionally, databases and resources containing expression profiles, e.g., from different tissues, organs or cell types, are generated. This information in turn leads to improved miRNA repositories. While most of the respective tools are implemented in a species-independent manner, we focused on tools for human small non-coding RNAs. This includes four aspects: (1) miRNA analysis tools, (2) databases on miRNAs and variations thereof, (3) databases on expression profiles, and (4) miRNA helper tools facilitating frequent tasks such as naming conversion or reporter assay design. Although dependencies between the tools exist and several tools are jointly used in studies, the interoperability is limited. We present HumiR, a joint web presence for our tools. HumiR facilitates entry into the world of miRNA research, supports the selection of the right tool for a research task and represents the very first step towards a fully integrated knowledge-base for human small non-coding RNA research. We demonstrate the utility of HumiR by performing a very comprehensive analysis of Alzheimer's miRNAs.
Affiliation(s)
- Jeffrey Solomon
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Fabian Kern
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Tobias Fehlmann
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Eckart Meese
- Institute for Human Genetics, Saarland University, 66421 Homburg, Germany
- Andreas Keller
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Department of Neurobiology, Stanford University, Palo Alto, CA 94305, USA
- Correspondence: Tel.: +49-681-30268611
|
21
|
Popular Computational Tools Used for miRNA Prediction and Their Future Development Prospects. Interdiscip Sci 2020; 12:395-413. [PMID: 32959233 DOI: 10.1007/s12539-020-00387-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/07/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 10/23/2022]
Abstract
MicroRNAs (miRNAs) are 19-24 nucleotide (nt)-long noncoding, single-stranded RNA molecules that play significant roles in regulating the gene expression, growth, and development of plants and animals. From the year that miRNAs were first discovered until the beginning of the twenty-first century, researchers used experimental methods such as cloning and sequencing to identify new miRNAs and their roles in the posttranscriptional regulation of protein synthesis. Later, in the early 2000s, informatics approaches to the discovery of new miRNAs began to be implemented. With increasing knowledge about miRNA, more efficient algorithms have been developed for computational miRNA prediction. The miRNA research community, hoping for greater coverage and faster results, has shifted from cumbersome and expensive traditional experimental approaches to computational approaches. These computational methods started with homology-based comparisons of known miRNAs with orthologs in the genomes of other species; this method could identify a known miRNA in new species. Second-generation sequencing and next-generation sequencing of mRNA at different developmental stages and in specific tissues, in combination with a better search and alignment algorithm, have accelerated the process of predicting novel miRNAs in a particular species. Using the accumulated annotated miRNA sequence information, researchers have been able to design ab initio algorithms for miRNA prediction independent of genome sequence knowledge. Here, the methods recently used for miRNA computational prediction are summarized and classified into the following four categories: homology-based, target-based, scoring-based, and machine-learning-based approaches. Finally, the future developmental directions of miRNA prediction methods are discussed.
|
22
|
Bugnon LA, Yones C, Milone DH, Stegmayer G. Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning. Brief Bioinform 2020; 22:5894456. [PMID: 34020552 DOI: 10.1093/bib/bbaa184] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/26/2020] [Revised: 07/13/2020] [Accepted: 07/18/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION: The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data. RESULTS: In this work, we review six recent methods for tackling this problem with machine learning. We compare the models on five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNA prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives. AVAILABILITY: The source code to reproduce these results is at http://sourceforge.net/projects/sourcesinc/files/gwmirna. Additionally, the datasets are freely available at https://sourceforge.net/projects/sourcesinc/files/mirdata.
Affiliation(s)
- Leandro A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Cristian Yones
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
- Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
|
23
|
Song X, Zhuang Y, Lan Y, Lin Y, Min X. Comprehensive Review and Comparison for Anticancer Peptides Identification Models. Curr Protein Pept Sci 2020; 22:CPPS-EPUB-103745. [PMID: 31957608 DOI: 10.2174/1389203721666200117162958] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/30/2019] [Revised: 05/16/2019] [Accepted: 05/30/2019] [Indexed: 11/22/2022]
Abstract
Anticancer peptides (ACPs) eliminate pathogenic bacteria and kill tumor cells, showing no hemolysis and no damage to normal human cells. This unique ability opens up the possibility of using ACPs for therapeutic delivery, with potential applications in clinical therapy. Identifying ACPs is one of the most fundamental and central problems in new antitumor drug research. During the past decades, a number of machine learning-based prediction tools have been developed to solve this important task. However, the predictions produced by various tools are difficult to quantify and compare. Therefore, in this article, we provide a comprehensive review of existing machine learning methods for ACP prediction and a fair comparison of the predictors. To evaluate current prediction tools, we conducted a comparative study and analyzed the existing ACP predictors from 10 published studies. The comparative results suggest that a Support Vector Machine-based model with a combination of features provided a significant improvement in overall performance when compared to the other machine learning-based prediction models.
|
24
|
Tang X, Sun Y. Fast and accurate microRNA search using CNN. BMC Bioinformatics 2019; 20:646. [PMID: 31881831 PMCID: PMC6933638 DOI: 10.1186/s12859-019-3279-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/16/2019] [Accepted: 11/18/2019] [Indexed: 12/15/2022]
Abstract
Background: There are many different types of microRNAs (miRNAs), and elucidating their functions remains an area of intensive research. A fundamental step in the functional annotation of a new miRNA is to classify it into a characterized miRNA family, such as those in Rfam and miRBase. With the accumulation of annotated miRNAs, it has become possible to use deep learning-based models to classify different types of miRNAs. In this work, we investigate several key issues associated with the successful application of deep learning models to miRNA classification. First, as secondary structure conservation is a prominent feature of noncoding RNAs including miRNAs, we examine whether secondary structure-based encoding improves classification accuracy. Second, as there are many more non-miRNA sequences than miRNAs, instead of assigning a negative class to all non-miRNA sequences, we test whether the softmax output can distinguish in-distribution from out-of-distribution samples. Finally, we investigate whether deep learning models can correctly classify sequences from small miRNA families.
Results: We present our trained convolutional neural network (CNN) models for classifying miRNAs using different types of feature learning and encoding methods. In the first method, we explicitly encode the predicted secondary structure in a matrix. In the second method, we use only the primary sequence information and a one-hot encoding matrix. In addition, to reject sequences that should not be classified into the targeted miRNA families, we use a threshold derived from the softmax layer to exclude out-of-distribution sequences, which is an important feature for making this model useful on real transcriptomic data. A comparison with state-of-the-art ncRNA classification tools such as Infernal shows that our method achieves comparable sensitivity and accuracy while being significantly faster.
Conclusion: Automatic feature learning in CNNs can lead to better classification accuracy and sensitivity for miRNA classification and annotation. The trained models and associated code are freely available at https://github.com/HubertTang/DeepMir.
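The two ideas named in this abstract, one-hot encoding of the primary sequence and rejection of out-of-distribution inputs by thresholding the softmax output, can be illustrated with a minimal sketch. This is not the DeepMir implementation: the function names, the fixed-length zero padding, the family labels, and the threshold value of 0.9 are all assumptions chosen for illustration, and the logits stand in for the output of a trained CNN.

```python
import math

ALPHABET = "ACGU"  # assumed base order for the one-hot columns


def one_hot(seq, length):
    """Encode an RNA sequence as a length x 4 one-hot matrix.

    Sequences longer than `length` are truncated and shorter ones are
    zero-padded (an assumption; the paper may pad differently).
    Bases outside A/C/G/U (for example N) become all-zero rows.
    """
    rows = []
    for base in seq.upper().replace("T", "U")[:length]:
        rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
    rows.extend([[0.0] * len(ALPHABET) for _ in range(length - len(rows))])
    return rows


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(logits, families, threshold=0.9):
    """Return the predicted family, or None when the top softmax
    probability falls below `threshold` (hypothetical value), treating
    the input as out-of-distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return families[best] if probs[best] >= threshold else None


# Hypothetical family labels and logits, for illustration only.
families = ["mir-1", "mir-2", "mir-3"]
confident = classify_or_reject([5.0, 0.0, 0.0], families)
uncertain = classify_or_reject([1.0, 1.0, 1.0], families)
```

With these inputs the first call is accepted because one class dominates, while the second is rejected because the probability mass is spread evenly across the families. In practice the threshold would be tuned on held-out data rather than fixed in advance.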
Affiliation(s)
- Xubo Tang, Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Yanni Sun, Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
|
25
|
Bugnon LA, Yones C, Raad J, Milone DH, Stegmayer G. Genome-wide hairpins datasets of animals and plants for novel miRNA prediction. Data Brief 2019; 25:104209. [PMID: 31453279 PMCID: PMC6700487 DOI: 10.1016/j.dib.2019.104209] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/05/2019] [Revised: 06/16/2019] [Accepted: 06/25/2019] [Indexed: 01/19/2023]
Abstract
This article makes available several genome-wide datasets that can be used for training microRNA (miRNA) classifiers. The hairpin sequences come from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences, together with a set of computed features for prediction. Each sequence has one label: i) "positive", meaning that it is a well-known pre-miRNA according to miRBase v21; or ii) "unlabeled", indicating that the sequence does not (yet) have a known function and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The datasets are publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.
Affiliation(s)
- L A Bugnon, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- C Yones, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- J Raad, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- D H Milone, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- G Stegmayer, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
|