1. Krautwurst S, Lamkiewicz K. RNA-protein interaction prediction without high-throughput data: An overview and benchmark of in silico tools. Comput Struct Biotechnol J 2024; 23:4036-4046. [PMID: 39610906; PMCID: PMC11603007; DOI: 10.1016/j.csbj.2024.11.015] [Received: 08/20/2024; Revised: 11/05/2024; Accepted: 11/05/2024; Indexed: 11/30/2024]
Abstract
RNA-protein interactions (RPIs) are crucial for the accurate operation of various processes within and between organisms across kingdoms of life. Mutual recognition of RPI partner molecules depends on distinct sequence, structural, or thermodynamic features, which can be determined with experimental and bioinformatic methods. Still, the molecular mechanisms underlying many RPIs are poorly understood. It is further hypothesized that many RPIs have not yet been described. Computational RPI prediction is continually challenged by a lack of data and of detailed research on specific examples. As novel RPI complexes are discovered in all kingdoms of life, existing RPI prediction methods need to be adapted. Continuously improving computational RPI prediction is key to understanding RPIs in detail and to supplementing experimental RPI determination. The growing amount of data, covering more species and more detailed mechanisms, supports the accuracy of prediction tools, which in turn support targeted experimental research on RPIs. Here, we give an overview of RPI prediction tools that do not use high-throughput data as the user's input. We review the tools according to their input, usability, and output. We then apply the tools to known RPI examples across different kingdoms of life. Our comparison shows that the investigated prediction tools do not favor a particular species and provide results that vary in their level of detail, from an overall RPI score to specific interacting residues. Furthermore, we provide a guide tree to help users decide which RPI prediction tool is appropriate for their available input data and desired output.
Affiliation(s)
- Sarah Krautwurst
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Kevin Lamkiewicz
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr. 4, 04103 Leipzig, Germany
2. Kekenes-Huskey PM, Burgess DE, Sun B, Bartos DC, Rozmus ER, Anderson CL, January CT, Eckhardt LL, Delisle BP. Mutation-Specific Differences in Kv7.1 (KCNQ1) and Kv11.1 (KCNH2) Channel Dysfunction and Long QT Syndrome Phenotypes. Int J Mol Sci 2022; 23:7389. [PMID: 35806392; PMCID: PMC9266926; DOI: 10.3390/ijms23137389] [Received: 05/28/2022; Revised: 06/22/2022; Accepted: 06/24/2022; Indexed: 11/16/2022]
Abstract
The electrocardiogram (ECG) enabled clinician-scientists to measure the electrical activity of the heart noninvasively and to identify arrhythmias and heart disease. Shortly after the 12-lead ECG was standardized for the diagnosis of heart disease, several families with autosomal recessive (Jervell and Lange-Nielsen Syndrome) and autosomal dominant (Romano-Ward Syndrome) forms of long QT syndrome (LQTS) were identified. An abnormally long heart rate-corrected QT interval was established as a biomarker for the risk of sudden cardiac death. Since then, the International LQTS Registry has been established, a phenotypic scoring system to identify LQTS patients has been developed, the major genes associated with typical forms of LQTS have been identified, and guidelines for the successful management of patients have advanced. In this review, we discuss the molecular and cellular mechanisms of LQTS associated with missense variants in KCNQ1 (LQT1) and KCNH2 (LQT2). We move beyond the binary "benign"-to-"pathogenic" classification scheme for KCNQ1 and KCNH2 missense variants and discuss gene- and mutation-specific differences in K+ channel dysfunction, which can predispose people to distinct clinical phenotypes (e.g., concealed, pleiotropic, or severe). We conclude by discussing emerging computational structural modeling strategies that will distinguish between dysfunctional subtypes of KCNQ1 and KCNH2 variants, with the goal of realizing a layered precision medicine approach focused on individuals.
Affiliation(s)
- Peter M. Kekenes-Huskey
- Department of Cell and Molecular Physiology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL 60153, USA
- Don E. Burgess
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
- Bin Sun
- Department of Pharmacology, Harbin Medical University, Harbin 150081, China
- Ezekiel R. Rozmus
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
- Corey L. Anderson
- Cellular and Molecular Arrhythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
- Craig T. January
- Cellular and Molecular Arrhythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
- Lee L. Eckhardt
- Cellular and Molecular Arrhythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
- Brian P. Delisle
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
3. Arora V, Sanguinetti G. Challenges for machine learning in RNA-protein interaction prediction. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0087. [PMID: 35073469; DOI: 10.1515/sagmb-2021-0087] [Received: 12/10/2021; Accepted: 01/02/2022; Indexed: 11/15/2022]
Abstract
RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets that offer both opportunities and challenges for machine learning techniques. In this brief note, we discuss some of the major stumbling blocks to the use of machine learning in computational RNA biology, focusing specifically on the problem of predicting RNA-protein interactions from next-generation sequencing data.
Affiliation(s)
- Viplove Arora
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
- Guido Sanguinetti
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
4. Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022; 23:bbab540. [PMID: 34929730; PMCID: PMC8790951; DOI: 10.1093/bib/bbab540] [Received: 09/29/2021; Revised: 11/14/2021; Accepted: 11/22/2021; Indexed: 12/11/2022]
Abstract
Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study these interactions. Because of the limitations of previous databases, especially the lack of protein structure data, most existing computational methods rely heavily on sequence data, and only a small portion of methods use structural information. Recently, AlphaFold has revolutionized protein science and biology as a whole. Protein-RNA interaction prediction can therefore be expected to advance significantly in the coming years. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the past development of the RNA-binding protein-RNA interaction field and anticipates its future development in the post-AlphaFold era.
Affiliation(s)
- Junkang Wei
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Siyuan Chen
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Licheng Zong
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- The CUHK Shenzhen Research Institute, Hi-Tech Park, 518057, Shenzhen, China
5. Guo R, Teng Z, Wang Y, Zhou X, Xu H, Liu D. Integrated Learning: Screening Optimal Biomarkers for Identifying Preeclampsia in Placental mRNA Samples. Comput Math Methods Med 2021; 2021:6691096. [PMID: 33680070; PMCID: PMC7925050; DOI: 10.1155/2021/6691096] [Received: 11/08/2020; Revised: 01/17/2021; Accepted: 01/27/2021; Indexed: 01/28/2023]
Abstract
Preeclampsia (PE) is a maternal disease that can cause maternal and child death. Current treatment and preventive measures remain inadequate, and PE screening has attracted much attention. The purpose of this study was to screen placental mRNA to obtain the best PE biomarkers for identifying patients with PE. We used limma in R to identify the 48 differentially expressed genes with the largest differences and then applied correlation-based feature selection algorithms to reduce dimensionality and avoid attribute redundancy arising from too many mRNAs participating in the classification. After the mRNA attributes were reduced, they were ranked in descending order of information gain. A classifier model was then designed to identify, from placental mRNA, whether a sample came from a patient with PE. To improve classification accuracy and avoid overfitting, three classifiers (C4.5, AdaBoost, and a multilayer perceptron) were combined by majority voting. As comparison methods, the classifier was also trained on the differentially expressed genes and on the genes filtered by the best-subset method. The results show that classification accuracy increased from 79% to 82.2%, while the number of mRNA features decreased from 48 to 13. This study provides clues to the main placental mRNA biomarkers of PE and ideas for the treatment and screening of PE.
Affiliation(s)
- Rong Guo
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
- Zhixia Teng
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
- Yiding Wang
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
- Xin Zhou
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
- Heze Xu
- Department of Gynecology and Obstetrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Dan Liu
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
6. Lucero L, Ferrero L, Fonouni-Farde C, Ariel F. Functional classification of plant long noncoding RNAs: a transcript is known by the company it keeps. New Phytol 2021; 229:1251-1260. [PMID: 32880949; DOI: 10.1111/nph.16903] [Received: 04/01/2020; Accepted: 08/05/2020; Indexed: 05/27/2023]
Abstract
The extraordinary maturation of high-throughput sequencing technologies has revealed the existence of a complex network of transcripts in eukaryotic organisms, including thousands of long noncoding (lnc) RNAs with little or no protein-coding capacity. Subsequent discoveries have shown that lncRNAs participate in a wide range of molecular processes, controlling gene expression and protein activity through direct interactions with proteins, DNA or other RNA molecules. Although significant advances have been achieved in understanding lncRNA biology in the animal kingdom, the functional characterization of plant lncRNAs is still in its infancy and remains a major challenge. In this review, we report emerging functional and mechanistic paradigms of plant lncRNAs and their partner molecules, and discuss how cutting-edge technologies may help to identify and classify as-yet uncharacterized transcripts into functional groups.
Affiliation(s)
- Leandro Lucero
- Instituto de Agrobiotecnología del Litoral, CONICET, Universidad Nacional del Litoral, Colectora Ruta Nacional 168 km 0, Santa Fe, 3000, Argentina
- Lucía Ferrero
- Instituto de Agrobiotecnología del Litoral, CONICET, Universidad Nacional del Litoral, Colectora Ruta Nacional 168 km 0, Santa Fe, 3000, Argentina
- Camille Fonouni-Farde
- Instituto de Agrobiotecnología del Litoral, CONICET, Universidad Nacional del Litoral, Colectora Ruta Nacional 168 km 0, Santa Fe, 3000, Argentina
- Federico Ariel
- Instituto de Agrobiotecnología del Litoral, CONICET, Universidad Nacional del Litoral, Colectora Ruta Nacional 168 km 0, Santa Fe, 3000, Argentina