1
Chang F, Liu L, Hu F, Sun X, Zhao Y, Zhang N, Li C. RNAfcg: RNA Flexibility Prediction Based on Topological Centrality and Global Features. J Chem Inf Model 2024; 64:7786-7792. [PMID: 39276067 DOI: 10.1021/acs.jcim.4c00848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2024]
Abstract
The dynamics of RNAs are intimately related to their functions. Molecular flexibility, as a starting point for understanding their dynamics, has been used to predict many characteristics associated with their functions. Because experimental measurement methods are time-consuming and labor-intensive, reliable theoretical methods for predicting RNA flexibility are urgently needed. In this work, we develop an effective machine learning method, RNAfcg, to predict RNA flexibility, in which a Random Forest (RF) is trained on features including the topological centralities, flexibility-rigidity index, and global characteristics first introduced by us, as well as some traditional sequence and structural features. The analyses show that the three newly introduced feature types contribute significantly to RNA flexibility prediction, with the topological type contributing the most, which indicates the importance of structural topology in determining RNA flexibility. The performance comparison indicates that RNAfcg outperforms the state-of-the-art machine learning methods and the commonly used Gaussian Network Models (GNMs), achieving a much higher Pearson correlation coefficient (PCC) of 0.6619 on the test data set. This work is helpful for understanding RNA dynamics and can be used to predict RNA function information. The source code is available at https://github.com/ChunhuaLab/RNAfcg/.
Affiliation(s)
- Fubin Chang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Lamei Liu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Xiaohan Sun
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Yingchun Zhao
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Na Zhang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
2
Lin X, Gao Y, Lei F. An application of topological data analysis in predicting sumoylation sites. PeerJ 2023; 11:e16204. [PMID: 37846308 PMCID: PMC10576966 DOI: 10.7717/peerj.16204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Accepted: 09/08/2023] [Indexed: 10/18/2023] Open
Abstract
Sumoylation is a reversible post-translational modification that regulates certain significant biochemical functions in proteins. The protein alterations caused by sumoylation are associated with the incidence of some human diseases. Therefore, identifying the sites of sumoylation in proteins may provide a direction for mechanistic research and drug development. Here, we propose a new computational approach for identifying sumoylation sites using an encoding method based on topological data analysis. The features of our model capture the key physical and biological properties of proteins at multiple scales. In 10-fold cross-validation, our model achieved a sensitivity (Sn) of 96.45%, an accuracy (Acc) of 94.65%, a Matthews correlation coefficient (MCC) of 0.8946, and an area under the curve (AUC) of 0.99. The proposed predictor, using only topological features, achieves the best MCC and AUC among the published methods compared. Our results suggest that topological information is an additional parameter that can assist in the prediction of sumoylation sites and provide a novel perspective for further research in protein sumoylation.
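The sensitivity, accuracy, and MCC quoted in this abstract are all functions of the four counts in a binary confusion matrix. The following is a minimal Python sketch of those standard definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, accuracy, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Sensitivity: fraction of true positives among all actual positives.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that are correct.
    acc = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined here as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Acc": acc, "MCC": mcc}


# Hypothetical counts, chosen only for illustration.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print(metrics)
```

Unlike accuracy, MCC uses all four counts symmetrically, which is one reason it is often reported alongside accuracy when the positive and negative classes differ in size.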
Affiliation(s)
- Xiaoxi Lin
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Yaru Gao
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Fengchun Lei
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
3
Xia K, Liu X, Wee J. Persistent Homology for RNA Data Analysis. Methods Mol Biol 2023; 2627:211-229. [PMID: 36959450 DOI: 10.1007/978-1-0716-2974-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Molecular representations are of great importance for machine learning models in RNA data analysis. Essentially, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of learning models. In this paper, we introduce two persistent models, persistent homology and persistent spectral models, for RNA structure and interaction representations and their applications in RNA data analysis. Different from traditional geometric and graph representations, persistent homology is built on simplicial complexes, which generalize graph models to higher-dimensional situations. Hypergraphs are a further generalization of simplicial complexes, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine a filtration process with spectral models, including spectral graph, spectral simplicial complex, and spectral hypergraph, are proposed for molecular representation. The persistent attributes for RNAs can be obtained from these two persistent models and further combined with machine learning models for RNA structure, flexibility, dynamics, and function analysis.
Affiliation(s)
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Xiang Liu
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
  - Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
- JunJie Wee
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
4
Skaf Y, Laubenbacher R. Topological data analysis in biomedicine: A review. J Biomed Inform 2022; 130:104082. [PMID: 35508272 DOI: 10.1016/j.jbi.2022.104082] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/30/2021] [Revised: 03/20/2022] [Accepted: 04/23/2022] [Indexed: 01/22/2023]
Abstract
Significant technological advances made in recent years have shepherded a dramatic increase in the use of digital technologies for biomedicine: everything from the widespread use of electronic health records to improved medical imaging capabilities and the rising ubiquity of genomic sequencing contributes to a "digitization" of biomedical research and clinical care. With this shift toward computerized tools comes a dramatic increase in the amount of available data, and current tools for data analysis capable of extracting meaningful knowledge from this wealth of information have yet to catch up. This article seeks to provide an overview of emerging mathematical methods that have the potential to improve the abilities of clinicians and researchers to analyze biomedical data, but that may be hindered from doing so by a lack of conceptual accessibility and awareness in the life sciences research community. In particular, we focus on topological data analysis (TDA), a set of methods grounded in the mathematical field of algebraic topology that seeks to describe and harness features related to the "shape" of data. We aim to make such techniques more approachable to non-mathematicians by providing a conceptual discussion of their theoretical foundations followed by a survey of their published applications to scientific research. Finally, we discuss the limitations of these methods and suggest potential avenues for future work integrating mathematical tools into clinical care and biomedical informatics.
Affiliation(s)
- Yara Skaf
  - University of Florida, Department of Mathematics, Gainesville, FL, USA
  - University of Florida, Department of Medicine, Division of Pulmonary, Critical Care, & Sleep Medicine, Gainesville, FL, USA
- Reinhard Laubenbacher
  - University of Florida, Department of Mathematics, Gainesville, FL, USA
  - University of Florida, Department of Medicine, Division of Pulmonary, Critical Care, & Sleep Medicine, Gainesville, FL, USA
5
Falsetti L, Rucco M, Proietti M, Viticchi G, Zaccone V, Scarponi M, Giovenali L, Moroncini G, Nitti C, Salvi A. Risk prediction of clinical adverse outcomes with machine learning in a cohort of critically ill patients with atrial fibrillation. Sci Rep 2021; 11:18925. [PMID: 34556682 PMCID: PMC8460701 DOI: 10.1038/s41598-021-97218-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/29/2021] [Accepted: 08/23/2021] [Indexed: 11/26/2022] Open
Abstract
Critically ill patients affected by atrial fibrillation are at high risk of adverse events; however, the current risk stratification models for haemorrhagic and thrombotic events are not validated in a critical care setting. With this paper we aimed to identify, using topological data analysis, the risk factors for therapeutic failure (in-hospital death or intensive care unit transfer) and for the in-hospital occurrence of stroke/TIA and major bleeding in a cohort of critically ill patients with pre-existing atrial fibrillation admitted to a stepdown unit, and to engineer new prediction models based on machine learning in the same cohort. We selected all medical patients admitted for critical illness and a history of pre-existing atrial fibrillation in the timeframe 01/01/2002–03/08/2007. All data regarding patients’ medical history, comorbidities, drugs adopted, vital parameters and outcomes (therapeutic failure, stroke/TIA and major bleeding) were acquired from electronic medical records. Risk factors for each outcome were analyzed using topological data analysis. Machine learning was used to generate three different predictive models. We were able to identify specific risk factors and to engineer dedicated clinical prediction models for therapeutic failure (AUC: 0.974, 95%CI: 0.934–0.975), stroke/TIA (AUC: 0.931, 95%CI: 0.896–0.940; Brier score: 0.13) and major bleeding (AUC: 0.930, 95%CI: 0.911–0.939; Brier score: 0.09) in critically ill patients, which were able to predict accurately their respective clinical outcomes. Topological data analysis and machine learning techniques represent a concrete viewpoint for the physician to predict the risk at the patients’ level, aiding the selection of the best therapeutic strategy in critically ill patients affected by pre-existing atrial fibrillation.
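The Brier score reported next to each AUC is the mean squared difference between predicted probabilities and observed binary outcomes, so lower values indicate better-calibrated predictions. A minimal Python sketch of that standard definition follows; the function name `brier_score` and the example probabilities are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predictions for four patients (1 = event occurred).
probs = [0.9, 0.2, 0.7, 0.1]
outcomes = [1, 0, 1, 0]
score = brier_score(probs, outcomes)
print(round(score, 4))
```

For reference, a model that always predicts a probability of 0.5 scores 0.25 regardless of the outcomes, and a perfect model scores 0.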
Affiliation(s)
- Lorenzo Falsetti
  - Internal and Sub-Intensive Medicine Department, A.O.U. "Ospedali Riuniti" di Ancona, Via Conca 10, 60126, Ancona, Italy
- Matteo Rucco
  - Cyber-Physical Department, United Technology Research Center, Trento, Italy
- Marco Proietti
  - Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
  - Geriatric Unit, IRCCS Istituti Clinici Scientifici Maugeri, Milan, Italy
  - Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart and Chest Hospital, Liverpool, UK
- Giovanna Viticchi
  - Neurological Clinic Department, A.O.U. "Ospedali Riuniti", Ancona, Italy
- Vincenzo Zaccone
  - Internal and Sub-Intensive Medicine Department, A.O.U. "Ospedali Riuniti" di Ancona, Via Conca 10, 60126, Ancona, Italy
- Mattia Scarponi
  - Emergency Medicine Residency Program, Marche Polytechnic University, Ancona, Italy
- Laura Giovenali
  - Emergency Medicine Residency Program, Marche Polytechnic University, Ancona, Italy
- Gianluca Moroncini
  - Clinica Medica, Azienda Ospedaliero-Universitaria "Ospedali Riuniti", Ancona, Italy
- Cinzia Nitti
  - Internal and Sub-Intensive Medicine Department, A.O.U. "Ospedali Riuniti" di Ancona, Via Conca 10, 60126, Ancona, Italy
- Aldo Salvi
  - Internal and Sub-Intensive Medicine Department, A.O.U. "Ospedali Riuniti" di Ancona, Via Conca 10, 60126, Ancona, Italy