1
Zhang J, Qin Y, Tian R, Bai X, Liu J. Similarity measure method of near-infrared spectrum combined with multi-attribute information. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2024; 322:124783. [PMID: 38972098 DOI: 10.1016/j.saa.2024.124783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 07/01/2024] [Accepted: 07/03/2024] [Indexed: 07/09/2024]
Abstract
The similarity measure between samples is affected by the high dimensionality, redundancy, and non-linearity of near-infrared (NIR) spectral data, as well as by sample attributes such as producing area and grade. This paper proposed a t-distributed stochastic neighbor embedding algorithm based on the Sinkhorn distance (St-SNE) combined with multi-attribute data information. Firstly, the Sinkhorn distance was introduced, which can solve problems such as KL divergence asymmetry and sparse data distribution in high-dimensional space, thereby constructing probability distributions that make the low-dimensional space similar to the high-dimensional space. In addition, to address the impact of multi-attribute features of samples on the similarity measure, a multi-attribute distance matrix was constructed using information entropy and then combined with the numerical matrix of spectral data to obtain a mixed data matrix. To validate the effectiveness of the St-SNE algorithm, dimensionality reduction projection was performed on NIR spectral data and compared with the PCA, LPP, and t-SNE algorithms. The results demonstrated that the St-SNE algorithm effectively distinguished samples with different attribute information and produced more distinct projection boundaries between sample categories in low-dimensional space. We then tested the classification performance of St-SNE for different attributes using the tobacco and mango datasets and compared it with the LPP, t-SNE, UMAP, and Fisher t-SNE algorithms. The results showed that the St-SNE algorithm had the highest classification accuracy for different attributes. Finally, we compared the results of searching for the sample most similar to the target tobacco for cigarette formulas, and experiments showed that St-SNE had the highest consistency with the experts' recommendations among the compared algorithms. It can provide strong support for the maintenance and design of the product formula.
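The abstract above motivates the Sinkhorn distance partly by the asymmetry of KL divergence. As a rough illustration only, the following pure-Python sketch computes an entropy-regularised transport cost between two discrete histograms with Sinkhorn iterations. The function name `sinkhorn_distance`, the parameters `reg` and `n_iter`, and the toy histograms are assumptions made for this example; the abstract does not describe how St-SNE actually integrates the distance.

```python
import math


def sinkhorn_distance(p, q, cost, reg=0.5, n_iter=500):
    """Entropy-regularised transport cost between histograms p and q.

    p, q  : lists of non-negative weights, each summing to 1
    cost  : cost[i][j] is the ground cost of moving mass from bin i to bin j
    reg   : regularisation strength (illustrative default, not from the paper)
    """
    n, m = len(p), len(q)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan's marginals match p and q
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j; the distance is the total cost <P, C>
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Toy histograms over three bins with a symmetric ground cost |i - j|
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
cost = [[abs(i - j) for j in range(3)] for i in range(3)]

d_pq = sinkhorn_distance(p, q, cost)
d_qp = sinkhorn_distance(q, p, cost)
d_pp = sinkhorn_distance(p, p, cost)
print(d_pq, d_qp, d_pp)
```

With a symmetric ground cost the two directions agree up to numerical tolerance, unlike KL divergence. Note that the regularised cost of a histogram against itself is generally small but not exactly zero.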
Affiliation(s)
- Jinfeng Zhang
  - College of Information Science and Technology, Qingdao University of Science and Technology, China
- Yuhua Qin
  - College of Information Science and Technology, Qingdao University of Science and Technology, China
- Rongkun Tian
  - College of Information Science and Technology, Qingdao University of Science and Technology, China
- Xiaoli Bai
  - R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
- Jing Liu
  - R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
2
Iyer SS, Srivastava A. Membrane lateral organization from potential energy disconnectivity graph. Biophys Chem 2024; 313:107284. [PMID: 39002248 DOI: 10.1016/j.bpc.2024.107284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 07/15/2024]
Abstract
Understanding the thermodynamic and kinetic properties of biomolecules requires elucidation of their complex energy landscape. A disconnectivity graph analysis of the energy landscape provides a framework for mapping the multi-dimensional landscape onto a two-dimensional representation while preserving the key features of the energy landscape. Several studies show that the structure or shape of the disconnectivity graph is directly associated with the function of protein and nucleic acid molecules. In this review, we discuss how disconnectivity analysis of the potential energy surface can be extended to lipid molecules to glean important information about membrane organization. The shape of the disconnectivity graphs can be used to predict the lateral organization of multi-component lipid bilayers. We hope that this review encourages membrane biophysicists to use disconnectivity graphs routinely to predict the lateral organization of lipids.
Affiliation(s)
- Anand Srivastava
  - Molecular Biophysics Unit, Indian Institute of Science Bangalore, C. V. Raman Road, Bangalore, Karnataka 560012, India
3
Bakker MJ, Gaffour A, Juhás M, Zapletal V, Stošek J, Bratholm LA, Pavlíková Přecechtělová J. Streamlining NMR Chemical Shift Predictions for Intrinsically Disordered Proteins: Design of Ensembles with Dimensionality Reduction and Clustering. J Chem Inf Model 2024; 64:6542-6556. [PMID: 39099394 DOI: 10.1021/acs.jcim.4c00809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2024]
Abstract
By merging advanced dimensionality reduction (DR) and clustering algorithm (CA) techniques, our study advances the sampling procedure for predicting NMR chemical shifts (CS) in intrinsically disordered proteins (IDPs), making a significant leap forward in the field of protein analysis/modeling. We enhance NMR CS sampling by generating clustered ensembles that accurately reflect the different properties and phenomena encapsulated by the IDP trajectories. This investigation critically assessed different rapid CS predictors, both neural network (e.g., Sparta+ and ShiftX2) and database-driven (ProCS-15), and highlighted the need for more advanced quantum calculations and the subsequent need for more tractable-sized conformational ensembles. Although neural network CS predictors outperformed ProCS-15 for all atoms, all tools showed poor agreement with HN CSs, and the neural network CS predictors were unable to capture the influence of phosphorylated residues, highly relevant for IDPs. This study also addressed the limitations of using direct clustering with collective variables, such as the widespread implementation of the GROMOS algorithm. Clustered ensembles (CEs) produced by this algorithm showed poor performance with chemical shifts compared to sequential ensembles (SEs) of similar size. Instead, we implement a multiscale DR and CA approach and explore the challenges and limitations of applying these algorithms to obtain more robust and tractable CEs. The novel feature of this investigation is the use of solvent-accessible surface area (SASA) as one of the fingerprints for DR alongside previously investigated α carbon distance/angles or ϕ/ψ dihedral angles. The ensembles produced with SASA tSNE DR produced CEs better aligned with the experimental CS of between 0.17 and 0.36 r2 (0.18-0.26 ppm) depending on the system and replicate. Furthermore, this technique produced CEs with better agreement than traditional SEs in 85.7% of all ensemble sizes. 
This study investigates the quality of ensembles produced based on different input features, comparing latent spaces produced by linear vs nonlinear DR techniques and a novel integrated silhouette score scanning protocol for tSNE DR.
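The abstract mentions a silhouette score scanning protocol without giving its details. As a hedged illustration of the general idea, the sketch below computes the mean silhouette coefficient of a labelled point set in pure Python and scans a few candidate cluster assignments for the best score. The names `silhouette_score`, `candidates`, and the toy one-dimensional embedding are hypothetical and are not taken from the paper.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; points are coordinate tuples, labels are cluster ids."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the members of a cluster, excluding i itself
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        if len(clusters[lab]) == 1:
            continue  # singleton clusters contribute a score of 0 by convention
        a = mean_dist(i, clusters[lab])  # cohesion within the point's own cluster
        b = min(mean_dist(i, members)    # separation from the nearest other cluster
                for other, members in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy low-dimensional embedding and two hypothetical candidate clusterings
embedding = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {
    "compact": [0, 0, 1, 1],
    "interleaved": [0, 1, 0, 1],
}

scores = {name: silhouette_score(embedding, labs) for name, labs in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In a scan of this kind, each candidate would typically come from a different embedding or clustering setting, and the setting with the highest score is kept.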
Affiliation(s)
- Michael J Bakker
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Amina Gaffour
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Martin Juhás
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
  - Department of Chemistry, Faculty of Science, University of Hradec Králové, Rokitanského 62, 500 03 Hradec Králové, Czech Republic
- Vojtěch Zapletal
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Jakub Stošek
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
  - Department of Chemistry, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Lars A Bratholm
  - School of Chemistry, University of Bristol, Cantock's Close, BS8 1TS Bristol, U.K.
- Jana Pavlíková Přecechtělová
  - Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
4
Gagliani F, Di Giulio T, Asif MI, Malitesta C, Mazzotta E. Boosting Electrochemical Sensing Performances Using Molecularly Imprinted Nanoparticles. Biosensors 2024; 14:358. [PMID: 39056634 PMCID: PMC11274585 DOI: 10.3390/bios14070358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 07/18/2024] [Accepted: 07/20/2024] [Indexed: 07/28/2024]
Abstract
Nanoparticles of molecularly imprinted polymers (nanoMIPs) combine the excellent recognition ability of imprinted polymers with specific properties related to the nanosize, such as a high surface-to-volume ratio, resulting in highly performing recognition elements with surface-exposed binding sites that promote the interaction with the target and, in turn, binding kinetics. Different synthetic strategies are currently available to produce nanoMIPs, with the possibility to select specific conditions in relation to the nature of monomers/templates and, importantly, to tune the nanoparticle size. The excellent sensing properties, combined with the size, tunability, and flexibility of synthetic protocols applicable to different targets, have enabled the widespread use of nanoMIPs in several applications, including sensors, imaging, and drug delivery. The present review summarizes nanoMIPs applications in sensors, specifically focusing on electrochemical detection, for which nanoMIPs have been mostly applied. After a general survey of the most widely adopted nanoMIP synthetic approaches, the integration of imprinted nanoparticles with electrochemical transducers will be discussed, representing a key step for enabling a reliable and stable sensor response. The mechanisms for electrochemical signal generation will also be compared, followed by an illustration of nanoMIP-based electrochemical sensor employment in several application fields. The high potentialities of nanoMIP-based electrochemical sensors are presented, and possible reasons that still limit their commercialization and issues to be resolved for coupling electrochemical sensing and nanoMIPs in an increasingly widespread daily-use technology are discussed.
Affiliation(s)
- Elisabetta Mazzotta
  - Laboratorio di Chimica Analitica, Dipartimento di Scienze e Tecnologie Biologiche e Ambientali (Di.S.Te.B.A.), Università del Salento, Via Monteroni, 73100 Lecce, Italy; (F.G.); (T.D.G.); (M.I.A.); (C.M.)
5
Faran M, Ray D, Nag S, Raucci U, Parrinello M, Bisker G. A Stochastic Landscape Approach for Protein Folding State Classification. J Chem Theory Comput 2024; 20:5428-5438. [PMID: 38924770 PMCID: PMC11238538 DOI: 10.1021/acs.jctc.4c00464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 06/12/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024]
Abstract
Protein folding is a critical process that determines the functional state of proteins. Proper folding is essential for proteins to acquire their functional three-dimensional structures and execute their biological role, whereas misfolded proteins can lead to various diseases, including neurodegenerative disorders like Alzheimer's and Parkinson's. Therefore, a deeper understanding of protein folding is vital for understanding disease mechanisms and developing therapeutic strategies. This study introduces the Stochastic Landscape Classification (SLC), an innovative, automated, nonlearning algorithm that quantitatively analyzes protein folding dynamics. Focusing on collective variables (CVs) - low-dimensional representations of complex dynamical systems like molecular dynamics (MD) of macromolecules - the SLC approach segments the CVs into distinct macrostates, revealing the protein folding pathway explored by MD simulations. The segmentation is achieved by analyzing changes in CV trends and clustering these segments using a standard density-based spatial clustering of applications with noise (DBSCAN) scheme. Applied to the MD-based CV trajectories of Chignolin and Trp-Cage proteins, the SLC demonstrates apposite accuracy, validated by comparing standard classification metrics against ground-truth data. These metrics affirm the efficacy of the SLC in capturing intricate protein dynamics and offer a method to evaluate and select the most informative CVs. The practical application of this technique lies in its ability to provide a detailed, quantitative description of protein folding processes, with significant implications for understanding and manipulating protein behavior in industrial and pharmaceutical contexts.
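The abstract states that trajectory segments are clustered with a standard DBSCAN scheme. The following is a minimal pure-Python sketch of DBSCAN only, not of the SLC segmentation step, which is not detailed here. The two-feature description of each segment (mean CV value, slope) and the values of `eps` and `min_pts` are assumptions chosen purely for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, so keep expanding from it
    return labels


# Hypothetical segment descriptors: (mean CV value, slope of the CV within the segment)
segments = [
    (0.10, 0.00), (0.12, 0.01), (0.11, -0.01),  # e.g. one plateau-like macrostate
    (0.90, 0.00), (0.92, 0.02), (0.88, -0.01),  # e.g. a second plateau-like macrostate
    (0.50, 0.30),                               # an isolated transition-like segment
]

labels = dbscan(segments, eps=0.1, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Density-based clustering does not need the number of macrostates in advance, and sparse segments are left unassigned as noise rather than forced into a cluster.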
Affiliation(s)
- Michael Faran
  - Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv 69978, Israel
- Dhiman Ray
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16152 Genova, Italy
- Shubhadeep Nag
  - Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv 69978, Israel
- Umberto Raucci
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16152 Genova, Italy
- Michele Parrinello
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16152 Genova, Italy
- Gili Bisker
  - Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv 69978, Israel
  - The Center for Physics and Chemistry of Living Systems, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Center for Nanoscience and Nanotechnology, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Center for Light-Matter Interaction, Tel Aviv University, Tel Aviv 6997801, Israel
6
Bosio S, Bernetti M, Rocchia W, Masetti M. Similarities and Differences in Ligand Binding to Protein and RNA Targets: The Case of Riboflavin. J Chem Inf Model 2024; 64:4570-4586. [PMID: 38800845 DOI: 10.1021/acs.jcim.4c00420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
It is nowadays clear that RNA molecules can play active roles in several biological processes. As a result, an increasing number of RNAs are gradually being identified as potentially druggable targets. In particular, noncoding RNAs can adopt highly organized conformations that are suitable for drug binding. However, RNAs are still considered challenging targets due to their complex structural dynamics and high charge density. Thus, elucidating relevant features of drug-RNA binding is fundamental for advancing drug discovery. Here, by using Molecular Dynamics simulations, we compare key features of ligand binding to proteins with those observed in RNA. Specifically, we explore similarities and differences in terms of (i) conformational flexibility of the target, (ii) electrostatic contribution to binding free energy, and (iii) water and ligand dynamics. As a test case, we examine binding of the same ligand, namely riboflavin, to protein and RNA targets, specifically the riboflavin (RF) kinase and flavin mononucleotide (FMN) riboswitch. The FMN riboswitch exhibited enhanced fluctuations and explored a wider conformational space, compared to the protein target, underscoring the importance of RNA flexibility in ligand binding. Conversely, a similar electrostatic contribution to the binding free energy of riboflavin was found. Finally, greater stability of water molecules was observed in the FMN riboswitch compared to the RF kinase, possibly due to the different shape and polarity of the pockets.
Affiliation(s)
- Stefano Bosio
  - Department of Pharmacy and Biotechnology, Alma Mater Studiorum - University of Bologna, Via Belmeloro 6, 40126 Bologna, Italy
  - Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, I-16163 Genova, Italy
- Mattia Bernetti
  - Department of Pharmacy and Biotechnology, Alma Mater Studiorum - University of Bologna, Via Belmeloro 6, 40126 Bologna, Italy
  - Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, I-16163 Genova, Italy
- Walter Rocchia
  - Computational mOdelling of NanosCalE and bioPhysical sysTems (CONCEPT) Lab, Istituto Italiano di Tecnologia, Via Melen - 83, B Block, 16152 Genova, Italy
- Matteo Masetti
  - Department of Pharmacy and Biotechnology, Alma Mater Studiorum - University of Bologna, Via Belmeloro 6, 40126 Bologna, Italy
7
Taneja I, Lasker K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys J 2024; 123:101-113. [PMID: 38053335 PMCID: PMC10808026 DOI: 10.1016/j.bpj.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/24/2023] [Accepted: 12/01/2023] [Indexed: 12/07/2023] Open
Abstract
Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine-learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine-learning models to predict ensemble-derived two-dimensional (2D) properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional (3D) coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a data set of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine-learning techniques to predicting higher-dimensional properties of disordered proteins.
Affiliation(s)
- Ishan Taneja
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
- Keren Lasker
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
8
Balasubramanian S, Maharana S, Srivastava A. "Boundary residues" between the folded RNA recognition motif and disordered RGG domains are critical for FUS-RNA binding. J Biol Chem 2023; 299:105392. [PMID: 37890778 PMCID: PMC10687056 DOI: 10.1016/j.jbc.2023.105392] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/21/2023] [Revised: 09/19/2023] [Accepted: 10/19/2023] [Indexed: 10/29/2023] Open
Abstract
Fused in sarcoma (FUS) is an abundant RNA-binding protein, which drives phase separation of cellular condensates and plays multiple roles in RNA regulation. The RNA-binding ability of FUS protein is crucial to its cellular function. Here, our molecular simulation study on the FUS-RNA complex provides atomic resolution insights into the observations from biochemical studies and also illuminates our understanding of molecular driving forces that mediate the structure, stability, and interaction of the RNA recognition motif (RRM) and RGG domains of FUS with a stem-loop junction RNA. We observe clear cooperativity and division of labor among the ordered (RRM) and disordered domains (RGG1 and RGG2) of FUS that lead to an organized and tighter RNA binding. Irrespective of the length of RGG2, the RGG2-RNA interaction is confined to the stem-loop junction and the proximal stem regions. On the other hand, the RGG1 interactions are primarily with the longer RNA stem. We find that the C terminus of RRM, which makes up the "boundary residues" that connect the folded RRM with the long disordered RGG2 stretch of the protein, plays a critical role in FUS-RNA binding. Our study provides high-resolution molecular insights into the FUS-RNA interactions and forms the basis for understanding the molecular origins of full-length FUS interaction with RNA.
Affiliation(s)
- Shovamayee Maharana
  - Department of Molecular and Cell Biology, Indian Institute of Science Bangalore, Bangalore, Karnataka, India
- Anand Srivastava
  - Molecular Biophysics Unit, Indian Institute of Science Bangalore, Bangalore, Karnataka, India