1
Planas-Iglesias J, Borko S, Swiatkowski J, Elias M, Havlasek M, Salamon O, Grakova E, Kunka A, Martinovic T, Damborsky J, Martinovic J, Bednar D. AggreProt: a web server for predicting and engineering aggregation prone regions in proteins. Nucleic Acids Res 2024; 52:W159-W169. [PMID: 38801076 PMCID: PMC11223854 DOI: 10.1093/nar/gkae420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 04/23/2024] [Accepted: 05/13/2024] [Indexed: 05/29/2024]
Abstract
Recombinant proteins play pivotal roles in numerous applications including industrial biocatalysts or therapeutics. Despite the recent progress in computational protein structure prediction, protein solubility and reduced aggregation propensity remain challenging attributes to design. Identification of aggregation-prone regions is essential for understanding misfolding diseases or designing efficient protein-based technologies, and as such has a great socio-economic impact. Here, we introduce AggreProt, a user-friendly webserver that automatically exploits an ensemble of deep neural networks to predict aggregation-prone regions (APRs) in protein sequences. Trained on experimentally evaluated hexapeptides, AggreProt compares to or outperforms state-of-the-art algorithms on two independent benchmark datasets. The server provides per-residue aggregation profiles along with information on solvent accessibility and transmembrane propensity within an intuitive interface with interactive sequence and structure viewers for comprehensive analysis. We demonstrate AggreProt efficacy in predicting differential aggregation behaviours in proteins on several use cases, which emphasize its potential for guiding protein engineering strategies towards decreased aggregation propensity and improved solubility. The webserver is freely available and accessible at https://loschmidt.chemi.muni.cz/aggreprot/.
Affiliation(s)
- Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Simeon Borko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jan Swiatkowski
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
- Matej Elias
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
- Martin Havlasek
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Ondrej Salamon
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
- Ekaterina Grakova
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
- Antonín Kunka
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Tomas Martinovic
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jan Martinovic
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
2
Ghosh D, Biswas A, Radhakrishna M. Advanced computational approaches to understand protein aggregation. Biophysics Reviews 2024; 5:021302. [PMID: 38681860 PMCID: PMC11045254 DOI: 10.1063/5.0180691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
Protein aggregation is a widespread phenomenon implicated in debilitating diseases like Alzheimer's, Parkinson's, and cataracts, presenting complex hurdles for the field of molecular biology. In this review, we explore the evolving realm of computational methods and bioinformatics tools that have revolutionized our comprehension of protein aggregation. Beginning with a discussion of the multifaceted challenges associated with understanding this process and emphasizing the critical need for precise predictive tools, we highlight how computational techniques have become indispensable for understanding protein aggregation. We focus on molecular simulations, notably molecular dynamics (MD) simulations, spanning from atomistic to coarse-grained levels, which have emerged as pivotal tools in unraveling the complex dynamics governing protein aggregation in diseases such as cataracts, Alzheimer's, and Parkinson's. MD simulations provide microscopic insights into protein interactions and the subtleties of aggregation pathways, with advanced techniques like replica exchange molecular dynamics, Metadynamics (MetaD), and umbrella sampling enhancing our understanding by probing intricate energy landscapes and transition states. We delve into specific applications of MD simulations, elucidating the chaperone mechanism underlying cataract formation using Markov state modeling and the intricate pathways and interactions driving the toxic aggregate formation in Alzheimer's and Parkinson's disease. Transitioning we highlight how computational techniques, including bioinformatics, sequence analysis, structural data, machine learning algorithms, and artificial intelligence have become indispensable for predicting protein aggregation propensity and locating aggregation-prone regions within protein sequences. 
Throughout our exploration, we underscore the symbiotic relationship between computational approaches and empirical data, which has paved the way for potential therapeutic strategies against protein aggregation-related diseases. In conclusion, this review offers a comprehensive overview of advanced computational methodologies and bioinformatics tools that have catalyzed breakthroughs in unraveling the molecular basis of protein aggregation, with significant implications for clinical interventions, standing at the intersection of computational biology and experimental research.
Affiliation(s)
- Deepshikha Ghosh
- Department of Biological Sciences and Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Anushka Biswas
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
3
Louros N, Rousseau F, Schymkowitz J. CORDAX web server: an online platform for the prediction and 3D visualization of aggregation motifs in protein sequences. Bioinformatics 2024; 40:btae279. [PMID: 38662570 PMCID: PMC11078773 DOI: 10.1093/bioinformatics/btae279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/09/2024] [Accepted: 04/24/2024] [Indexed: 05/12/2024]
Abstract
MOTIVATION Proteins, the molecular workhorses of biological systems, execute a multitude of critical functions dictated by their precise three-dimensional structures. In a complex and dynamic cellular environment, proteins can undergo misfolding, leading to the formation of aggregates that take up various forms, including amorphous and ordered aggregation in the shape of amyloid fibrils. This phenomenon is closely linked to a spectrum of widespread debilitating pathologies, such as Alzheimer's disease, Parkinson's disease, type-II diabetes, and several other proteinopathies, but also hampers the engineering of soluble agents, as in the case of antibody development. As such, the accurate prediction of aggregation propensity within protein sequences has become pivotal due to profound implications in understanding disease mechanisms, as well as in improving biotechnological and therapeutic applications. RESULTS We previously developed Cordax, a structure-based predictor that utilizes logistic regression to detect aggregation motifs in protein sequences based on their structural complementarity to the amyloid cross-beta architecture. Here, we present a dedicated web server interface for Cordax. This online platform combines several features including detailed scoring of sequence aggregation propensity, as well as 3D visualization with several customization options for topology models of the structural cores formed by predicted aggregation motifs. In addition, information is provided on experimentally determined aggregation-prone regions that exhibit sequence similarity to predicted motifs, scores, and links to other predictor outputs, as well as simultaneous predictions of relevant sequence propensities, such as solubility, hydrophobicity, and secondary structure propensity. AVAILABILITY AND IMPLEMENTATION The Cordax webserver is freely accessible at https://cordax.switchlab.org/.
Affiliation(s)
- Nikolaos Louros
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Department of Cellular and Molecular Medicine, Switch Laboratory, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
- Frederic Rousseau
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Department of Cellular and Molecular Medicine, Switch Laboratory, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
- Joost Schymkowitz
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Department of Cellular and Molecular Medicine, Switch Laboratory, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
4
Khalili K, Farzam F, Dabirmanesh B, Khajeh K. Prediction of protein aggregation. Progress in Molecular Biology and Translational Science 2024; 206:229-263. [PMID: 38811082 DOI: 10.1016/bs.pmbts.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
The scientific community is very interested in protein aggregation because of its involvement in several neurodegenerative diseases and its significance in industry. Remarkably, fibrillar aggregates are utilized naturally for constructing structural scaffolds or creating biological switches and may be intentionally designed to construct versatile nanomaterials. Consequently, there is a significant need to rationalize and predict protein aggregation. Researchers have developed various computational methodologies and algorithms to predict protein aggregation and understand its underlying mechanics. This chapter aims to summarize the significant advancements in computational methods, accessible resources, and prospective developments in the field of in silico research. We assess the existing computational tools for predicting protein aggregation propensities, detecting areas that are prone to sequential and structural aggregation, analyzing the effects of mutations on protein aggregation, or identifying prion-like domains.
Affiliation(s)
- Kavyan Khalili
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Farnoosh Farzam
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Bahareh Dabirmanesh
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Khosro Khajeh
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran.
5
Yang Z, Wu Y, Liu H, He L, Deng X. AMYGNN: A Graph Convolutional Neural Network-Based Approach for Predicting Amyloid Formation from Polypeptides. J Chem Inf Model 2024; 64:1751-1762. [PMID: 38408296 DOI: 10.1021/acs.jcim.3c02035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
There has been an increasing interest in the use of amyloids for constructing various functional materials. The design of amyloid-associated functional materials requires the identification of the core peptide sequences as the fundamental building block. The existing computational methods are limited in terms of delineating polypeptides, the typical non-Euclidean structural data, and they fail to capture the dynamic interactions between amino acids due to ignoring the contextual information from surrounding amino acids. Here, we first propose the use of a state-of-the-art graph convolutional neural network for predicting the trends of amyloid formation from specific peptide sequences (AMYGNN) by abstracting each polypeptide as a graph, in which the constituting amino acids are viewed as nodes and edges characterizing the connections between pairs of amino acids are established when they meet a given distance threshold (Cα-Cα ≤ 5 Å). Our model achieves high performance with accuracy (0.9208), G-mean (0.9203), MCC (0.8417), and F1 (0.9235) in determining the characteristic peptide sequences to form amyloid. 32 of 534 crucial amino acid properties that greatly contribute to the formation of amyloids are ascertained, and the β-folding-like graph structure of a polypeptide is believed to be essential for the formation of amyloid. Our model enables the mapping of polypeptides with underlying interactions between amino acids and provides a quick and precise predictive framework for directing the construction of amyloid-associated functional materials.
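The graph abstraction described in this abstract (amino acids as nodes, an edge wherever two Cα atoms lie within 5 Å) can be illustrated with a minimal sketch. This is not the AMYGNN implementation: the function name `contact_graph`, the toy sequence, and the toy coordinates are assumptions made here for illustration, and no node features or neural network layers are included.

```python
import math
from itertools import combinations


def contact_graph(residues, ca_coords, cutoff=5.0):
    """Build a residue graph: one node per residue, one edge per
    residue pair whose C-alpha atoms are at most `cutoff` angstroms apart.

    `contact_graph` is a hypothetical helper, not part of AMYGNN.
    """
    if len(residues) != len(ca_coords):
        raise ValueError("need exactly one C-alpha coordinate per residue")
    nodes = list(residues)
    edges = [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    ]
    return nodes, edges


# Toy input (assumed, not from the paper): six residues placed on a
# straight line, 3.8 angstroms apart, so only sequence neighbours are
# within the 5 angstrom cutoff.
toy_sequence = "AVLIGS"
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(len(toy_sequence))]
nodes, edges = contact_graph(toy_sequence, toy_coords)
print(edges)
```

In a real structure, residues that are distant in sequence but close in space (for example across strands of a β-sheet) would also be connected, which is the contextual information the abstract refers to.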
Affiliation(s)
- Zuojun Yang
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Yuhan Wu
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Hao Liu
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Li He
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Xiaoyuan Deng
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
6
Liao S, Zhang Y, Han X, Wang T, Wang X, Yan Q, Li Q, Qi Y, Zhang Z. A sequence-based model for identifying proteins undergoing liquid-liquid phase separation/forming fibril aggregates via machine learning. Protein Sci 2024; 33:e4927. [PMID: 38380794 PMCID: PMC10880426 DOI: 10.1002/pro.4927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 01/27/2024] [Accepted: 01/30/2024] [Indexed: 02/22/2024]
Abstract
Liquid-liquid phase separation (LLPS) and the solid aggregate (also referred to as amyloid aggregates) formation of proteins, have gained significant attention in recent years due to their associations with various physiological and pathological processes in living organisms. The systematic investigation of the differences and connections between proteins undergoing LLPS and those forming amyloid fibrils at the sequence level has not yet been explored. In this research, we aim to address this gap by comparing the two types of proteins across 36 features using collected data available currently. The statistical comparison results indicate that, 24 of the selected 36 features exhibit significant difference between the two protein groups. A LLPS-Fibrils binary classification model built on these 24 features using random forest reveals that the fraction of intrinsically disordered residues (FIDR ) is identified as the most crucial feature. While, in the further three-class LLPS-Fibrils-Background classification model built on the same screened features, the composition of cysteine and that of leucine show more significant contributions than others. Through feature ablation analysis, we finally constructed a model FLFB (Feature-based LLPS-Fibrils-Background protein predictor) using six refined features, with an average area under the receiver operating characteristics of 0.83. This work indicates using sequence features and a machine learning model, proteins undergoing LLPS or forming amyloid fibrils can be identified.
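Two of the sequence-level features named in this abstract, the composition of cysteine and of leucine, are simple residue fractions. The sketch below shows how such a feature could be computed; the function name `composition` and the example sequence are assumptions made here, and the abstract does not list the six refined features used in FLFB or how they were encoded.

```python
from collections import Counter


def composition(sequence, residues="CL"):
    """Return the fraction of each requested residue type in a sequence.

    `composition` is a hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in residues}


# Made-up 10-residue sequence containing 2 cysteines and 3 leucines.
features = composition("MCLLKACDEL")
print(features)  # {'C': 0.2, 'L': 0.3}
```

Fractions like these would form part of a feature vector passed to a classifier such as the random forest mentioned in the abstract.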
Affiliation(s)
- Shaofeng Liao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yujun Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xinchen Han
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Tinglan Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xi Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qinglin Yan
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qian Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yifei Qi
- School of Pharmacy, Fudan University, Shanghai, China
- Zhuqing Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
7
Kang S, Kim M, Sun J, Lee M, Min K. Prediction of Protein Aggregation Propensity via Data-Driven Approaches. ACS Biomater Sci Eng 2023; 9:6451-6463. [PMID: 37844262 DOI: 10.1021/acsbiomaterials.3c01001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2023]
Abstract
Protein aggregation occurs when misfolded or unfolded proteins physically bind together and can promote the development of various amyloid diseases. This study aimed to construct surrogate models for predicting protein aggregation via data-driven methods using two types of databases. First, an aggregation propensity score database was constructed by calculating the scores for protein structures in the Protein Data Bank using Aggrescan3D 2.0. Moreover, feature- and graph-based models for predicting protein aggregation have been developed by using this database. The graph-based model outperformed the feature-based model, resulting in an R2 of 0.95, although it intrinsically required protein structures. Second, for the experimental data, a feature-based model was built using the Curated Protein Aggregation Database 2.0 to predict the aggregated intensity curves. In summary, this study suggests approaches that are more effective in predicting protein aggregation, depending on the type of descriptor and the database.
Affiliation(s)
- Seungpyo Kang
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
- Minseon Kim
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
- Jiwon Sun
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
- Myeonghun Lee
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
- Kyoungmin Min
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
8
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
9
Yu Z, Yin Z, Zou H. iAMY-RECMFF: Identifying amyloidgenic peptides by using residue pairwise energy content matrix and features fusion algorithm. J Bioinform Comput Biol 2023; 21:2350023. [PMID: 37899353 DOI: 10.1142/s0219720023500233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
Various diseases, including Huntington's disease, Alzheimer's disease, and Parkinson's disease, have been reported to be linked to amyloid. Therefore, it is crucial to distinguish amyloid from non-amyloid proteins or peptides. While experimental approaches are typically preferred, they are costly and time-consuming. In this study, we have developed a machine learning framework called iAMY-RECMFF to discriminate amyloidgenic from non-amyloidgenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then utilized Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were inputted into a support vector machine for identifying amyloidgenic peptides. Experimental results demonstrate that our proposed method significantly improves the identification of amyloidgenic peptides compared to existing predictors. This suggests that our method may serve as a powerful tool in identifying amyloidgenic peptides. To facilitate academic use, the dataset and codes used in the current study are accessible at https://figshare.com/articles/online_resource/iAMY-RECMFF/22816916.
Affiliation(s)
- Zizheng Yu
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China
10
Manyilov VD, Ilyinsky NS, Nesterov SV, Saqr BMGA, Dayhoff GW, Zinovev EV, Matrenok SS, Fonin AV, Kuznetsova IM, Turoverov KK, Ivanovich V, Uversky VN. Chaotic aging: intrinsically disordered proteins in aging-related processes. Cell Mol Life Sci 2023; 80:269. [PMID: 37634152 PMCID: PMC11073068 DOI: 10.1007/s00018-023-04897-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/16/2023] [Revised: 07/03/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023]
Abstract
The development of aging is associated with the disruption of key cellular processes manifested as well-established hallmarks of aging. Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) have no stable tertiary structure that provide them a power to be configurable hubs in signaling cascades and regulate many processes, potentially including those related to aging. There is a need to clarify the roles of IDPs/IDRs in aging. The dataset of 1702 aging-related proteins was collected from established aging databases and experimental studies. There is a noticeable presence of IDPs/IDRs, accounting for about 36% of the aging-related dataset, which is however less than the disorder content of the whole human proteome (about 40%). A Gene Ontology analysis of the used here aging proteome reveals an abundance of IDPs/IDRs in one-third of aging-associated processes, especially in genome regulation. Signaling pathways associated with aging also contain IDPs/IDRs on different hierarchical levels, revealing the importance of "structure-function continuum" in aging. Protein-protein interaction network analysis showed that IDPs present in different clusters associated with different aging hallmarks. Protein cluster with IDPs enrichment has simultaneously high liquid-liquid phase separation (LLPS) probability, "nuclear" localization and DNA-associated functions, related to aging hallmarks: genomic instability, telomere attrition, epigenetic alterations, and stem cells exhaustion. Intrinsic disorder, LLPS, and aggregation propensity should be considered as features that could be markers of pathogenic proteins. Overall, our analyses indicate that IDPs/IDRs play significant roles in aging-associated processes, particularly in the regulation of DNA functioning. IDP aggregation, which can lead to loss of function and toxicity, could be critically harmful to the cell. 
A structure-based analysis of aging and the identification of proteins that are particularly susceptible to disturbances can enhance our understanding of the molecular mechanisms of aging and open up new avenues for slowing it down.
Affiliation(s)
- Vladimir D Manyilov
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Nikolay S Ilyinsky
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Semen V Nesterov
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
  - Institute of Cytology, Russian Academy of Sciences, Saint Petersburg, 194064, Russia
- Baraa M G A Saqr
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Guy W Dayhoff
  - Department of Chemistry, University of South Florida, Tampa, FL, USA
- Egor V Zinovev
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Simon S Matrenok
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Alexander V Fonin
  - Institute of Cytology, Russian Academy of Sciences, Saint Petersburg, 194064, Russia
- Irina M Kuznetsova
  - Institute of Cytology, Russian Academy of Sciences, Saint Petersburg, 194064, Russia
- Valentin Ivanovich
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Vladimir N Uversky
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL, 33612, USA

11
Kell DB, Pretorius E. Are fibrinaloid microclots a cause of autoimmunity in Long Covid and other post-infection diseases? Biochem J 2023; 480:1217-1240. [PMID: 37584410 DOI: 10.1042/bcj20230241] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/08/2023] [Revised: 08/03/2023] [Accepted: 08/07/2023] [Indexed: 08/17/2023]
Abstract
It is now well established that the blood-clotting protein fibrinogen can polymerise into an anomalous form of fibrin that is amyloid in character; the resultant clots and microclots entrap many other molecules, stain with fluorogenic amyloid stains, are rather resistant to fibrinolysis, can block up microcapillaries, are implicated in a variety of diseases including Long COVID, and have been referred to as fibrinaloids. A necessary corollary of this anomalous polymerisation is the generation of novel epitopes in proteins that would normally be seen as 'self', and otherwise immunologically silent. The precise conformation of the resulting fibrinaloid clots (that, as with prions and classical amyloid proteins, can adopt multiple, stable conformations) must depend on the existing small molecules and metal ions that the fibrinogen may (and in some cases is known to) have bound before polymerisation. Any such novel epitopes, however, are likely to lead to the generation of autoantibodies. A convergent phenomenology, including distinct conformations and seeding of the anomalous form for initiation and propagation, is emerging to link knowledge in prions, prionoids, amyloids and now fibrinaloids. We here summarise the evidence for the above reasoning, which has substantial implications for our understanding of the genesis of autoimmunity (and the possible prevention thereof) based on the primary process of fibrinaloid formation.
Affiliation(s)
- Douglas B Kell
  - Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.
  - The Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Kemitorvet 200, 2800 Kgs Lyngby, Denmark
  - Department of Physiological Sciences, Faculty of Science, Stellenbosch University, Private Bag X1 Matieland, Stellenbosch 7602, South Africa
- Etheresia Pretorius
  - Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.
  - Department of Physiological Sciences, Faculty of Science, Stellenbosch University, Private Bag X1 Matieland, Stellenbosch 7602, South Africa

12
Machine Learning Approaches in Diagnosis, Prognosis and Treatment Selection of Cardiac Amyloidosis. Int J Mol Sci 2023; 24:ijms24065680. [PMID: 36982754 PMCID: PMC10051237 DOI: 10.3390/ijms24065680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/12/2023] [Accepted: 03/14/2023] [Indexed: 03/18/2023] Open
Abstract
Cardiac amyloidosis is an uncommon restrictive cardiomyopathy featuring an unregulated amyloid protein deposition that impairs organ function. Early diagnosis of cardiac amyloidosis is generally delayed because its clinical findings are indistinguishable from those of more frequent hypertrophic diseases. Furthermore, amyloidosis is divided into various groups, according to a generally accepted taxonomy, based on the proteins that make up the amyloid deposits; a careful differentiation between the various forms of amyloidosis is necessary to undertake an adequate therapeutic treatment. Thus, cardiac amyloidosis is thought to be underdiagnosed, which delays necessary therapeutic procedures, diminishing quality of life and impairing clinical prognosis. The diagnostic work-up for cardiac amyloidosis begins with the identification of clinical features and of electrocardiographic and imaging findings suggestive of or compatible with cardiac amyloidosis, and often requires the histological demonstration of amyloid deposition. One approach to overcome the difficulty of an early diagnosis is the use of automated diagnostic algorithms. Machine learning enables the automatic extraction of salient information from “raw data” without the need for pre-processing methods based on the a priori knowledge of the human operator. This review attempts to assess the various diagnostic approaches and artificial intelligence computational techniques in the detection of cardiac amyloidosis.

13
Burdukiewicz M, Rafacz D, Barbach A, Hubicka K, Bąkała L, Lassota A, Stecko J, Szymańska N, Wojciechowski J, Kozakiewicz D, Szulc N, Chilimoniuk J, Jęśkowiak I, Gąsior-Głogowska M, Kotulska M. AmyloGraph: a comprehensive database of amyloid-amyloid interactions. Nucleic Acids Res 2022; 51:D352-D357. [PMID: 36243982 PMCID: PMC9825533 DOI: 10.1093/nar/gkac882] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/14/2022] [Revised: 09/22/2022] [Accepted: 09/30/2022] [Indexed: 01/29/2023] Open
Abstract
Information about the impact of interactions between amyloid proteins on their fibrillization propensity is scattered among many experimental articles and presented in unstructured form. We manually curated information located in almost 200 publications (selected out of 562 initially considered), obtaining details of 883 experimentally studied interactions between 46 amyloid proteins or peptides. We also proposed a novel standardized terminology for the description of amyloid-amyloid interactions, which is included in our database, covering all currently known types of such a cross-talk, including inhibition of fibrillization, cross-seeding and other phenomena. The new approach allows for more specific studies on amyloids and their interactions, by providing very well-defined data. AmyloGraph, an online database presenting information on amyloid-amyloid interactions, is available at (http://AmyloGraph.com/). Its functionalities are also accessible as the R package (https://github.com/KotulskaLab/AmyloGraph). AmyloGraph is the only publicly available repository for experimentally determined amyloid-amyloid interactions.
Affiliation(s)
- Dominik Rafacz
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
- Agnieszka Barbach
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
- Katarzyna Hubicka
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
- Laura Bąkała
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
- Anna Lassota
  - School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Jakub Stecko
  - Faculty of Medicine, Wrocław Medical University, Ludwika Pasteura 1, 50-367 Wrocław, Poland
- Natalia Szymańska
  - Faculty of Medicine, Wrocław Medical University, Ludwika Pasteura 1, 50-367 Wrocław, Poland
- Jakub W Wojciechowski
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
- Dominika Kozakiewicz
  - Laboratory of Microbiome Immunobiology, Hirszfeld Institute of Immunology and Experimental Therapy, Polish Academy of Sciences, Weigla 12, 53-114 Wrocław, Poland
- Natalia Szulc
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
- Jarosław Chilimoniuk
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, Fryderyka Joliot-Curie 14a, 50-383 Wrocław, Poland
- Izabela Jęśkowiak
  - Department of Pharmacology, Wroclaw Medical University, Mikulicza-Radeckiego 2, 50-345 Wrocław, Poland
- Marlena Gąsior-Głogowska
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
- Małgorzata Kotulska
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

14
Charoenkwan P, Ahmed S, Nantasenamat C, Quinn JMW, Moni MA, Lio' P, Shoombuatong W. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep 2022; 12:7697. [PMID: 35546347 PMCID: PMC9095707 DOI: 10.1038/s41598-022-11897-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/11/2021] [Accepted: 05/03/2022] [Indexed: 12/13/2022] Open
Abstract
Amyloid proteins have the ability to form insoluble fibril aggregates that have important pathogenic effects in many tissues. Such amyloidoses are prominently associated with common diseases such as type 2 diabetes, Alzheimer's disease, and Parkinson's disease. There are many types of amyloid proteins, and some proteins form amyloid aggregates when in a misfolded state. It is difficult to identify such amyloid proteins and their pathogenic properties, but developing effective bioinformatics tools offers a new approach. While several machine learning (ML)-based models for in silico identification of amyloid proteins have been proposed, their predictive performance is limited. In this study, we present AMYPred-FRL, a novel meta-predictor that uses a feature representation learning approach to achieve more accurate amyloid protein identification. AMYPred-FRL combined six well-known ML algorithms (extremely randomized tree, extreme gradient boosting, k-nearest neighbor, logistic regression, random forest, and support vector machine) with ten different sequence-based feature descriptors to generate 60 probabilistic features (PFs), as opposed to state-of-the-art methods developed with a single feature-based approach. A logistic regression recursive feature elimination (LR-RFE) method was used to find the optimal number m of the 60 PFs in order to improve the predictive performance. Finally, using the meta-predictor approach, the 20 selected PFs were fed into a logistic regression method to create the final hybrid model (AMYPred-FRL). Both cross-validation and independent tests showed that AMYPred-FRL achieved better predictive performance than its constituent baseline models. In an extensive independent test, AMYPred-FRL achieved an accuracy of 0.873 and an MCC of 0.710, outperforming the existing methods by 5.5% and 16.1%, respectively. To expedite high-throughput prediction, a user-friendly web server of AMYPred-FRL is freely available at http://pmlabstack.pythonanywhere.com/AMYPred-FRL. It is anticipated that AMYPred-FRL will be a useful tool in helping researchers to identify new amyloid proteins.
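For readers less familiar with the two metrics reported in this abstract, the following is a minimal Python sketch of how accuracy and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function names and the toy labels are illustrative assumptions; they are not taken from AMYPred-FRL or its benchmark data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels (1 = amyloid, 0 = non-amyloid)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined here as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance; that is one common reason for reporting both.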
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Saeed Ahmed
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Julian M W Quinn
  - Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
- Mohammad Ali Moni
  - Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Pietro Lio'
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand

15
Louros N, van der Kant R, Schymkowitz J, Rousseau F. StAmP-DB: A platform for structures of polymorphic amyloid fibril cores. Bioinformatics 2022; 38:2636-2638. [PMID: 35199146 DOI: 10.1093/bioinformatics/btac126] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/29/2021] [Revised: 01/20/2022] [Accepted: 02/22/2022] [Indexed: 11/12/2022] Open
Abstract
SUMMARY: Amyloid polymorphism is emerging as a key property that is differentially linked to various conformational diseases, including major neurodegenerative disorders, but also as a feature that potentially relates to complex structural mechanisms mediating transmissibility barriers and selective vulnerability of amyloids. In response to the rapidly expanding number of amyloid fibril structures formed by full-length proteins, we have developed StAmP-DB, a public database that supports the curation and cross-comparison of experimentally determined three-dimensional amyloid polymorph structures.
AVAILABILITY: StAmP-DB is freely accessible for queries and downloads at https://stamp.switchlab.org.
Affiliation(s)
- Nikolaos Louros
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Herestraat 49, Leuven, 3000, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, Leuven, box 802, 3000, Belgium
- Rob van der Kant
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Herestraat 49, Leuven, 3000, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, Leuven, box 802, 3000, Belgium
- Joost Schymkowitz
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Herestraat 49, Leuven, 3000, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, Leuven, box 802, 3000, Belgium
- Frederic Rousseau
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Herestraat 49, Leuven, 3000, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, Leuven, box 802, 3000, Belgium

16
Bioinformatics Methods in Predicting Amyloid Propensity of Peptides and Proteins. Methods Mol Biol 2022; 2340:1-15. [PMID: 35167067 DOI: 10.1007/978-1-0716-1546-1_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/26/2022]
Abstract
Several computational methods have been developed to predict the amyloid propensity of a protein or peptide. These bioinformatics tools are time- and cost-saving alternatives to the expensive and laborious experimental methods used to confirm self-aggregation of a protein. Computational approaches not only allow preselection of reliable candidates for amyloids but, most importantly, are capable of a thorough and informative analysis of a protein, indicating the sequence determinants of protein aggregation and identifying the potential causal mutations and likely mechanisms. Bioinformatics modeling applies several different approaches, which most typically include physicochemical or structure-based modeling, machine learning, or statistics-based modeling. Bioinformatics methods typically use the amino acid sequence of a protein as an input; some also include additional information, for example, an available structure. This chapter describes the methods currently used to computationally predict the amyloid propensity of a protein or peptide. Since the accuracy of bioinformatics methods may be highly dependent on the reference data used to develop and evaluate the predictors, we also briefly present the main databases of amyloids used by the authors of bioinformatics tools.

17
Lai PK, Gallegos A, Mody N, Sathish HA, Trout BL. Machine learning prediction of antibody aggregation and viscosity for high concentration formulation development of protein therapeutics. MAbs 2022; 14:2026208. [PMID: 35075980 PMCID: PMC8794240 DOI: 10.1080/19420862.2022.2026208] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Indexed: 12/14/2022] Open
Abstract
Machine learning has recently been used to predict therapeutic antibody aggregation rates and viscosity at high concentrations (150 mg/ml). These works focused on commercially available antibodies, which may have been optimized for stability. In this study, we measured accelerated aggregation rates at 45°C and viscosity at 150 mg/ml for 20 preclinical and clinical-stage antibodies. Features obtained from molecular dynamics simulations of the full-length antibody and from sequences were used for machine learning model construction. We found that a k-nearest neighbors regression model with two features, the spatial positive charge map on the CDRH2 and the solvent-accessible surface area of hydrophobic residues on the variable fragment, gives the best performance for predicting antibody aggregation rates (r = 0.89). For viscosity classification, the model with the highest accuracy is a logistic regression model with two features, the spatial negative charge map on the heavy chain variable region and the spatial negative charge map on the light chain variable region. The accuracy and the area under the precision-recall curve of this classification model in validation tests are 0.86 and 0.70, respectively. In addition, we combined data from another 27 commercial mAbs to develop a viscosity predictive model. The best model is a logistic regression model with two features, the number of hydrophobic residues on the light chain variable region and the net charge on the light chain variable region. The accuracy and the area under the precision-recall curve of this classification model are 0.85 and 0.6, respectively. The aggregation rate and viscosity models can be used to predict antibody stability to facilitate pharmaceutical development.
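As a rough illustration of the model family named in this abstract, the sketch below implements a plain k-nearest neighbors regressor over two-feature inputs in pure Python. It is not the authors' model: the function name, the choice of k, the Euclidean distance, and the toy numbers are all assumptions made for illustration only.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict the mean target of the k training points closest to `query`.

    train_X: list of feature tuples (here two features per sample).
    train_y: list of numeric targets aligned with train_X.
    """
    # Pair each Euclidean distance with its target, then keep the k smallest.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Synthetic two-feature points and targets (not data from the study).
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0, 12.0]

print(knn_regress(train_X, train_y, (0.2, 0.2), k=3))
print(knn_regress(train_X, train_y, (5.5, 5.0), k=2))
```

Because the prediction depends directly on distances, features measured on very different scales would normally be standardized before use; that step is omitted here to keep the sketch short.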
Affiliation(s)
- Pin-Kuang Lai
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Department of Chemical Engineering and Materials Science, Stevens Institute of Technology, Hoboken, New Jersey, USA
- Austin Gallegos
  - Dosage Form Design and Development, AstraZeneca, Gaithersburg, Maryland, USA
- Neil Mody
  - Dosage Form Design and Development, AstraZeneca, Gaithersburg, Maryland, USA
- Hasige A Sathish
  - Dosage Form Design and Development, AstraZeneca, Gaithersburg, Maryland, USA
- Bernhardt L Trout
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

18
Akbar R, Bashour H, Rawat P, Robert PA, Smorodina E, Cotet TS, Flem-Karlsen K, Frank R, Mehta BB, Vu MH, Zengin T, Gutierrez-Marcos J, Lund-Johansen F, Andersen JT, Greiff V. Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. MAbs 2022; 14:2008790. [PMID: 35293269 PMCID: PMC8928824 DOI: 10.1080/19420862.2021.2008790] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 08/22/2021] [Revised: 11/04/2021] [Accepted: 11/17/2021] [Indexed: 12/15/2022] Open
Abstract
Although the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs) are tremendous, the design and discovery of new candidates remain a time- and cost-intensive endeavor. In this regard, progress in the generation of data describing antigen binding and developability, computational methodology, and artificial intelligence may pave the way for a new era of in silico on-demand immunotherapeutics design and discovery. Here, we argue that the main necessary machine learning (ML) components for an in silico mAb sequence generator are: understanding of the rules of mAb-antigen binding, capacity to modularly combine mAb design parameters, and algorithms for unconstrained parameter-driven in silico mAb sequence synthesis. We review the current progress toward the realization of these necessary components and discuss the challenges that must be overcome to allow the on-demand ML-based discovery and design of fit-for-purpose mAb therapeutic candidates.
Affiliation(s)
- Rahmad Akbar
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Habib Bashour
  - School of Life Sciences, University of Warwick, Coventry, UK
- Puneet Rawat
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Philippe A. Robert
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Eva Smorodina
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russia
- Karine Flem-Karlsen
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Institute of Clinical Medicine, Department of Pharmacology, University of Oslo and Oslo University Hospital, Norway
- Robert Frank
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Brij Bhushan Mehta
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Mai Ha Vu
  - Department of Linguistics and Scandinavian Studies, University of Oslo, Norway
- Talip Zengin
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Department of Bioinformatics, Mugla Sitki Kocman University, Turkey
- Jan Terje Andersen
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Institute of Clinical Medicine, Department of Pharmacology, University of Oslo and Oslo University Hospital, Norway
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway

19
Dasmeh P, Wagner A. Yeast Proteins may Reversibly Aggregate like Amphiphilic Molecules. J Mol Biol 2021; 434:167352. [PMID: 34774567 DOI: 10.1016/j.jmb.2021.167352] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/11/2021] [Revised: 10/18/2021] [Accepted: 11/07/2021] [Indexed: 11/30/2022]
Abstract
More than a hundred proteins in yeast reversibly aggregate and phase-separate in response to various stressors, such as nutrient depletion and heat shock. We know little about the protein sequence and structural features behind this ability, which has not been characterized on a proteome-wide level. To identify the distinctive features of aggregation-prone protein regions, we apply machine learning algorithms to genome-scale limited proteolysis-mass spectrometry (LiP-MS) data from yeast proteins. LiP-MS data reveal that 96 proteins show significant structural changes upon heat shock. We find that in these proteins the propensity to phase separate cannot be solely driven by disordered regions, because their aggregation-prone regions (APRs) are not significantly disordered. Instead, the phase separation of these proteins requires contributions from both disordered and structured regions. APRs are significantly enriched in aliphatic residues and depleted in positively charged amino acids. Aggregator proteins with longer APRs show a greater propensity to aggregate, a relationship that can be explained by equilibrium statistical thermodynamics. Altogether, our observations suggest that proteome-wide reversible protein aggregation is mediated by sequence-encoded properties. We propose that aggregating proteins resemble supra-molecular amphiphiles, where APRs are the hydrophobic parts, and non-APRs are the hydrophilic parts.
Affiliation(s)
- Pouria Dasmeh
  - Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02139, USA
  - Swiss Institute of Bioinformatics (SIB), Switzerland
- Andreas Wagner
  - Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, USA
  - Swiss Institute of Bioinformatics (SIB), Switzerland
  - Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, Stellenbosch 7600, South Africa

20
Rawat P, Prabakaran R, Kumar S, Gromiha MM. Exploring the sequence features determining amyloidosis in human antibody light chains. Sci Rep 2021; 11:13785. [PMID: 34215782 PMCID: PMC8253744 DOI: 10.1038/s41598-021-93019-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/09/2021] [Accepted: 06/18/2021] [Indexed: 02/06/2023] Open
Abstract
Light chain (AL) amyloidosis is caused by the aggregation of antibody light chains into amyloid fibrils. Many computational resources are available for predicting short aggregation-prone regions within proteins. However, it is still a challenging task to predict the amyloidogenic nature of a whole protein using sequence/structure information. In the case of antibody light chains, the common architecture and known binding sites can provide vital information for the prediction of amyloidogenicity under physiological conditions. In this work, we have compared classical sequence-based, aggregation-related features (such as hydrophobicity, presence of gatekeeper residues, disorder, β-propensity, etc.) calculated for the CDR, FR or VL regions of amyloidogenic and non-amyloidogenic antibody light chains and implemented the insights gained in a machine learning-based webserver called "VLAmY-Pred" (https://web.iitm.ac.in/bioinfo2/vlamy-pred/). The model shows a prediction accuracy of 79.7% (sensitivity: 78.7% and specificity: 79.9%) with a ROC value of 0.88 on a dataset of 1828 variable region sequences of antibody light chains. This model will be helpful for improving the prognosis of patients who are likely to suffer from diseases caused by light chain amyloidosis, for understanding the origins of aggregation in antibody-based biotherapeutics, for large-scale in silico analysis of antibody sequences generated by next-generation sequencing, and finally for the rational engineering of aggregation-resistant antibodies.
Affiliation(s)
- Puneet Rawat
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- R. Prabakaran
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Sandeep Kumar
  - Biotherapeutics Discovery, Boehringer-Ingelheim Inc., 5571 R & D Building, 175 Briar Ridge Road, Ridgefield, CT 06877, USA
- M. Michael Gromiha
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
  - Advanced Computational Drug Discovery Unit (ACDD), Institute of Innovative Research, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan

21
Prabakaran R, Rawat P, Kumar S, Gromiha MM. Evaluation of in silico tools for the prediction of protein and peptide aggregation on diverse datasets. Brief Bioinform 2021; 22:6309925. [PMID: 34181000 DOI: 10.1093/bib/bbab240] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/08/2021] [Revised: 05/18/2021] [Accepted: 06/02/2021] [Indexed: 01/09/2023] Open
Abstract
Several prediction algorithms and tools have been developed in the last two decades to predict protein and peptide aggregation. These in silico tools help predict aggregation propensity and amyloidogenicity and identify aggregation-prone regions. Despite the immense interest in the field, a systematic comparison of the performance of these algorithms remains of prime importance. In this review, we provide a rigorous performance analysis of nine prediction tools using a variety of assessments. The assessments were carried out on several non-redundant datasets ranging from hexapeptides to protein sequences and from amyloidogenic antibody light chains to soluble protein sequences. Our analysis reveals the robustness of the current prediction tools and the scope for improvement in their predictive performance. Insights gained from this work provide critical guidance to the scientific community on the advantages and limitations of different aggregation prediction methods and help researchers make informed decisions about their research needs.
Affiliation(s)
| | | | - Sandeep Kumar
- Department of Biotherapeutics Discovery in Boehringer-Ingelheim Pharmaceutical Inc., Ridgefield, CT, USA
| | | |
|
22
|
AbsoluRATE: An in-silico method to predict the aggregation kinetics of native proteins. Biochim Biophys Acta Proteins Proteom 2021; 1869:140682. [PMID: 34102324 DOI: 10.1016/j.bbapap.2021.140682] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/24/2021] [Revised: 05/12/2021] [Accepted: 06/04/2021] [Indexed: 12/12/2022]
Abstract
Protein aggregation has two aspects, namely, mechanism and kinetics. Understanding protein aggregation kinetics is critical for predicting the progression of diseases caused by amyloidosis, the accumulation of aggregates in biotherapeutics during storage, and the engineering of commercial nano-biomaterials. In this work, we have collected experimentally determined absolute protein aggregation rates and developed an SVM-based regression model to predict absolute rates of protein and peptide aggregation under near-physiological conditions. The regression model achieved a correlation coefficient of 0.72 with an MAE of 0.91 (natural log of kapp, where kapp is in hour⁻¹) using leave-one-out cross-validation on a dataset of 82 non-redundant proteins/peptides. The model accounts for the experimental conditions (such as temperature, pH, ionic and protein concentration) and sequence-based properties. The amino acid sequence features revealed by this model as being important for aggregation kinetics are also associated with the aggregation mechanism. In particular, the inherent aggregation propensity of the protein/peptide sequence and the number of aggregation prone regions (APRs) unpunctuated by gatekeeping residues were found to play important roles in the prediction of the absolute aggregation rates. This analysis shows that the mechanism and kinetics of protein aggregation are coupled via common sequence attributes. The aggregation kinetic prediction method developed in this work is available at https://web.iitm.ac.in/bioinfo2/absolurate-pred/index.html.
|
23
|
Ptak-Kaczor M, Banach M, Stapor K, Fabian P, Konieczny L, Roterman I. Solubility and Aggregation of Selected Proteins Interpreted on the Basis of Hydrophobicity Distribution. Int J Mol Sci 2021; 22:ijms22095002. [PMID: 34066830 PMCID: PMC8125953 DOI: 10.3390/ijms22095002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Revised: 05/03/2021] [Accepted: 05/06/2021] [Indexed: 11/30/2022] Open
Abstract
Protein solubility is based on the compatibility of the specific protein surface with the polar aqueous environment. The exposure of polar residues on the protein surface promotes the protein's solubility in the polar environment. The aqueous environment also influences the folding process by favoring the centralization of hydrophobic residues with the simultaneous exposure of polar residues. The degree to which the residue distribution matches this model, with hydrophobic residues concentrated in the center of the molecule and polar residues exposed, is determined by the sequence of amino acids in the chain. The fuzzy oil drop model quantifies how closely the hydrophobicity distribution observed in the protein matches a 3D Gaussian function, which expresses an idealized distribution that meets the preferences of the polar water environment. The varying degrees of agreement between the observed and idealized distributions allow the prediction of preferences for interactions with molecules of different polarity, water molecules in particular. This paper analyzes a set of proteins with differing hydrophobicity distributions in the context of the solubility of a given protein and the possibility of complex formation.
Affiliation(s)
- Magdalena Ptak-Kaczor
- Department of Bioinformatics and Telemedicine, Jagiellonian University—Medical College, Medyczna 7, 30-688 Kraków, Poland
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland
| | - Mateusz Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University—Medical College, Medyczna 7, 30-688 Kraków, Poland
| | - Katarzyna Stapor
- Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Piotr Fabian
- Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Leszek Konieczny
- Chair of Medical Biochemistry—Jagiellonian University—Medical College, Kopernika 7, 31-034 Kraków, Poland
| | - Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University—Medical College, Medyczna 7, 30-688 Kraków, Poland
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland
| |
|
24
|
Prabakaran R, Rawat P, Thangakani AM, Kumar S, Gromiha MM. Protein aggregation: in silico algorithms and applications. Biophys Rev 2021; 13:71-89. [PMID: 33747245 PMCID: PMC7930180 DOI: 10.1007/s12551-021-00778-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/06/2020] [Accepted: 01/01/2021] [Indexed: 01/08/2023] Open
Abstract
Protein aggregation is a topic of immense interest to the scientific community due to its role in several neurodegenerative diseases/disorders and its industrial importance. Several in silico techniques, tools, and algorithms have been developed to predict aggregation in proteins and understand the aggregation mechanisms. This review attempts to summarize the vast developments in in silico approaches, the resources available, and future perspectives. It reviews aggregation-related databases, mechanistic models (aggregation-prone region and aggregation propensity prediction), kinetic models (aggregation rate prediction), and molecular dynamics studies related to aggregation. With a multitude of prediction models related to aggregation already available to the scientific community, the field of protein aggregation is rapidly maturing to tackle new applications.
Affiliation(s)
- R. Prabakaran
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu India
| | - Puneet Rawat
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu India
| | - A. Mary Thangakani
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu India
| | - Sandeep Kumar
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceutical Inc., Ridgefield, CT USA
| | - M. Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu India
- School of Computing, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Kanagawa Japan
| |
|
25
|
Prabakaran R, Rawat P, Kumar S, Gromiha MM. ANuPP: A Versatile Tool to Predict Aggregation Nucleating Regions in Peptides and Proteins. J Mol Biol 2020; 433:166707. [PMID: 33972019 DOI: 10.1016/j.jmb.2020.11.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/19/2020] [Revised: 10/28/2020] [Accepted: 11/05/2020] [Indexed: 12/22/2022]
Abstract
Short aggregation-prone sequence motifs can trigger aggregation in peptide and protein sequences. Most algorithms developed so far to identify potential aggregation-prone regions (APRs) use amino acid residue composition and/or sequence pattern features. In this work, we have investigated the importance of atomic-level rather than residue-level characteristics for understanding the initiation of aggregation in proteins and peptides. Using atomic-level features, an ensemble classifier, ANuPP, has been developed to predict the aggregation-nucleating regions in peptides and proteins. On a dataset of 1279 hexapeptides, ANuPP achieved an area under the curve (AUC) of 0.831 with 77% accuracy in 10-fold cross-validation and an AUC of 0.883 with 83% accuracy on a blind test dataset of 142 hexapeptides. Further, it showed an average SOV of 48.7% in identifying APRs in 37 proteins. The performance of ANuPP is better than that of other methods reported in the literature on both amyloidogenic hexapeptide prediction and APR identification. We have developed a web server for ANuPP and it is available at https://web.iitm.ac.in/bioinfo2/ANuPP/. Insights gained from this work demonstrate the importance of atomic and functional group characteristics for the diversity of atomic-level origins as well as mechanisms of protein aggregation.
Affiliation(s)
- R Prabakaran
- Protein Bioinformatics Lab, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
| | - Puneet Rawat
- Protein Bioinformatics Lab, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
| | - Sandeep Kumar
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceutical Inc., Ridgefield, CT, USA.
| | - M Michael Gromiha
- Protein Bioinformatics Lab, Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India; School of Computing, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan.
| |
|