1
Weckbecker M, Anžel A, Yang Z, Hattab G. Interpretable molecular encodings and representations for machine learning tasks. Comput Struct Biotechnol J 2024; 23:2326-2336. [PMID: 38867722 PMCID: PMC11167246 DOI: 10.1016/j.csbj.2024.05.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024] [Indexed: 06/14/2024]
Abstract
Molecular encodings and their usage in machine learning models have demonstrated significant breakthroughs in biomedical applications, particularly in the classification of peptides and proteins. To this end, we propose a new encoding method: Interpretable Carbon-based Array of Neighborhoods (iCAN). Designed to address machine learning models' need for more structured and less flexible input, it captures the neighborhoods of carbon atoms in a counting array and improves the utility of the resulting encodings for machine learning models. The iCAN method provides interpretable molecular encodings and representations, enabling the comparison of molecular neighborhoods, identification of repeating patterns, and visualization of relevance heat maps for a given data set. When reproducing a large biomedical peptide classification study, it outperforms its predecessor encoding. When extended to proteins, it outperforms a lead structure-based encoding on 71% of the data sets. Our method offers interpretable encodings that can be applied to all organic molecules, including exotic amino acids, cyclic peptides, and larger proteins, making it highly versatile across various domains and data sets. This work establishes a promising new direction for machine learning in peptide and protein classification in biomedicine and healthcare, potentially accelerating advances in drug discovery and disease diagnosis.
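The abstract describes iCAN only at a high level: carbon-atom neighborhoods are tallied in a counting array. The sketch below illustrates that general idea under loudly stated assumptions. A "neighborhood" here is simply the sorted tuple of heavy-atom element symbols bonded to each carbon (bond orders and hydrogens are ignored), and the vocabulary is hypothetical. It is not the published iCAN definition, only a way to see why a fixed-length count vector is convenient input for a machine learning model.

```python
from collections import Counter

# Hypothetical simplification: a carbon's "neighborhood" is the sorted tuple of
# element symbols directly bonded to it (heavy atoms only, bond orders ignored).
# This is NOT the published iCAN definition, only an illustration of counting.

def carbon_neighborhoods(elements, bonds):
    """Return a Counter mapping neighborhood signatures to how often they occur."""
    neighbors = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbors[a].append(elements[b])
        neighbors[b].append(elements[a])
    counts = Counter()
    for i, symbol in enumerate(elements):
        if symbol == "C":
            counts[tuple(sorted(neighbors[i]))] += 1
    return counts

def counting_array(counts, vocabulary):
    """Project counts onto a fixed (hypothetical) vocabulary so that every
    molecule yields a vector of the same length; unseen signatures are dropped."""
    return [counts.get(signature, 0) for signature in vocabulary]

# Ethanol, heavy atoms only: C0-C1-O2
elements = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
vocab = [("C",), ("C", "C"), ("C", "O")]
print(counting_array(carbon_neighborhoods(elements, bonds), vocab))  # [1, 0, 1]
```

Because every molecule is mapped onto the same vocabulary, vectors for molecules of different sizes (including non-standard amino acids or cyclic peptides) can be compared position by position.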
Affiliation(s)
- Moritz Weckbecker
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Aleksandar Anžel
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Zewen Yang
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Georges Hattab
- Center for Artificial Intelligence in Public Health Research (ZKI-PH), Robert Koch Institute, Nordufer 20, Berlin, 13353, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität, Arnimallee 14, Berlin, 14195, Berlin, Germany
2
Manning MC, Holcomb RE, Payne RW, Stillahn JM, Connolly BD, Katayama DS, Liu H, Matsuura JE, Murphy BM, Henry CS, Crommelin DJA. Stability of Protein Pharmaceuticals: Recent Advances. Pharm Res 2024; 41:1301-1367. [PMID: 38937372 DOI: 10.1007/s11095-024-03726-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 06/03/2024] [Indexed: 06/29/2024]
Abstract
There have been significant advances in the formulation and stabilization of proteins in the liquid state in the years since our previous review. Our mechanistic understanding of protein-excipient interactions has increased, allowing formulations to be developed more rationally. The field has moved towards more complex and challenging formulations, such as high-concentration formulations that allow subcutaneous administration, and co-formulations. While much of the published work has focused on mAbs, the principles appear to apply to any therapeutic protein, although mAbs clearly have some distinctive features. In this review, we first discuss chemical degradation reactions. This is followed by a section on physical instability. More specific topics are then addressed: instability induced by interactions with interfaces, predictive methods for physical stability, and the interplay between chemical and physical instability. The final sections discuss how all of the above affects (co-)formulation strategies, in particular for high-concentration protein solutions.
Affiliation(s)
- Mark Cornell Manning
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Ryan E Holcomb
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Robert W Payne
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Joshua M Stillahn
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Charles S Henry
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
3
Gonçalves AAM, Ribeiro AJ, Resende CAA, Couto CAP, Gandra IB, Dos Santos Barcelos IC, da Silva JO, Machado JM, Silva KA, Silva LS, Dos Santos M, da Silva Lopes L, de Faria MT, Pereira SP, Xavier SR, Aragão MM, Candida-Puma MA, de Oliveira ICM, Souza AA, Nogueira LM, da Paz MC, Coelho EAF, Giunchetti RC, de Freitas SM, Chávez-Fumagalli MA, Nagem RAP, Galdino AS. Recombinant multiepitope proteins expressed in Escherichia coli cells and their potential for immunodiagnosis. Microb Cell Fact 2024; 23:145. [PMID: 38778337 PMCID: PMC11110257 DOI: 10.1186/s12934-024-02418-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Recombinant multiepitope proteins (RMPs) are a promising alternative for use in diagnostic tests. Given their wide application across diverse diseases, this review surveys the use of these antigens for diagnosis and discusses the main issues surrounding them. RMPs, usually consisting of linear, immunodominant, and phylogenetically conserved epitopes, have been applied in the experimental diagnosis of various human and animal diseases, such as leishmaniasis, brucellosis, cysticercosis, Chagas disease, hepatitis, leptospirosis, leprosy, filariasis, schistosomiasis, dengue, and COVID-19. The synthetic genes encoding these epitopes are joined to code for a single RMP, either with spacers or fused directly, resulting in different biochemical properties. The high density of epitopes within RMPs contributes to a high degree of sensitivity and specificity. RMPs can also sidestep the need for multiple peptide syntheses or multiple recombinant proteins, reducing costs and improving the standardization of immunoassays. Methods such as bioinformatics and circular dichroism have been widely applied in the development of new RMPs, helping to guide their construction and to better understand their structure. Several RMPs have been expressed, mainly in the Escherichia coli expression system, highlighting the importance of these cells in biotechnology. Indeed, technological advances in this area, offering a wide range of strains, have made these cells the most widely used expression platform. RMPs have been used experimentally to diagnose a broad range of illnesses in the laboratory, suggesting they could also be useful for accurate commercial diagnosis. In this respect, the RMP approach offers an attractive route to producing promising antigens for commercial diagnostic kits.
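The abstract notes that epitope-coding genes are joined "either with spacers or fused". A minimal sketch of the two layouts at the protein-sequence level follows. The epitope sequences are hypothetical, and the `GPGPG` spacer is an illustrative assumption (a linker sometimes used in multiepitope designs), not a construct taken from the review; codon choice and gene synthesis are not modeled.

```python
# Hypothetical sketch at the amino-acid level: the epitopes and the "GPGPG"
# spacer below are illustrative assumptions, not sequences from the review.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def assemble_rmp(epitopes, spacer=""):
    """Join epitope sequences into a single chain.

    spacer="" yields a directly fused construct; a non-empty spacer is placed
    between consecutive epitopes (not at the N- or C-terminus)."""
    parts = list(epitopes)
    if not parts:
        raise ValueError("at least one epitope is required")
    # Reject anything outside the 20 standard one-letter codes (uppercase only).
    for seq in parts + [spacer]:
        unknown = set(seq) - STANDARD_AMINO_ACIDS
        if unknown:
            raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return spacer.join(parts)

epitopes = ["AKVLDEQ", "GSTNYRW", "MFPHLIC"]  # hypothetical epitope sequences
print(assemble_rmp(epitopes))                  # AKVLDEQGSTNYRWMFPHLIC
print(assemble_rmp(epitopes, spacer="GPGPG"))  # AKVLDEQGPGPGGSTNYRWGPGPGMFPHLIC
```

The fused form packs epitopes more densely, while a spacer can help keep neighboring epitopes from forming artificial junctional sequences; which trade-off is preferable depends on the antigen and is outside the scope of this sketch.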
Affiliation(s)
- Ana Alice Maia Gonçalves
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Anna Julia Ribeiro
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Carlos Ananias Aparecido Resende
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Carolina Alves Petit Couto
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Isadora Braga Gandra
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Isabelle Caroline Dos Santos Barcelos
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Jonatas Oliveira da Silva
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Juliana Martins Machado
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Kamila Alves Silva
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Líria Souza Silva
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Michelli Dos Santos
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Lucas da Silva Lopes
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Mariana Teixeira de Faria
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Sabrina Paula Pereira
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Sandra Rodrigues Xavier
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Matheus Motta Aragão
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Mayron Antonio Candida-Puma
- Computational Biology and Chemistry Research Group, Vicerrectorado de Investigación, Universidad Católica de Santa María, Arequipa, 04000, Peru
- Amanda Araujo Souza
- Biophysics Laboratory, Institute of Biological Sciences, Department of Cell Biology, University of Brasilia, Brasília, 70910-900, Brazil
- Lais Moreira Nogueira
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Mariana Campos da Paz
- Bioactives and Nanobiotechnology Laboratory, Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
- Eduardo Antônio Ferraz Coelho
- Postgraduate Program in Health Sciences, Infectious Diseases and Tropical Medicine, Faculty of Medicine, Federal University of Minas Gerais, Belo Horizonte, 30130-100, Brazil
- Rodolfo Cordeiro Giunchetti
- Laboratory of Biology of Cell Interactions, National Institute of Science and Technology on Tropical Diseases (INCT-DT), Department of Morphology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sonia Maria de Freitas
- Biophysics Laboratory, Institute of Biological Sciences, Department of Cell Biology, University of Brasilia, Brasília, 70910-900, Brazil
- Miguel Angel Chávez-Fumagalli
- Computational Biology and Chemistry Research Group, Vicerrectorado de Investigación, Universidad Católica de Santa María, Arequipa, 04000, Peru
- Ronaldo Alves Pinto Nagem
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Alexsandro Sobreira Galdino
- Microorganism Biotechnology Laboratory, National Institute of Science and Technology on Industrial Biotechnology (INCT-BI), Federal University of São João Del-Rei, Midwest Campus, Divinópolis, 35501-296, Brazil
4
Nordquist EB, Clerico EM, Chen J, Gierasch LM. Computationally-Aided Modeling of Hsp70-Client Interactions: Past, Present, and Future. J Phys Chem B 2022; 126:6780-6791. [PMID: 36040440 PMCID: PMC10309085 DOI: 10.1021/acs.jpcb.2c03806] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023]
Abstract
Hsp70 molecular chaperones play central roles in maintaining a healthy cellular proteome. Hsp70s function by binding to short peptide sequences in incompletely folded client proteins, thus preventing them from misfolding and/or aggregating, and in many cases holding them in a state that is competent for subsequent processes like translocation across membranes. There is considerable interest in predicting the sites where Hsp70s may bind their clients, as the ability to do so sheds light on the cellular functions of the chaperone. In addition, the capacity of the Hsp70 chaperone family to bind to a broad array of clients and to identify accessible sequences that enable discrimination of those that are folded from those that are not fully folded, which is essential to their cellular roles, is a fascinating puzzle in molecular recognition. In this article we discuss efforts to harness computational modeling with input from experimental data to develop a predictive understanding of the promiscuous yet selective binding of Hsp70 molecular chaperones to accessible sequences within their client proteins. We trace how an increasing understanding of the complexities of Hsp70-client interactions has led computational modeling to new underlying assumptions and design features. We describe the trend from purely data-driven analysis toward increased reliance on physics-based modeling that deeply integrates structural information and sequence-based functional data with physics-based binding energies. Notably, new experimental insights are adding to our understanding of the molecular origins of "selective promiscuity" in substrate binding by Hsp70 chaperones and challenging the underlying assumptions and design used in earlier predictive models. 
Taking the new experimental findings together with exciting progress in computational modeling of protein structures leads us to foresee a bright future for a predictive understanding of selective-yet-promiscuous binding exploited by Hsp70 molecular chaperones; the resulting new insights will also apply to substrate binding by other chaperones and by signaling proteins.
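The abstract contrasts purely data-driven prediction of Hsp70 binding sites with physics-based modeling. As a rough illustration of the data-driven end of that spectrum, the sketch below slides a fixed-width window along a client sequence, sums per-residue scores, and flags windows below a threshold. The score table, the window width of 7, and the threshold are all hypothetical assumptions chosen for illustration; they are not a published Hsp70 energy scale or any specific predictor discussed in the article.

```python
# Hypothetical per-residue scores (arbitrary units; negative = favors binding).
# These values are illustrative assumptions, NOT a published Hsp70 scale.
SCORES = {
    "L": -1.0, "I": -1.0, "V": -1.0, "F": -1.0, "M": -1.0,  # hydrophobic core
    "K": -0.5, "R": -0.5,                                   # basic flanks
    "D": 1.0, "E": 1.0,                                     # acidic, disfavored
}

def window_scores(sequence, scores, width=7):
    """Score each contiguous window of `width` residues by summing residue
    scores; residues absent from `scores` contribute 0."""
    return [
        (start, sum(scores.get(res, 0.0) for res in sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]

def predicted_sites(sequence, scores, width=7, threshold=-4.0):
    """Return start indices of windows scoring at or below `threshold`."""
    return [start for start, score in window_scores(sequence, scores, width)
            if score <= threshold]

print(predicted_sites("DDDLLLLLIIDDD", SCORES))  # [2, 3, 4]
```

A scheme like this only sees sequence; it cannot tell whether a segment is buried in a folded client, which is part of the "selective promiscuity" problem that the article argues calls for structure- and physics-based models.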
Affiliation(s)
- Erik B. Nordquist
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts, 01003, United States
- Eugenia M. Clerico
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, Massachusetts, 01003, United States
- Jianhan Chen
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts, 01003, United States
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, Massachusetts, 01003, United States
- Lila M. Gierasch
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts, 01003, United States
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, Massachusetts, 01003, United States
5
Biskupek I, Czaplewski C, Sawicka J, Iłowska E, Dzierżyńska M, Rodziewicz-Motowidło S, Liwo A. Prediction of Aggregation of Biologically-Active Peptides with the UNRES Coarse-Grained Model. Biomolecules 2022; 12:1140. [PMID: 36009034 PMCID: PMC9406146 DOI: 10.3390/biom12081140] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/14/2022] [Revised: 08/13/2022] [Accepted: 08/16/2022] [Indexed: 11/17/2022]
Abstract
The UNited RESidue (UNRES) model of polypeptide chains was applied to study the association of 20 peptides ranging in size from 6 to 32 amino-acid residues. Twelve of these were potentially aggregating hexa- or heptapeptides excised from larger proteins, while the remaining eight contained potentially aggregating sequences functionalized by attaching larger ends rich in charged residues. For 13 peptides, existing experimental aggregation data were used; the remaining seven were synthesized, and their properties were measured in this work. Multiplexed replica-exchange simulations of eight-chain systems were conducted at 12 temperatures from 260 to 370 K and at concentrations from 0.421 to 5.78 mM, corresponding to the experimental conditions. The temperature profiles of the fractions of monomers and octamers showed a clear transition corresponding to aggregate dissociation. Low simulated transition temperatures were obtained for the peptides that did not precipitate after incubation, as well as for the H-GNNQQNY-NH2 prion-protein fragment, which forms small fibrils. A substantial amount of inter-strand β-sheet structure was found in most of the systems. The results suggest that UNRES simulations can be used to assess peptide aggregation, except for glutamine- and asparagine-rich peptides, for which a revision of the UNRES sidechain-sidechain interaction potentials appears necessary.
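The abstract reads a transition temperature off the temperature profile of the monomer fraction. One simple way to do that is sketched below: linear interpolation of the temperature at which the monomer fraction crosses 0.5. Two assumptions apply here, both taken from neither the abstract nor UNRES: the geometric spacing of the 12-temperature ladder between 260 and 370 K (a common replica-exchange choice; the paper's actual spacing is not stated) and the 0.5 midpoint criterion. The example profile is hypothetical, not simulation output.

```python
def geometric_ladder(t_min, t_max, n):
    """n temperatures spaced geometrically from t_min to t_max (assumed spacing)."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]

def transition_temperature(temperatures, monomer_fraction, level=0.5):
    """Linearly interpolate the first temperature at which the monomer
    fraction reaches `level`; return None if the profile never crosses it."""
    points = list(zip(temperatures, monomer_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 == level:
            return t0
        if (f0 - level) * (f1 - level) < 0:  # sign change => crossing in (t0, t1)
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    if points and points[-1][1] == level:
        return points[-1][0]
    return None

ladder = geometric_ladder(260.0, 370.0, 12)
print(len(ladder), round(ladder[0], 1), round(ladder[-1], 1))  # 12 260.0 370.0

# Hypothetical profile: aggregates dissociate, so the monomer fraction rises with T.
print(transition_temperature([300, 310, 320, 330], [0.0, 0.25, 0.75, 1.0]))  # 315.0
```

Interpolating between simulated temperatures is coarse; with only 12 replicas, reweighting methods are often used to obtain smoother profiles, but that is beyond this sketch.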
Affiliation(s)
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
6
Charoenkwan P, Ahmed S, Nantasenamat C, Quinn JMW, Moni MA, Lio' P, Shoombuatong W. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep 2022; 12:7697. [PMID: 35546347 PMCID: PMC9095707 DOI: 10.1038/s41598-022-11897-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/11/2021] [Accepted: 05/03/2022] [Indexed: 12/13/2022]
Abstract
Amyloid proteins can form insoluble fibril aggregates that have important pathogenic effects in many tissues. Such amyloidoses are prominently associated with common diseases such as type 2 diabetes, Alzheimer's disease, and Parkinson's disease. There are many types of amyloid proteins, including proteins that form amyloid aggregates when in a misfolded state. Identifying such amyloid proteins and their pathogenic properties is difficult, and developing effective bioinformatics tools offers a new and effective approach. While several machine learning (ML)-based models for in silico identification of amyloid proteins have been proposed, their predictive performance is limited. In this study, we present AMYPred-FRL, a novel meta-predictor that uses a feature representation learning approach to achieve more accurate amyloid protein identification. AMYPred-FRL combines six well-known ML algorithms (extremely randomized tree, extreme gradient boosting, k-nearest neighbor, logistic regression, random forest, and support vector machine) with ten different sequence-based feature descriptors to generate 60 probabilistic features (PFs), in contrast to state-of-the-art methods that rely on a single feature-based approach. A logistic regression recursive feature elimination (LR-RFE) method was used to select the optimal number m of the 60 PFs in order to improve predictive performance. Finally, using the meta-predictor approach, the 20 selected PFs were fed into a logistic regression model to create the final hybrid model (AMYPred-FRL). Both cross-validation and independent tests showed that AMYPred-FRL achieved better predictive performance than its constituent baseline models. In an extensive independent test, AMYPred-FRL outperformed existing methods by 5.5% in accuracy and 16.1% in Matthews correlation coefficient (MCC), reaching an accuracy of 0.873 and an MCC of 0.710.
To expedite high-throughput prediction, a user-friendly web server of AMYPred-FRL is freely available at http://pmlabstack.pythonanywhere.com/AMYPred-FRL. It is anticipated that AMYPred-FRL will be a useful tool in helping researchers to identify new amyloid proteins.
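Two pieces of arithmetic in the abstract can be made concrete. First, the 60 probabilistic features come from pairing each of the six learners with each of the ten descriptors (6 × 10 = 60). Second, performance is reported as accuracy and MCC, both computed from a binary confusion matrix. In the sketch below the learner abbreviations follow the abstract's list, while the descriptor names are hypothetical placeholders; returning 0.0 when an MCC denominator factor is zero is a common convention, assumed here. The confusion-matrix counts are made up and do not reproduce the paper's results.

```python
import math
from itertools import product

# Learners follow the abstract; descriptor names are hypothetical placeholders.
LEARNERS = ["ERT", "XGB", "KNN", "LR", "RF", "SVM"]
DESCRIPTORS = [f"descriptor_{k}" for k in range(1, 11)]

# One probabilistic feature (PF) per (learner, descriptor) pair: 6 x 10 = 60.
PF_NAMES = [f"{learner}:{descriptor}"
            for learner, descriptor in product(LEARNERS, DESCRIPTORS)]

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero
    (an assumed convention for the undefined case)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(len(PF_NAMES))               # 60
print(accuracy(40, 45, 5, 10))     # 0.85
print(round(mcc(40, 45, 5, 10), 3))
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes are imbalanced, which is why amyloid predictors typically report both.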
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Saeed Ahmed
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
- Mohammad Ali Moni
- Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand