1
Manning MC, Holcomb RE, Payne RW, Stillahn JM, Connolly BD, Katayama DS, Liu H, Matsuura JE, Murphy BM, Henry CS, Crommelin DJA. Stability of Protein Pharmaceuticals: Recent Advances. Pharm Res 2024; 41:1301-1367. [PMID: 38937372 DOI: 10.1007/s11095-024-03726-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 06/03/2024] [Indexed: 06/29/2024]
Abstract
There have been significant advances in the formulation and stabilization of proteins in the liquid state over the past years since our previous review. Our mechanistic understanding of protein-excipient interactions has increased, allowing one to develop formulations in a more rational fashion. The field has moved towards more complex and challenging formulations, such as high concentration formulations to allow for subcutaneous administration and co-formulation. While much of the published work has focused on mAbs, the principles appear to apply to any therapeutic protein, although mAbs clearly have some distinctive features. In this review, we first discuss chemical degradation reactions. This is followed by a section on physical instability issues. Then, more specific topics are addressed: instability induced by interactions with interfaces, predictive methods for physical stability and interplay between chemical and physical instability. The final parts are devoted to discussions of how all the above impacts (co-)formulation strategies, in particular for high protein concentration solutions.
Affiliation(s)
- Mark Cornell Manning
  - Legacy BioDesign LLC, Johnstown, CO, USA.
  - Department of Chemistry, Colorado State University, Fort Collins, CO, USA.
- Ryan E Holcomb
  - Legacy BioDesign LLC, Johnstown, CO, USA
  - Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Robert W Payne
  - Legacy BioDesign LLC, Johnstown, CO, USA
  - Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Joshua M Stillahn
  - Legacy BioDesign LLC, Johnstown, CO, USA
  - Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Charles S Henry
  - Department of Chemistry, Colorado State University, Fort Collins, CO, USA
2
Schepers J, Carter Z, Kritsiligkou P, Grant CM. Methionine Sulfoxide Reductases Suppress the Formation of the [PSI+] Prion and Protein Aggregation in Yeast. Antioxidants (Basel) 2023; 12:antiox12020401. [PMID: 36829961 PMCID: PMC9952077 DOI: 10.3390/antiox12020401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 02/01/2023] [Accepted: 02/03/2023] [Indexed: 02/11/2023] Open
Abstract
Prions are self-propagating, misfolded forms of proteins associated with various neurodegenerative diseases in mammals and heritable traits in yeast. How prions form spontaneously into infectious amyloid-like structures without underlying genetic changes is poorly understood. Previous studies have suggested that methionine oxidation may underlie the switch from a soluble protein to the prion form. In this current study, we have examined the role of methionine sulfoxide reductases (MXRs) in protecting against de novo formation of the yeast [PSI+] prion, which is the amyloid form of the Sup35 translation termination factor. We show that [PSI+] formation is increased during normal and oxidative stress conditions in mutants lacking either one of the yeast MXRs (Mxr1, Mxr2), which protect against methionine oxidation by reducing the two epimers of methionine-S-sulfoxide. We have identified a methionine residue (Met124) in Sup35 that is important for prion formation, confirming that direct Sup35 oxidation causes [PSI+] prion formation. [PSI+] formation was less pronounced in mutants simultaneously lacking both MXR isoenzymes, and we show that the morphology and biophysical properties of protein aggregates are altered in this mutant. Taken together, our data indicate that methionine oxidation triggers spontaneous [PSI+] prion formation, which can be alleviated by methionine sulfoxide reductases.
Affiliation(s)
- Jana Schepers
  - Institute of Pathobiochemistry, University Medical Center of the Johannes Gutenberg University Mainz, Duesbergweg 6, 55099 Mainz, Germany
- Zorana Carter
  - Division of Molecular and Cellular Function, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PT, UK
- Paraskevi Kritsiligkou
  - Division of Redox Regulation, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Chris M. Grant
  - Division of Molecular and Cellular Function, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PT, UK
  - Correspondence:
3
Vincent MS, Ezraty B. Methionine oxidation in bacteria: A reversible post-translational modification. Mol Microbiol 2023; 119:143-150. [PMID: 36350090 DOI: 10.1111/mmi.15000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 11/04/2022] [Accepted: 11/05/2022] [Indexed: 11/10/2022]
Abstract
Methionine is a sulfur-containing residue found in most proteins which are particularly susceptible to oxidation. Although methionine oxidation causes protein damage, it can in some cases activate protein function. Enzymatic systems reducing oxidized methionine have evolved in most bacterial species and methionine oxidation proves to be a reversible post-translational modification regulating protein activity. In this review, we inspect recent examples of methionine oxidation provoking protein loss and gain of function. We further speculate on the role of methionine oxidation as a multilayer endogenous antioxidant system and consider its potential consequences for bacterial virulence.
Affiliation(s)
- Maxence S Vincent
  - Laboratoire de Chimie Bactérienne, Institut de Microbiologie de la Méditerranée, Aix-Marseille University, CNRS, Marseille, France
- Benjamin Ezraty
  - Laboratoire de Chimie Bactérienne, Institut de Microbiologie de la Méditerranée, Aix-Marseille University, CNRS, Marseille, France
4
Aledo P, Aledo JC. Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices. Int J Mol Sci 2023; 24:ijms24010796. [PMID: 36614247 PMCID: PMC9821064 DOI: 10.3390/ijms24010796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/24/2022] [Accepted: 12/29/2022] [Indexed: 01/04/2023] Open
Abstract
The relative contribution of mutation and selection to the amino acid substitution rates observed in empirical matrices is unclear. Herein, we present a neutral continuous fitness-stability model, inspired by the Arrhenius law ($q_{ij} = a_{ij} e^{-\Delta\Delta G_{ij}}$). The model postulates that the rate of amino acid substitution (i→j) is determined by the product of a pre-exponential factor, which is influenced by the genetic code structure, and an exponential term reflecting the relative fitness of the amino acid substitutions. To assess the validity of our model, we computed changes in stability of 14,094 proteins, for which 137,073,638 in silico mutants were analyzed. These site-specific data were summarized into a 20 square matrix, whose entries, $\Delta\Delta G_{ij}$, were obtained after averaging through all the sites in all the proteins. We found a significant positive correlation between these energy values and the disease-causing potential of each substitution, suggesting that the exponential term accurately summarizes the fitness effect. A remarkable observation was that amino acids that were highly destabilizing when acting as the source, tended to have little effect when acting as the destination, and vice versa (source → destination). The Arrhenius model accurately reproduced the pattern of substitution rates collected in the empirical matrices, suggesting a relevant role for the genetic code structure and a tuning role for purifying selection exerted via protein stability.
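As a minimal illustration of the Arrhenius-style relation quoted in this abstract, the Python sketch below evaluates the substitution rate as the product of a pre-exponential factor and an exponential stability term. The function name and all numeric inputs are hypothetical and are not taken from the paper; the stability change is assumed to be already expressed in the dimensionless units implied by the formula as written.

```python
import math


def substitution_rate(a_ij: float, ddg_ij: float) -> float:
    """Return q_ij = a_ij * exp(-ddG_ij).

    a_ij   : pre-exponential factor (hypothetical value, standing in for
             the genetic-code contribution described in the abstract)
    ddg_ij : stability change for the substitution i -> j, assumed to be
             dimensionless here because the formula has no RT term
    """
    return a_ij * math.exp(-ddg_ij)


# Illustrative inputs only: a neutral and a destabilizing substitution
neutral = substitution_rate(1.0, 0.0)
destabilizing = substitution_rate(1.0, 2.0)
print(neutral, round(destabilizing, 4))
```

With the same pre-exponential factor, a larger positive stability change gives a lower rate, which is the qualitative behavior the abstract attributes to the exponential term.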
5
Suresh SA, Ethiraj S, Rajnish KN. A systematic review of recent trends in research on therapeutically significant L-asparaginase and acute lymphoblastic leukemia. Mol Biol Rep 2022; 49:11281-11287. [PMID: 35816224 DOI: 10.1007/s11033-022-07688-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2022] [Accepted: 06/08/2022] [Indexed: 12/01/2022]
Abstract
L-asparaginases are mostly obtained from bacterial sources for their application in the therapy and food industry. Bacterial L-asparaginases are employed in the treatment of Acute Lymphoblastic Leukemia (ALL) and its subtypes, a type of blood and bone marrow cancer that results in the overproduction of immature blood cells. It also plays a role in the food industry in reducing the acrylamide formed during baking, roasting, and frying starchy foods. This importance of the enzyme makes it to be of constant interest to the researchers to isolate novel sources. Presently L-asparaginases from E. coli native and PEGylated form, Dickeya chrysanthemi (Erwinia chrysanthemi) are in the treatment regime. In therapy, the intrinsic glutaminase activity of the enzyme is a major drawback as the patients in treatment experience side effects like fever, skin rashes, anaphylaxis, pancreatitis, steatosis in the liver, and many complications. Its significance in the food industry in mitigating acrylamide is also a major reason. Acrylamide, a potent carcinogen was formed when treating starchy foods at higher temperatures. Acrylamide content in food was analyzed and pre-treatment was considered a valuable option. Immobilization of the enzyme is an advancing and promising technique in the effective delivery of the enzyme than in free form. The concept of machine learning by employing the Artificial Network and Genetic Algorithm has paved the way to optimize the production of L-asparaginase from its sources. Gene-editing tools are gaining momentum in the study of several diseases and this review focuses on the CRISPR-Cas9 gene-editing tool in ALL.
Affiliation(s)
- K N Rajnish
  - SRM Institute of Science and Technology, Chennai, Tamil Nadu, India.
6
Protein folding stabilities are a major determinant of oxidation rates for buried methionine residues. J Biol Chem 2022; 298:101872. [PMID: 35346688 PMCID: PMC9062257 DOI: 10.1016/j.jbc.2022.101872] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/21/2021] [Revised: 03/19/2022] [Accepted: 03/21/2022] [Indexed: 12/20/2022] Open
Abstract
The oxidation of protein-bound methionines to form methionine sulfoxides has a broad range of biological ramifications, making it important to delineate factors that influence methionine oxidation rates within a given protein. This is especially important for biopharmaceuticals, where oxidation can lead to deactivation and degradation. Previously, neighboring residue effects and solvent accessibility have been shown to impact the susceptibility of methionine residues to oxidation. In this study, we provide proteome-wide evidence that oxidation rates of buried methionine residues are also strongly influenced by the thermodynamic folding stability of proteins. We surveyed the Escherichia coli proteome using several proteomic methodologies and globally measured oxidation rates of methionine residues in the presence and absence of tertiary structure, as well as the folding stabilities of methionine-containing domains. These data indicated that buried methionines have a wide range of protection factors against oxidation that correlate strongly with folding stabilities. Consistent with this, we show that in comparison to E. coli, the proteome of the thermophile Thermus thermophilus is significantly more stable and thus more resistant to methionine oxidation. To demonstrate the utility of this correlation, we used native methionine oxidation rates to survey the folding stabilities of E. coli and T. thermophilus proteomes at various temperatures and propose a model that relates the temperature dependence of the folding stabilities of these two species to their optimal growth temperatures. Overall, these results indicate that oxidation rates of buried methionines from the native state of proteins can be used as a metric of folding stability.
7
A Transfer-Learning-Based Deep Convolutional Neural Network for Predicting Leukemia-Related Phosphorylation Sites from Protein Primary Sequences. Int J Mol Sci 2022; 23:ijms23031741. [PMID: 35163663 PMCID: PMC8915183 DOI: 10.3390/ijms23031741] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 01/27/2022] [Accepted: 01/29/2022] [Indexed: 12/27/2022] Open
Abstract
As one of the most important post-translational modifications (PTMs), phosphorylation refers to the binding of a phosphate group with amino acid residues like Ser (S), Thr (T) and Tyr (Y) thus resulting in diverse functions at the molecular level. Abnormal phosphorylation has been proved to be closely related with human diseases. To our knowledge, no research has been reported describing specific disease-associated phosphorylation sites prediction which is of great significance for comprehensive understanding of disease mechanism. In this work, focusing on three types of leukemia, we aim to develop reliable leukemia-related phosphorylation site prediction models by combining deep convolutional neural network (CNN) with transfer-learning. CNN could automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of leukemia-related phosphorylation site prediction. With the largest dataset of myelogenous leukemia, the optimal models for S/T/Y phosphorylation sites give the AUC values of 0.8784, 0.8328 and 0.7716 respectively. When transferred learning on the small size datasets, the models for T-cell and lymphoid leukemia also give the promising performance by common sharing the optimal parameters. Compared with other five machine-learning methods, our CNN models reveal the superior performance. Finally, the leukemia-related pathogenesis analysis and distribution analysis on phosphorylated proteins along with K-means clustering analysis and position-specific conservation profiles on the phosphorylation site all indicate the strong practical feasibility of our easy-to-use CNN models.
8
Cai Q, Yuan R, He J, Li M, Guo Y. Predicting HIV drug resistance using weighted machine learning method at target protein sequence-level. Mol Divers 2021; 25:1541-1551. [PMID: 34241771 DOI: 10.1007/s11030-021-10262-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 06/19/2021] [Indexed: 11/29/2022]
Abstract
Acquired immune deficiency syndrome (AIDS) is a fatal disease caused by human immunodeficiency virus (HIV). Although 23 different drugs have been available, the treatment of AIDS remains challenging because the virus mutates very quickly which can lead to drug resistance. Therefore, predicting drug resistance before treatment is crucial for individual treatments. Here, based on HIV target protein sequence information, we analyzed 21-drug resistance caused by mutated residues using machine learning (ML) methods. To transform target sequences into numeric vectors, seven physicochemical properties were used, which can well represent the interacting characteristics of target proteins. Then, principal component analysis (PCA) method was adopted to reduce the feature dimensionality. Random forest (RF) and support vector machine (SVM) based on three different kernel functions, including linear, polynomial and radial basis function (RBF), were all employed. By comparisons, we found that RBF-based SVM method gives a comparative performance with RF model. Further, we added the weight information to RBF-based SVM method by four different weight evaluation methods of RF, eXtreme Gradient Boosting (XGB), CfsSubsetEval and ReliefFAttributeEval, respectively. Results show that the RF-weighted RBF-based SVM yield the superior performance and 13 out of 21 drug models provide the correlation coefficients (R2) over 0.8 and 3 of them are higher than 0.9. Finally, position-specific importance analysis indicates that most of the mutation residues with high RF weight scores are proved to be closely related with drug resistance, which has been revealed in previous reports. Overall, we can expect that this method can be a supplementary tool for predicting HIV drug resistance for newly discovered mutations. 
Here, based on HIV target protein sequence information, we analyzed 21-drug resistance caused by mutated residues using machine learning (ML) methods by fusing the weight information of different mutation positions.
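The abstract describes adding per-feature weight information to an RBF-kernel SVM but does not say how the weights enter the kernel. The Python sketch below shows one common way to do this, scaling each squared feature difference by its weight before the exponential. This is an assumption made for illustration, not the authors' confirmed formulation; the function name, the `gamma` default, and all input values are hypothetical.

```python
import math
from typing import Sequence


def weighted_rbf_kernel(
    x: Sequence[float],
    y: Sequence[float],
    weights: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Compute exp(-gamma * sum_i w_i * (x_i - y_i)^2).

    weights holds one non-negative importance score per feature, for
    example scores from a random forest (assumed, not from the paper).
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    weighted_sq_dist = sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))
    return math.exp(-gamma * weighted_sq_dist)


# Hypothetical feature vectors: the second feature has zero weight,
# so a difference in that position does not change the similarity.
print(weighted_rbf_kernel([0.2, 1.0], [0.2, 5.0], [1.0, 0.0]))
```

Under this scheme, features with larger weights dominate the similarity between two encoded sequences, while zero-weight features are ignored.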
Affiliation(s)
- Qihang Cai
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Rongao Yuan
  - College of Computer Science, Sichuan University, Chengdu, 610064, China
- Jian He
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China.
9
Delmar JA, Buehler E, Chetty AK, Das A, Quesada GM, Wang J, Chen X. Machine learning prediction of methionine and tryptophan photooxidation susceptibility. Mol Ther Methods Clin Dev 2021; 21:466-477. [PMID: 33898635 PMCID: PMC8060516 DOI: 10.1016/j.omtm.2021.03.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2020] [Accepted: 03/26/2021] [Indexed: 12/01/2022]
Abstract
Photooxidation of methionine (Met) and tryptophan (Trp) residues is common and includes major degradation pathways that often pose a serious threat to the success of therapeutic proteins. Oxidation impacts all steps of protein production, manufacturing, and shelf life. Prediction of oxidation liability as early as possible in development is important because many more candidate drugs are discovered than can be tested experimentally. Undetected oxidation liabilities necessitate expensive and time-consuming remediation strategies in development and may lead to good drugs reaching patients slowly. Conversely, sites mischaracterized as oxidation liabilities could result in overengineering and lead to good drugs never reaching patients. To our knowledge, no predictive model for photooxidation of Met or Trp is currently available. We applied the random forest machine learning algorithm to in-house liquid chromatography-tandem mass spectrometry (LC-MS/MS) datasets (Met, n = 421; Trp, n = 342) of tryptic therapeutic protein peptides to create computational models for Met and Trp photooxidation. We show that our machine learning models predict Met and Trp photooxidation likelihood with 0.926 and 0.860 area under the curve (AUC), respectively, and Met photooxidation rate with a correlation coefficient (Q2) of 0.511 and root-mean-square error (RMSE) of 10.9%. We further identify important physical, chemical, and formulation parameters that influence photooxidation. Improvement of biopharmaceutical liability predictions will result in better, more stable drugs, increasing development throughput, product quality, and likelihood of clinical success.
Affiliation(s)
- Jared A Delmar
  - Biopharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD 20878, USA
- Eugen Buehler
  - Data Sciences and AI, R&D, AstraZeneca, Gaithersburg, MD 20878, USA
- Ashwin K Chetty
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Agastya Das
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Jihong Wang
  - Biopharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD 20878, USA
- Xiaoyu Chen
  - Biopharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD 20878, USA
10
Narayanan H, Dingfelder F, Butté A, Lorenzen N, Sokolov M, Arosio P. Machine Learning for Biologics: Opportunities for Protein Engineering, Developability, and Formulation. Trends Pharmacol Sci 2021; 42:151-165. [DOI: 10.1016/j.tips.2020.12.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/19/2020] [Revised: 12/10/2020] [Accepted: 12/16/2020] [Indexed: 12/19/2022]
11
Ao C, Yu L, Zou Q. Prediction of bio-sequence modifications and the associations with diseases. Brief Funct Genomics 2020; 20:1-18. [PMID: 33313647 DOI: 10.1093/bfgp/elaa023] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 10/25/2020] [Revised: 11/09/2020] [Accepted: 11/10/2020] [Indexed: 12/22/2022] Open
Abstract
Modifications of protein, RNA and DNA play an important role in many biological processes and are related to some diseases. Therefore, accurate identification and comprehensive understanding of protein, RNA and DNA modification sites can promote research on disease treatment and prevention. With the development of sequencing technology, the number of known sequences has continued to increase. In the past decade, many computational tools that can be used to predict protein, RNA and DNA modification sites have been developed. In this review, we comprehensively summarized the modification site predictors for three different biological sequences and the association with diseases. The relevant web server is accessible at http://lab.malab.cn/~acy/PTM_data/; some sample data on protein, RNA and DNA modification can be downloaded from that website.
12
Kamerzell TJ, Middaugh CR. Prediction Machines: Applied Machine Learning for Therapeutic Protein Design and Development. J Pharm Sci 2020; 110:665-681. [PMID: 33278409 DOI: 10.1016/j.xphs.2020.11.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/29/2020] [Revised: 11/27/2020] [Accepted: 11/27/2020] [Indexed: 12/11/2022]
Abstract
The rapid growth in technological advances and quantity of scientific data over the past decade has led to several challenges including data storage and analysis. Accurate models of complex datasets were previously difficult to develop and interpret. However, improvements in machine learning algorithms have since enabled unparalleled classification and prediction capabilities. The application of machine learning can be seen throughout diverse industries due to their ease of use and interpretability. In this review, we describe popular machine learning algorithms and highlight their application in pharmaceutical protein development. Machine learning models have now been applied to better understand the nonlinear concentration dependent viscosity of protein solutions, predict protein oxidation and deamidation rates, classify sub-visible particles and compare the physical stability of proteins. We also applied several machine learning algorithms using previously published data and describe models with improved predictions and classification. The authors hope that this review can be used as a resource to others and encourage continued application of machine learning algorithms to problems in pharmaceutical protein development.
Affiliation(s)
- Tim J Kamerzell
  - Department of Pharmaceutical Chemistry, The University of Kansas, Lawrence, KS, USA; Division of Internal Medicine, HCA MidWest Health, Overland Park, KS, USA.
- C Russell Middaugh
  - Department of Pharmaceutical Chemistry, The University of Kansas, Lawrence, KS, USA
13
Aledo JC, Aledo P. Susceptibility of Protein Methionine Oxidation in Response to Hydrogen Peroxide Treatment-Ex Vivo Versus In Vitro: A Computational Insight. Antioxidants (Basel) 2020; 9:antiox9100987. [PMID: 33066324 PMCID: PMC7602125 DOI: 10.3390/antiox9100987] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/16/2020] [Revised: 10/08/2020] [Accepted: 10/09/2020] [Indexed: 11/25/2022] Open
Abstract
Methionine oxidation plays a relevant role in cell signaling. Recently, we built a database containing thousands of proteins identified as sulfoxidation targets. Using this resource, we have now developed a computational approach aimed at characterizing the oxidation of human methionyl residues. We found that proteins oxidized in both cell-free preparations (in vitro) and inside living cells (ex vivo) were enriched in methionines and intrinsically disordered regions. However, proteins oxidized ex vivo tended to be larger and less abundant than those oxidized in vitro. Another distinctive feature was their subcellular localizations. Thus, nuclear and mitochondrial proteins were preferentially oxidized ex vivo but not in vitro. The nodes corresponding with ex vivo and in vitro oxidized proteins in a network based on gene ontology terms showed an assortative mixing suggesting that ex vivo oxidized proteins shared among them molecular functions and biological processes. This was further supported by the observation that proteins from the ex vivo set were co-regulated more often than expected by chance. We also investigated the sequence environment of oxidation sites. Glutamate and aspartate were overrepresented in these environments regardless of the group. In contrast, tyrosine, tryptophan and histidine were clearly avoided but only in the environments of the ex vivo sites. A hypothetical mechanism of methionine oxidation that accounts for these observations is presented.
14
Kuroda D, Tsumoto K. Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design. J Pharm Sci 2020; 109:1631-1651. [DOI: 10.1016/j.xphs.2020.01.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/30/2019] [Revised: 12/25/2019] [Accepted: 01/10/2020] [Indexed: 12/18/2022]
15
Veredas FJ, Urda D, Subirats JL, Cantón FR, Aledo JC. Combining feature engineering and feature selection to improve the prediction of methionine oxidation sites in proteins. Neural Comput Appl 2020. [DOI: 10.1007/s00521-018-3655-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
16
Delmar JA, Wang J, Choi SW, Martins JA, Mikhail JP. Machine Learning Enables Accurate Prediction of Asparagine Deamidation Probability and Rate. Mol Ther Methods Clin Dev 2019; 15:264-274. [PMID: 31890727 PMCID: PMC6923510 DOI: 10.1016/j.omtm.2019.09.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/06/2019] [Accepted: 09/16/2019] [Indexed: 12/20/2022]
Abstract
The spontaneous conversion of asparagine residues to aspartic acid or iso-aspartic acid, via deamidation, is a major pathway of protein degradation and is often seriously disruptive to biological systems. Deamidation has been shown to negatively affect both in vitro stability and in vivo biological function of diverse classes of proteins. During protein therapeutics development, deamidation liabilities that are overlooked necessitate expensive and time-consuming remediation strategies, sometimes leading to termination of the project. In this paper, we apply machine learning to a large (n = 776) liquid chromatography-tandem mass spectrometry (LC-MS/MS) dataset of monoclonal antibody peptides to create computational models for the post-translational modification asparagine deamidation, using the random decision forest method. We show that our categorical model predicts antibody deamidation with nearly 5% increased accuracy and 0.2 MCC over the best currently available models. Surprisingly, our model also paces or outperforms advanced and conventional models on an independent non-antibody dataset. In addition to deamidation probability, we are able to accurately predict deamidation rate (R2 = 0.963 and Q2 = 0.822), a capability with no peer in current models. This method should enable significant improvement in protein candidate selection, especially in biopharmaceutical development, and can be applied with similar accuracy to enzymes, monoclonal antibodies, next-generation formats, vaccine component antigens, and gene therapy vectors such as adeno-associated virus.
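This abstract reports classifier gains in terms of the Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the Python sketch below computes it from a binary confusion matrix using the standard definition. The counts in the example are hypothetical and unrelated to the paper's data, and returning 0.0 for a zero denominator is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: a perfect classifier and a chance-level one
print(matthews_corrcoef(50, 50, 0, 0))
print(matthews_corrcoef(25, 25, 25, 25))
```

MCC ranges from -1 to 1, so an improvement of 0.2 is a sizeable shift on that scale.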
Affiliation(s)
- Jared A Delmar
  - Analytical Sciences, Biopharmaceutical Development, AstraZeneca, One MedImmune Way, Gaithersburg, MD 20878, USA
- Jihong Wang
  - Analytical Sciences, Biopharmaceutical Development, AstraZeneca, One MedImmune Way, Gaithersburg, MD 20878, USA
- Seo Woo Choi
  - David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jason A Martins
  - David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- John P Mikhail
  - David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
17
Liu Y, Guo Y, Wu W, Xiong Y, Sun C, Yuan L, Li M. A Machine Learning-Based QSAR Model for Benzimidazole Derivatives as Corrosion Inhibitors by Incorporating Comprehensive Feature Selection. Interdiscip Sci 2019; 11:738-747. [PMID: 31486019 DOI: 10.1007/s12539-019-00346-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/16/2019] [Revised: 07/23/2019] [Accepted: 07/25/2019] [Indexed: 01/28/2023]
Abstract
BACKGROUND: Computational prediction of inhibition efficiency (IE) for inhibitor molecules is a crucial supplementary way to design novel molecules that can efficiently inhibit corrosion onto metallic surfaces. PURPOSE: Here we are dedicated to developing a new machine learning-based predictor for the inhibition efficiency (IE) of benzimidazole derivatives. METHODS: First, a comprehensively numerical representation was given on inhibitor molecules from all aspects of energy, electronic, topological, physicochemical and spatial properties based on 3-D structures and 150 valid structural descriptors were obtained. Then, a thorough investigation of these structural descriptors was implemented. The multicollinearity-based clustering analysis was performed to remove the linear correlated feature variables, so 47 feature clusters were produced. Meanwhile, Gini importance by random forest (RF) was used to further measure the contributions of the descriptors in each cluster and 47 non-linear descriptors were selected with the highest Gini importance score in the corresponding cluster. Further, considering the limited number of available inhibitors, different feature subsets were constructed according to the Gini importance score ranking list of 47 descriptors. RESULTS: Finally, support vector machine (SVM) models based on different feature subsets were tested by leave-one-out cross validation. Through comparisons, the optimal SVM model with the top 11 descriptors was achieved based on Poly kernel. This model yields a promising performance with the correlation coefficient (R) and root-mean-square error (RMSE) of 0.9589 and 4.45, respectively, which indicates that the method proposed by us gives the best performance for the current data. CONCLUSION: Based on our model, 6 new benzimidazole molecules were designed and their IE values predicted by this model indicate that two of them have high potential as outstanding corrosion inhibitors.
Affiliation(s)
- Youquan Liu
- Research Institute of Natural Gas Technology, PetroChina Southwest Oil and Gas Field Company, Chengdu, 610213, China.
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, Sichuan, 610064, People's Republic of China.
- Wengang Wu
- Research Institute of Natural Gas Technology, PetroChina Southwest Oil and Gas Field Company, Chengdu, 610213, China.
- Ying Xiong
- Research Institute of Natural Gas Technology, PetroChina Southwest Oil and Gas Field Company, Chengdu, 610213, China.
- Chuan Sun
- Research Institute of Natural Gas Technology, PetroChina Southwest Oil and Gas Field Company, Chengdu, 610213, China.
- Li Yuan
- Research Institute of Natural Gas Technology, PetroChina Southwest Oil and Gas Field Company, Chengdu, 610213, China.
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, 610064, People's Republic of China.
|
18
|
Predicting the decision making chemicals used for bacterial growth. Sci Rep 2019; 9:7251. [PMID: 31076576 PMCID: PMC6510730 DOI: 10.1038/s41598-019-43587-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/15/2018] [Accepted: 04/24/2019] [Indexed: 01/01/2023]
Abstract
The prediction of the contribution of media components to bacterial growth was initiated by introducing machine learning to high-throughput growth assays. A total of 1336 temporal growth records, corresponding to 225 different media composed of 13 chemical components, were generated. The growth rate and saturated density of each growth curve were calculated automatically with a newly developed data processing program. To identify the decision-making factors related to growth among the 13 chemicals, big datasets linking the growth parameters to the chemical combinations were subjected to decision tree learning. The results showed that glucose, the only carbon source, determined bacterial growth but was not the first priority. Instead, the top decision-making chemicals for the growth rate and the saturated density were ammonium and ferric ions, respectively. Three chemical components (NH₄⁺, Mg²⁺ and glucose) appeared in the decision trees of both the growth rate and the saturated density, but they exhibited different mechanisms. The concentration ranges for fast growth and high density overlapped for glucose but were distinct for NH₄⁺ and Mg²⁺. The results suggested that these chemicals were crucial in determining the growth speed and growth maximum in either a universal-use or a trade-off manner. This differentiation might reflect the diversity of resource allocation mechanisms for growth priority under different environmental restrictions. This study provides a representative example of clarifying the contribution of the environment to population dynamics by applying modern data science within traditional microbiology to obtain novel findings.
|
19
|
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019; 10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019] [Indexed: 12/11/2022]
Abstract
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Affiliation(s)
- Bilal Mirza
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Wei Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Jie Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Howard Choi
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Neo Christopher Chung
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
- Peipei Ping
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA.
|
20
|
Sankar K, Hoi KH, Yin Y, Ramachandran P, Andersen N, Hilderbrand A, McDonald P, Spiess C, Zhang Q. Prediction of methionine oxidation risk in monoclonal antibodies using a machine learning method. MAbs 2018; 10:1281-1290. [PMID: 30252602 PMCID: PMC6284603 DOI: 10.1080/19420862.2018.1518887] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/05/2018] [Revised: 08/15/2018] [Accepted: 08/28/2018] [Indexed: 12/22/2022]
Abstract
Monoclonal antibodies (mAbs) have become a major class of protein therapeutics that target a spectrum of diseases ranging from cancers to infectious diseases. Like any protein molecule, mAbs are susceptible to chemical modifications during the manufacturing process, long-term storage, and in vivo circulation that can impair their potency. One such modification is the oxidation of methionine residues. Chemical modifications that occur in the complementarity-determining regions (CDRs) of mAbs can abrogate antigen binding and reduce the drug's potency and efficacy. Thus, it is highly desirable to identify and eliminate any chemically unstable residues in the CDRs during the therapeutic antibody discovery process. To provide increased throughput over experimental methods, we extracted features from the mAbs' sequences, structures, and dynamics, and used random forests to identify important features and develop a quantitative and highly predictive in silico methionine oxidation model.
Affiliation(s)
- Kannan Sankar
- Department of Antibody Engineering, Genentech, South San Francisco, CA, USA.
- Kam Hon Hoi
- Department of Antibody Engineering, Genentech, South San Francisco, CA, USA.
- Department of Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA.
- Yizhou Yin
- Department of Antibody Engineering, Genentech, South San Francisco, CA, USA.
- Institute for Bioscience and Biotechnology Research, Biological Sciences Graduate Program, University of Maryland, Rockville, MD, USA.
- Prasanna Ramachandran
- Department of Analytical Development and Quality Control, Genentech, South San Francisco, CA, USA.
- Nisana Andersen
- Department of Analytical Development and Quality Control, Genentech, South San Francisco, CA, USA.
- Amy Hilderbrand
- Department of Analytical Development and Quality Control, Genentech, South San Francisco, CA, USA.
- Paul McDonald
- Department of Purification Development and Bioprocess Development, Genentech, South San Francisco, CA, USA.
- Christoph Spiess
- Department of Antibody Engineering, Genentech, South San Francisco, CA, USA.
- Qing Zhang
- Department of Antibody Engineering, Genentech, South San Francisco, CA, USA.
- Department of Bioinformatics and Computational Biology, Genentech, South San Francisco, CA, USA.
|
21
|
Manning MC, Liu J, Li T, Holcomb RE. Rational Design of Liquid Formulations of Proteins. Therapeutic Proteins and Peptides 2018; 112:1-59. [DOI: 10.1016/bs.apcsb.2018.01.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/28/2022]
|