1
Zhou Y, Liu W, Luo C, Huang Z, Samarappuli Mudiyanselage Savini G, Zhao L, Wang R, Huang J. Ab-Amy 2.0: Predicting light chain amyloidogenic risk of therapeutic antibodies based on antibody language model. Methods 2025; 233:11-18. [PMID: 39550021 DOI: 10.1016/j.ymeth.2024.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 10/28/2024] [Accepted: 11/06/2024] [Indexed: 11/18/2024]
Abstract
Therapeutic antibodies have emerged as a promising treatment option for a wide range of diseases. However, antibody light chains can induce amyloidosis, a condition characterized by protein misfolding and aggregation, which poses a significant safety concern. It is therefore crucial to assess the amyloidogenic risk of therapeutic antibodies during the early stages of drug development. In this study, we introduce AB-Amy 2.0, a new computational model with enhanced performance for assessing the light chain amyloidogenic risk of therapeutic antibodies. By employing embeddings from pretrained protein language models (PLMs), AB-Amy 2.0 achieves higher accuracy in amyloidogenic risk prediction than traditional features, offering a tool for early-stage identification of antibodies with low aggregation propensity. AB-Amy 2.0 was trained on antiBERTy embeddings using a support vector machine (SVM). On an independent test dataset, the model achieved a sensitivity, specificity, accuracy (ACC), Matthews correlation coefficient (MCC) and AUC of 93.47%, 89.23%, 91.92%, 0.8261 and 0.9739, respectively. These results highlight the effectiveness and robustness of AB-Amy 2.0 in predicting light chain amyloidogenic risk. To facilitate access, we have developed an online web server (http://i.uestc.edu.cn/AB-Amy2) and a command line tool (https://github.com/zzyywww/ABAmy2). These resources enable the broader application of this model and promise to enhance the development of safer therapeutic antibodies.
Affiliation(s)
- Yuwei Zhou
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Wenwen Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Chunmei Luo
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611731, China
- Ziru Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Lening Zhao
- Yingcai Honors College, University of Electronic Science and Technology of China, Chengdu 611731, China
- Rong Wang
- Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, China.
- Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611731, China.
2
Ahammed MR, Ananya FN. Cardiac Amyloidosis: A Comprehensive Review of Pathophysiology, Diagnostic Approach, Applications of Artificial Intelligence, and Management Strategies. Cureus 2024; 16:e63673. [PMID: 39092395 PMCID: PMC11293487 DOI: 10.7759/cureus.63673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/02/2024] [Indexed: 08/04/2024]
Abstract
Cardiac amyloidosis (CA) is a serious and often fatal condition caused by the accumulation of amyloid fibrils in the heart, leading to progressive heart failure. It involves the misfolding of normally soluble proteins into insoluble amyloid fibrils, with transthyretin and light-chain amyloidosis being the most common forms affecting the heart. Advances in diagnostics, especially cardiac magnetic resonance imaging and non-invasive techniques, have improved early detection and disease management. Artificial intelligence has emerged as a diagnostic tool for cardiac amyloidosis, improving accuracy and enabling earlier intervention through advanced imaging analysis and pattern recognition. Management strategies include volume control, specific pharmacotherapies like tafamidis, and addressing arrhythmias and advanced heart failure. However, further research is needed for novel therapeutic approaches, the long-term effectiveness of emerging treatments, and the optimization of artificial intelligence applications in clinical practice for better patient outcomes. The article aims to provide an overview of CA, outlining its pathophysiology, diagnostic advancements, the role of artificial intelligence, management strategies, and the need for further research.
Affiliation(s)
- Md Ripon Ahammed
- Internal Medicine, Icahn School of Medicine at Mount Sinai/New York City Health and Hospitals Queens, New York City, USA
3
Kamel MA, Abbas MT, Kanaan CN, Awad KA, Baba Ali N, Scalia IG, Farina JM, Pereyra M, Mahmoud AK, Steidley DE, Rosenthal JL, Ayoub C, Arsanjani R. How Artificial Intelligence Can Enhance the Diagnosis of Cardiac Amyloidosis: A Review of Recent Advances and Challenges. J Cardiovasc Dev Dis 2024; 11:118. [PMID: 38667736 PMCID: PMC11050851 DOI: 10.3390/jcdd11040118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 04/09/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Cardiac amyloidosis (CA) is an underdiagnosed form of infiltrative cardiomyopathy caused by abnormal amyloid fibrils deposited extracellularly in the myocardium and cardiac structures. There can be high variability in its clinical manifestations, and diagnosing CA requires expertise and often thorough evaluation; as such, the diagnosis of CA can be challenging and is often delayed. The application of artificial intelligence (AI) to different diagnostic modalities is rapidly expanding and transforming cardiovascular medicine. Advanced AI methods such as deep-learning convolutional neural networks (CNNs) may enhance the diagnostic process for CA by identifying patients at higher risk and potentially expediting the diagnosis of CA. In this review, we summarize the current state of AI applications to different diagnostic modalities used for the evaluation of CA, including their diagnostic and prognostic potential, and current challenges and limitations.
Affiliation(s)
- Moaz A. Kamel
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Kamal A. Awad
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Nima Baba Ali
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Isabel G. Scalia
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Juan M. Farina
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Milagros Pereyra
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Ahmed K. Mahmoud
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- D. Eric Steidley
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Julie L. Rosenthal
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Chadi Ayoub
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Division of Cardiovascular Imaging, Mayo Clinic, 5777 East Mayo Boulevard, Phoenix, AZ 85054, USA
- Reza Arsanjani
- Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA
- Division of Cardiovascular Imaging, Mayo Clinic, 5777 East Mayo Boulevard, Phoenix, AZ 85054, USA
4
Zhou Y, Huang Z, Gou Y, Liu S, Yang W, Zhang H, Dzisoo AM, Huang J. AB-Amy: machine learning aided amyloidogenic risk prediction of therapeutic antibody light chains. Antib Ther 2023; 6:147-156. [PMID: 37492587 PMCID: PMC10365155 DOI: 10.1093/abt/tbad007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 03/30/2023] [Accepted: 04/06/2023] [Indexed: 07/27/2023]
Abstract
Over 120 FDA-approved antibody-based therapeutics are used to treat a variety of diseases. However, many candidates could fail because of unfavorable physicochemical properties. Light-chain amyloidosis is one form of aggregation that can lead to severe safety risks in clinical development. Therefore, screening for candidates with a lower amyloidosis risk at an early stage can not only save time and cost in antibody development but also improve the safety of antibody drugs. In this study, based on the dipeptide composition of 742 amyloidogenic and 712 non-amyloidogenic antibody light chains, a support vector machine-based model, AB-Amy, was trained to predict the light-chain amyloidogenic risk. The AUC of AB-Amy reaches 0.9651. This performance indicates that AB-Amy can be a useful tool for the in silico evaluation of light-chain amyloidogenic risk to ensure the safety of antibody therapeutics under clinical development. A web server is freely available at http://i.uestc.edu.cn/AB-Amy/.
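Dipeptide composition, the feature type named in this abstract, turns a sequence into a fixed-length vector of 400 frequencies, one per ordered pair of the 20 standard amino acids. A minimal sketch follows, assuming overlapping pairs and frequencies normalized by the number of valid pairs; the function name and the handling of non-standard residues are illustrative choices, not details confirmed by the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable across sequences.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a protein sequence.

    Pairs containing a non-standard residue (e.g. 'X') are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence for illustration; not a real light chain.
features = dipeptide_composition("ACAC")
```

A vector of this kind can then be passed to any standard classifier, such as a support vector machine.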
Affiliation(s)
- Yuwei Zhou
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Ziru Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Yushu Gou
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Siqi Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Wei Yang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Hongyu Zhang
- Research and Development, Zhanyuan Therapeutics Ltd., Hangzhou, Zhejiang 310000, China
- Anthony Mackitz Dzisoo
- Bioinformatics, Data and Medical Reporting, Arcencsus GmbH, Rostock, Mecklenburg-Vorpommern 18055, Germany
- Jian Huang
- To whom correspondence should be addressed. Jian Huang, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 610054, China.
5
Machine Learning Approaches in Diagnosis, Prognosis and Treatment Selection of Cardiac Amyloidosis. Int J Mol Sci 2023; 24:ijms24065680. [PMID: 36982754 PMCID: PMC10051237 DOI: 10.3390/ijms24065680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/12/2023] [Accepted: 03/14/2023] [Indexed: 03/18/2023]
Abstract
Cardiac amyloidosis is an uncommon restrictive cardiomyopathy featuring unregulated amyloid protein deposition that impairs organ function. Early diagnosis of cardiac amyloidosis is generally delayed because its clinical findings are indistinguishable from those of more frequent hypertrophic diseases. Furthermore, amyloidosis is divided into various groups, according to a generally accepted taxonomy, based on the proteins that make up the amyloid deposits; careful differentiation between the various forms of amyloidosis is necessary to choose an adequate treatment. Thus, cardiac amyloidosis is thought to be underdiagnosed, which delays necessary therapeutic procedures, diminishing quality of life and impairing clinical prognosis. The diagnostic work-up for cardiac amyloidosis begins with the identification of clinical features and electrocardiographic and imaging findings suggestive of or compatible with cardiac amyloidosis, and often requires the histological demonstration of amyloid deposition. One approach to overcoming the difficulty of early diagnosis is the use of automated diagnostic algorithms. Machine learning enables the automatic extraction of salient information from “raw data” without the need for pre-processing methods based on the a priori knowledge of the human operator. This review attempts to assess the various diagnostic approaches and artificial intelligence computational techniques in the detection of cardiac amyloidosis.
6
Lai PK, Gallegos A, Mody N, Sathish HA, Trout BL. Machine learning prediction of antibody aggregation and viscosity for high concentration formulation development of protein therapeutics. MAbs 2022; 14:2026208. [PMID: 35075980 PMCID: PMC8794240 DOI: 10.1080/19420862.2022.2026208] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Indexed: 12/14/2022]
Abstract
Machine learning has recently been used to predict therapeutic antibody aggregation rates and viscosity at high concentrations (150 mg/ml). These works focused on commercially available antibodies, which may have been optimized for stability. In this study, we measured accelerated aggregation rates at 45°C and viscosity at 150 mg/ml for 20 preclinical and clinical-stage antibodies. Features obtained from molecular dynamics simulations of the full-length antibody and from sequences were used for machine learning model construction. We found that a k-nearest neighbors regression model with two features, the spatial positive charge map on the CDRH2 and the solvent-accessible surface area of hydrophobic residues on the variable fragment, gives the best performance for predicting antibody aggregation rates (r = 0.89). For viscosity classification, the model with the highest accuracy is a logistic regression model with two features, the spatial negative charge map on the heavy chain variable region and the spatial negative charge map on the light chain variable region. The accuracy and the area under the precision-recall curve of the classification model from validation tests are 0.86 and 0.70, respectively. In addition, we combined data from another 27 commercial mAbs to develop a viscosity predictive model. The best model is a logistic regression model with two features, the number of hydrophobic residues on the light chain variable region and the net charge on the light chain variable region. The accuracy and the area under the precision-recall curve of this classification model are 0.85 and 0.6, respectively. The aggregation rate and viscosity models can be used to predict antibody stability to facilitate pharmaceutical development.
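The aggregation-rate model described in this abstract is a k-nearest neighbors regressor over two features. The sketch below shows the technique in its simplest form, with Euclidean distance and an unweighted mean of the k nearest targets. The value of k, the feature values, and the lack of feature scaling are all assumptions made for illustration; none of them comes from the study.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets.

    train_x: list of feature tuples (e.g. two descriptors per antibody).
    train_y: list of measured targets (e.g. aggregation rates).
    query:   feature tuple for the antibody to predict.
    """
    # Pair each Euclidean distance with its target, then sort by distance.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical, unitless data for illustration only.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_regress(train_x, train_y, query=(0.1, 0.1), k=3)
```

In practice, features on different scales would normally be standardized before computing distances, since otherwise the larger-valued feature dominates the neighbor search.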
Affiliation(s)
- Pin-Kuang Lai
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemical Engineering and Materials Science, Stevens Institute of Technology, Hoboken, New Jersey, USA
- Austin Gallegos
- Dosage Form Design and Development, AstraZeneca, Gaithersburg, Maryland, USA
- Neil Mody
- Dosage Form Design and Development, AstraZeneca, Gaithersburg, Maryland, USA
- Hasige A Sathish
- Dosage Form Design and Development, AstraZeneca, Gaithersburg, Maryland, USA
- Bernhardt L Trout
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
7
Rawat P, Prabakaran R, Kumar S, Gromiha MM. Exploring the sequence features determining amyloidosis in human antibody light chains. Sci Rep 2021; 11:13785. [PMID: 34215782 PMCID: PMC8253744 DOI: 10.1038/s41598-021-93019-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/09/2021] [Accepted: 06/18/2021] [Indexed: 02/06/2023]
Abstract
Light chain (AL) amyloidosis is caused by the aggregation of antibody light chains into amyloid fibrils. Plenty of computational resources are available for the prediction of short aggregation-prone regions within proteins. However, it is still challenging to predict the amyloidogenic nature of a whole protein using sequence/structure information. In the case of antibody light chains, the common architecture and known binding sites can provide vital information for the prediction of amyloidogenicity under physiological conditions. In this work, we have compared classical sequence-based, aggregation-related features (such as hydrophobicity, presence of gatekeeper residues, disorder, β-propensity, etc.) calculated for the CDR, FR or VL regions of amyloidogenic and non-amyloidogenic antibody light chains and implemented the insights gained in a machine learning-based webserver called "VLAmY-Pred" (https://web.iitm.ac.in/bioinfo2/vlamy-pred/). The model shows a prediction accuracy of 79.7% (sensitivity: 78.7% and specificity: 79.9%) with a ROC value of 0.88 on a dataset of 1828 variable region sequences of antibody light chains. This model will be helpful for improving prognosis for patients who are likely to suffer from diseases caused by light chain amyloidosis, understanding the origins of aggregation in antibody-based biotherapeutics, large-scale in silico analysis of antibody sequences generated by next-generation sequencing, and rational engineering of aggregation-resistant antibodies.
Affiliation(s)
- Puneet Rawat
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036 Tamil Nadu, India
- R. Prabakaran
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036 Tamil Nadu, India
- Sandeep Kumar
- Biotherapeutics Discovery, Boehringer-Ingelheim Inc., 5571 R & D Building, 175 Briar Ridge Road, Ridgefield, CT 06877, USA
- M. Michael Gromiha
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036 Tamil Nadu, India
- Advanced Computational Drug Discovery Unit (ACDD), Institute of Innovative Research, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan
8
Li Y, Zhang Z, Teng Z, Liu X. PredAmyl-MLP: Prediction of Amyloid Proteins Using Multilayer Perceptron. Computational and Mathematical Methods in Medicine 2020; 2020:8845133. [PMID: 33294004 PMCID: PMC7700051 DOI: 10.1155/2020/8845133] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/14/2020] [Revised: 10/06/2020] [Accepted: 10/31/2020] [Indexed: 01/20/2023]
Abstract
Amyloid is generally an aggregate of insoluble fibrous protein; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer's disease and type II diabetes. Therefore, accurately identifying amyloid is necessary to understand its role in pathology. We propose a machine learning-based prediction model called PredAmyl-MLP, which consists of three steps: feature extraction, feature selection, and classification. In the feature extraction step, seven feature extraction algorithms and different combinations of them are investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) is selected according to the experimental results. In the feature selection step, maximum relevance maximum distance (MRMD) and binomial distribution (BD) are each used to remove redundant or noisy features, and the appropriate features are selected according to the experimental results. In the classification step, a multilayer perceptron (MLP) is used to train the prediction model. The 10-fold cross-validation results show that the overall accuracy of PredAmyl-MLP reached 91.59%, and the performance was better than that of existing methods.
Affiliation(s)
- Yanjuan Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Zitong Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Zhixia Teng
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Xiaoyan Liu
- College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China
9
Charoenkwan P, Kanthawong S, Nantasenamat C, Hasan MM, Shoombuatong W. iAMY-SCM: Improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides. Genomics 2020; 113:689-698. [PMID: 33017626 DOI: 10.1016/j.ygeno.2020.09.065] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/23/2020] [Revised: 09/21/2020] [Accepted: 09/30/2020] [Indexed: 01/09/2023]
Abstract
Fast, accurate identification and characterization of amyloid proteins at a large scale is essential for understanding their role in therapeutic intervention strategies. To date, there exists only one in silico model for amyloid protein identification, RFAmy, which uses a random forest (RF) model in conjunction with various feature types. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method with robust accuracy compared with the existing, more complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. iAMY-SCM uses a simple weighted-sum function in conjunction with the propensity scores of dipeptides for amyloid protein identification. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895, corresponding to 10-22% higher performance than that of widely used machine learning models. Furthermore, iAMY-SCM achieved an accuracy of 0.827 on an independent test, which was comparable to that of RFAmy and approximately 9-13% higher than that of widely used machine learning models. In addition, the estimated propensity scores of amino acids and dipeptides were analyzed to provide insights into the biophysical and biochemical properties of amyloid proteins. This demonstrates that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate the use of iAMY-SCM, a user-friendly and publicly accessible web server has been established at http://camt.pythonanywhere.com/iAMY-SCM. We anticipate that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid proteins.
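The weighted-sum scoring mentioned in this abstract can be illustrated with a short sketch: each dipeptide carries a propensity score, a sequence is scored by averaging the scores of its overlapping dipeptides, and the result is compared with a threshold. The propensity values, the threshold, and the function names below are hypothetical; the published method learns its scores from training data, which this sketch does not attempt to reproduce.

```python
def scm_score(sequence, propensity, default=0.0):
    """Average dipeptide propensity score of a sequence (weighted-sum scoring).

    propensity: dict mapping dipeptides (e.g. "AC") to scores.
    default:    score used for dipeptides missing from the table (an assumption).
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    return sum(propensity.get(p, default) for p in pairs) / len(pairs)


def is_amyloid(sequence, propensity, threshold):
    """Classify a sequence as amyloid when its score exceeds the threshold."""
    return scm_score(sequence, propensity) > threshold


# Hypothetical propensity scores and threshold, for illustration only.
toy_propensity = {"AC": 600.0, "CA": 300.0}
score = scm_score("ACAC", toy_propensity)   # (600 + 300 + 600) / 3
label = is_amyloid("ACAC", toy_propensity, threshold=450.0)
```

Because every dipeptide contributes a single visible number to the total, a score table of this kind is easy to inspect, which fits the interpretability argument made in the abstract.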
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
10
Saravanan KM, Zhang H, Zhang H, Xi W, Wei Y. On the Conformational Dynamics of β-Amyloid Forming Peptides: A Computational Perspective. Front Bioeng Biotechnol 2020; 8:532. [PMID: 32656188 PMCID: PMC7325929 DOI: 10.3389/fbioe.2020.00532] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/15/2019] [Accepted: 05/04/2020] [Indexed: 12/12/2022]
Abstract
Understanding the conformational dynamics of proteins and peptides involved in important functions is still a difficult task in computational structural biology. Because such conformational transitions in β-amyloid (Aβ) forming peptides play a crucial role in many neurological disorders, researchers from different scientific fields have been working together to address issues related to the folding of Aβ forming peptides. Many theoretical models have been proposed in recent years for studying Aβ peptides using mathematical, physicochemical, molecular dynamics simulation, and machine learning approaches. In this article, we comprehensively review the advances in theoretical models for Aβ peptide folding and interactions, particularly in the context of neurological disorders. Furthermore, we review the advances in molecular dynamics simulation as a tool for studying the conversions between polymorphic amyloid forms, and the applications of machine learning approaches in predicting Aβ peptides and aggregation-prone regions in proteins. We also provide details on the theoretical advances in the study of Aβ peptides, which would enhance our understanding of these peptides at the molecular level and eventually lead to the development of targeted therapies for certain acute neurological disorders such as Alzheimer's disease.
Affiliation(s)
- Wenhui Xi
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
11
Kuroda D, Tsumoto K. Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design. J Pharm Sci 2020; 109:1631-1651. [DOI: 10.1016/j.xphs.2020.01.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/30/2019] [Revised: 12/25/2019] [Accepted: 01/10/2020] [Indexed: 12/18/2022]
12
Identification of amyloidogenic peptides via optimized integrated features space based on physicochemical properties and PSSM. Anal Biochem 2019; 583:113362. [PMID: 31310738 DOI: 10.1016/j.ab.2019.113362] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/16/2019] [Revised: 07/09/2019] [Accepted: 07/12/2019] [Indexed: 01/08/2023]
Abstract
The identification of amyloid is becoming increasingly important because its mis-aggregation may cause diseases such as Alzheimer's and Parkinson's diseases. This paper focuses on the classification of amyloidogenic peptides and proposes a novel feature representation called PhyAve_PSSMDwt. It has two parts. The first is based on physicochemical properties, namely hydrophilicity, hydrophobicity, aggregation tendency, packing density and H-bonding, and yields 15 features in total. The second consists of 60 features selected by recursive feature elimination from the PSSM after a discrete wavelet transform. In this step, a sliding window is introduced to reconstruct the PSSM so that the evolutionary information of short sequences can still be extracted. Finally, a support vector machine is adopted as the classifier. The experimental results on the Pep424 dataset show that the PSSM information contributes greatly to performance. Compared with other existing methods, our cross-validation results increase by 3.1%, 3.3%, 0.136 and 0.007 in accuracy, specificity, Matthews correlation coefficient and AUC value, respectively, which indicates that our method is effective and competitive.
13
Upadhyay A. Structure of proteins: Evolution with unsolved mysteries. Progress in Biophysics and Molecular Biology 2019; 149:160-172. [PMID: 31014967 DOI: 10.1016/j.pbiomolbio.2019.04.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/08/2019] [Revised: 04/16/2019] [Accepted: 04/19/2019] [Indexed: 02/07/2023]
Abstract
The evolution of macromolecules could be considered a milestone in the history of life. Nucleic acids are long stretches of nucleotides that contain all the possible codes and information of life. Proteins, on the other hand, are their actual translated outcomes, or reflections of modifications in their structure that have occurred at a slow but steady rate over a very long period of evolution. Over years of research, biophysicists, biochemists, and molecular and structural biologists have unfurled several layers of the structural convolutions in these chemical molecules; however, evolutionists look at their structures through a different prism, which may or may not coincide with the others. There remains a need to outline several well-known but less discussed features of protein structures, such as intrinsically disordered states, degron signals and the different types of ubiquitin chains providing degradation signals, which help the cellular proteolytic machinery to identify proteins and target them towards degradation pathways. Several important factors are critical for the folding of proteins into their native three-dimensional conformations by the cytoplasmic chaperones; but how, in real time, the chaperones fold newly synthesized polypeptide sequences into a particular three-dimensional shape within a fraction of a second is still a mystery for biologists as well as mathematicians. Multiple similar unsolved or unaddressed questions need to be addressed in detail so that future lines of research can dig deeper into the finer details of protein structures.
Affiliation(s)
- Arun Upadhyay
- Department of Biochemistry, Central University of Rajasthan, Ajmer, 305817, India.
14
Bacterial Amyloids: Biogenesis and Biomaterials. Advances in Experimental Medicine and Biology 2019; 1174:113-159. [DOI: 10.1007/978-981-13-9791-2_4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
15
Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci 2018; 19:ijms19072071. [PMID: 30013015 PMCID: PMC6073578 DOI: 10.3390/ijms19072071] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 06/12/2018] [Revised: 07/10/2018] [Accepted: 07/12/2018] [Indexed: 12/22/2022]
Abstract
Amyloid is an insoluble fibrous protein and its mis-aggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel predictor called RFAmy, based on random forest, to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In the ten-fold cross-validation test, RFAmy’s overall accuracy was 89.19% and its F-measure was 0.891. Results were obtained by comparison experiments with other features, classifiers, and existing methods. This shows the effectiveness of RFAmy in predicting amyloid protein. The RFAmy predictor proposed in this paper can be accessed through the URL http://server.malab.cn/RFAmyloid/.
Affiliation(s)
- Mengting Niu
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
- Yanjuan Li
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China.
- Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150040, China.
|
16
|
Parvizpour S, Razmara J, Omidi Y. Breast cancer vaccination comes to age: impacts of bioinformatics. ACTA ACUST UNITED AC 2018; 8:223-235. [PMID: 30211082 PMCID: PMC6128970 DOI: 10.15171/bi.2018.25] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2018] [Revised: 04/02/2018] [Accepted: 04/03/2018] [Indexed: 01/01/2023]
Abstract
Introduction: Breast cancer, as one of the major causes of cancer death among women, is the central focus of this study. Recent advances in the development and application of computational tools and bioinformatics in the immunotherapy of malignancies such as breast cancer have given rise to the new domain of immunoinformatics and, with it, the next generation of immunomedicines.
Methods: Having reviewed the most recent works on the applications of computational tools, we provide comprehensive insights into breast cancer incidence and its leading causes, as well as immunotherapy approaches and future trends. Furthermore, we discuss the impacts of bioinformatics on the different stages of vaccine design for breast cancer, which can be used to produce much more efficient vaccines through rationalized, time- and cost-effective in silico approaches prior to conducting costly experiments.
Results: The tools can be used to design immune system-modulating drugs and vaccines based on in silico approaches prior to in vitro and in vivo experimental evaluations. Application of immunoinformatics in cancer immunotherapy has shown success in pre-clinical models. This success can be traced to the impacts of several powerful computational approaches developed during the last decade.
Conclusion: Despite the invention of a number of vaccines for cancer immunotherapy, more computational and clinical trials are required to design much more efficient vaccines against various malignancies, including breast cancer.
Affiliation(s)
- Sepideh Parvizpour
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Jafar Razmara
- Department of Computer Science, Faculty of Mathematical Sciences, University of Tabriz, Tabriz, Iran
- Yadollah Omidi
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Department of Pharmaceutics, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
|
17
|
Mehta N, Devarakonda MV. Machine learning, natural language programming, and electronic health records: The next step in the artificial intelligence journey? J Allergy Clin Immunol 2018. [PMID: 29518424 DOI: 10.1016/j.jaci.2018.02.025] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Affiliation(s)
- Neil Mehta
- Education Informatics and Technology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, Ohio.
- Murthy V Devarakonda
- Department of Biomedical Informatics, Arizona State University, College of Health Solutions, Scottsdale, Ariz
|
18
|
Abstract
BACKGROUND Building evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing an evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time- and space-consuming. Hadoop and Spark were developed recently and bring new light to classical computational biology problems. In this paper, we tried to solve multiple sequence alignment and evolutionary reconstruction in parallel. RESULTS HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on files larger than 1 GB and achieves better performance than other evolutionary reconstruction tools. Users can use HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could help with population evolution research and metagenomics analysis. CONCLUSIONS In this paper, we employ the Hadoop and Spark platforms and design an evolutionary tree reconstruction software tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel. The neighbour-joining model was employed for evolutionary tree building. We have released our software together with its source code via http://lab.malab.cn/soft/HPtree/.
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, People's Republic of China
- Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Shixiang Wan
- School of Computer Science and Technology, Tianjin University, Tianjin, People's Republic of China
- Xiangxiang Zeng
- Department of Computer Science, Xiamen University, Xiamen, China.
- Zhanshan Sam Ma
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
|
19
|
Smith DJ, Shell MS. Can Simple Interaction Models Explain Sequence-Dependent Effects in Peptide Homodimerization? J Phys Chem B 2017; 121:5928-5943. [PMID: 28537734 DOI: 10.1021/acs.jpcb.7b03186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
The development of rapid methods to explain and predict peptide interactions, aggregation, and self-assembly has become important to understanding amyloid disease pathology, the shelf stability of peptide therapeutics, and the design of novel peptide materials. Although experimental aggregation databases have been used to develop correlative and statistical models, molecular simulations offer atomic-level details that potentially provide greater physical insight and allow one to single out the most explanatory simple models. Here, we outline one such approach using a case study that develops homodimerization models for serine-glycine peptides with various hydrophobic leucine mutations. Using detailed all-atom simulations, we calculate reference dimerization free energy profiles and binding constants for a small peptide library. We then use statistical methods to systematically assess whether simple interaction models, which do not require expensive simulations and free energy calculation, can capture them. Surprisingly, some combinations of a few simple scaling laws well recapitulate the detailed, all-atom results with high accuracy. Specifically, we find that a recently proposed phenomenological hydrophobic force law and coarse measures of entropic effects in binding offer particularly high explanatory power, underscoring the physical relevance to association that these driving forces can play.
Affiliation(s)
- David J Smith
- Department of Chemical Engineering, University of California, Santa Barbara, Santa Barbara, California 93106, United States
- M Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, Santa Barbara, California 93106, United States
|
20
|
Bemporad F, Ramazzotti M. From the Evolution of Protein Sequences Able to Resist Self-Assembly to the Prediction of Aggregation Propensity. INTERNATIONAL REVIEW OF CELL AND MOLECULAR BIOLOGY 2016; 329:1-47. [PMID: 28109326 DOI: 10.1016/bs.ircmb.2016.08.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
Folding of polypeptide chains into biologically active entities is an astonishingly complex process, determined by the nature and the sequence of residues emerging from ribosomes. While it has long been believed that evolution has pressed genomes so that specific sequences could adopt unique, functional three-dimensional folds, it is now clear that complex protein machineries act as a quality control system and supervise folding. Notwithstanding that, events such as erroneous folding, partial folding, or misfolding are frequent during the life of a cell or a whole organism, and they can escape controls. One of the possible outcomes of this misbehavior is cross-β aggregation, a super secondary structure which represents the hallmark of self-assembled, well organized, and extremely ordered structures termed amyloid fibrils. What if evolution had not taken such possibilities into account? Twenty years of research point toward the idea that, in fact, evolution has constantly supervised the risk of errors and minimized their impact. In this review we survey the major findings in the amyloid field, trying to describe what the real pitfalls of protein folding are, from an evolutionary perspective, and how sequence and structural features have evolved to balance the need for perfect, dynamic, functionally efficient structures against the detrimental effects implicit in the dangerous process of folding. We discuss how the knowledge obtained from these studies has been employed to produce computational methods able to assess, predict, and discriminate the aggregation properties of protein sequences.
Affiliation(s)
- F Bemporad
- Università degli Studi di Firenze, Firenze, Italy.
- M Ramazzotti
- Università degli Studi di Firenze, Firenze, Italy.
|
21
|
Gasior P, Kotulska M. FISH Amyloid - a new method for finding amyloidogenic segments in proteins based on site specific co-occurrence of aminoacids. BMC Bioinformatics 2014; 15:54. [PMID: 24564523 PMCID: PMC3941796 DOI: 10.1186/1471-2105-15-54] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2013] [Accepted: 02/03/2014] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Amyloids are proteins capable of forming fibrils whose intramolecular contact sites assume a densely packed zipper pattern. Their oligomers can underlie serious diseases, e.g. Alzheimer's and Parkinson's diseases. Recent studies show that short segments of amino acids can be responsible for the amyloidogenic properties of a protein. A few hundred such peptides have been experimentally found, but experimental testing of all candidates is currently not feasible. Here we propose an original machine learning method for the classification of amino acid sequences, based on discovering a segment with a discriminative pattern of site-specific co-occurrences between sequence elements. The pattern is based on the positions of residues with correlated occurrence over a sliding window of a specified length. The algorithm first recognizes the most relevant training segment in each positive training instance. Then the classification is based on maximal distances between the co-occurrence matrix of the relevant segments in positive training sequences and the matrix from negative training segments. The method was applied to study amino acid sequences with regard to their amyloidogenic properties. RESULTS Our method was first trained on available datasets of hexapeptides with amyloidogenic classification, using 5- or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under the ROC curve reached values of up to 0.80 for experimental datasets and 0.95 for datasets generated computationally (with the 3D profile method). Importantly, the results on 5-residue segments were not significantly worse, although the classification required that the algorithm first recognize the most relevant training segments. Datasets of long sequences, such as the sup35 prion and a few other amyloid proteins, were used to test the method and gave encouraging results.
Our web tool FISH Amyloid, trained on all available experimental data 4-10 residues long, offers prediction of amyloidogenic segments in protein sequences. CONCLUSIONS We proposed a new, original classification method which recognizes co-occurrence patterns in sequences. The method reveals the characteristic classification pattern of the data and finds the segments where its scoring is strongest, also in long training sequences. Applied to the problem of recognizing amyloidogenic segments, it showed good potential for classification problems in bioinformatics.
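The sliding-window and site-specific co-occurrence ideas in the abstract above can be illustrated with a minimal sketch. This is not the FISH Amyloid implementation: the function names, the toy sequence, and the choice to count every pair of sites within a window are assumptions made only for illustration, and the segment-selection and distance-based classification steps are not reproduced.

```python
from collections import Counter
from itertools import combinations


def sliding_windows(sequence, width):
    """Return every contiguous segment of the given width (hypothetical helper)."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def site_cooccurrence(segments):
    """Count how often residue a at site i occurs together with residue b at site j.

    Keys are (i, a, j, b) with i < j; sites are positions inside the window.
    """
    counts = Counter()
    for segment in segments:
        for (i, a), (j, b) in combinations(enumerate(segment), 2):
            counts[(i, a, j, b)] += 1
    return counts


# Toy input, chosen only for illustration.
windows = sliding_windows("STVIIEGK", 6)
table = site_cooccurrence(windows)
```

Each 6-residue window contributes 15 site pairs, so the table grows quickly with window width; a real classifier would compare such tables between positive and negative training segments.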
Affiliation(s)
- Malgorzata Kotulska
- Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, 50-370 Wroclaw, Poland.
|
22
|
Abstract
Advanced molecular biology techniques developed during the past few decades have allowed the industry to exploit and commercialize the natural defense mechanisms that antibodies provide. This review discusses the latest advances in antibody-engineering technologies to enhance clinical efficacy and outcomes. For the constant regions, the choice of the antibody class and isotype has to be made carefully to suit the therapeutic applications. Engineering of the Fc region, either by direct targeted mutagenesis or by modifying the nature of its N-glycan, has played an important role in recent years in increasing half-life or controlling effector functions. The variable regions of the antibody are responsible for binding affinity and exquisite specificity to the target molecule, which together with the Fc determine the drug's efficacy and influence the drug dose required to obtain the desired effectiveness. A key requirement during antibody development is therefore to affinity mature the variable regions when necessary, so that they bind the therapeutic target with sufficiently high affinity to guarantee effective occupancy over prolonged periods. If the antibody was obtained from a non-human source, such as rodents, a humanization process has to be applied to minimize immunogenicity while maintaining the desired binding affinity and selectivity. Finally, we discuss the next-generation antibodies, such as antibody-drug conjugates, bispecific antibodies, and immunocytokines, which are being developed to meet future challenges.
|
23
|
Kotulska M, Unold O. On the amyloid datasets used for training PAFIG--how (not) to extend the experimental dataset of hexapeptides. BMC Bioinformatics 2013; 14:351. [PMID: 24305169 PMCID: PMC3879009 DOI: 10.1186/1471-2105-14-351] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2013] [Accepted: 11/15/2013] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Amyloids are proteins capable of forming aberrant intramolecular contact sites, characteristic of the beta zipper configuration. Amyloids can underlie serious health conditions, e.g. Alzheimer's or Parkinson's diseases. It has been proposed that short segments of amino acids can be responsible for protein amyloidogenicity, but no more than two hundred such hexapeptides have been experimentally found. The authors of the computational tool Pafig published in BMC Bioinformatics a method for extending the amyloid hexapeptide dataset that could be used for training and testing models. They assumed that all hexapeptides belonging to an amyloid protein can be regarded as amylopositive, while those from proteins never reported as amyloid are always amylonegative. Here we show why the above-described method of extending datasets is wrong and discuss the reasons why the incorrect data could lead to falsely correct classification. RESULTS The amyloid classification of hexapeptides by Pafig was compared with the classification results from different state-of-the-art computational methods, and the outputs of all methods were studied by clustering analysis. The clustering methods show that Pafig is an outlier with regard to other approaches. Our study of the statistical patterns of its training and testing datasets showed a strong bias towards the STVIIE hexapeptide in their positive part. Different statistical patterns of seemingly amylo-positive and -negative hexapeptides allow for a repeatable classification, which is not related to the amyloid propensity of the hexapeptides. CONCLUSIONS Our study on the recognition of amyloid hexapeptides showed that the occurrence of incidental patterns in wrongly selected datasets can produce falsely correct classification results.
The assumption that all hexapeptides belonging to an amyloid protein can be regarded as amylopositive and those from proteins never reported as amyloid are always amylonegative is not supported by any other computational method. This is in line with experimental observations that the amyloid propensity of a full protein can result from only one amyloidogenic fragment in this protein, while an amyloidogenic part that is well hidden inside the protein may never lead to fibril formation. This leads to the conclusion that Pafig does not provide correct classification with regard to amyloidogenicity.
Affiliation(s)
- Malgorzata Kotulska
- Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, 50-370 Wroclaw, Poland.
|
24
|
Emily M, Talvas A, Delamarche C. MetAmyl: a METa-predictor for AMYLoid proteins. PLoS One 2013; 8:e79722. [PMID: 24260292 PMCID: PMC3834037 DOI: 10.1371/journal.pone.0079722] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2013] [Accepted: 10/04/2013] [Indexed: 12/17/2022] Open
Abstract
The aggregation of proteins or peptides in amyloid fibrils is associated with a number of clinical disorders, including Alzheimer's, Huntington's and prion diseases, medullary thyroid cancer, renal and cardiac amyloidosis. Despite extensive studies, the molecular mechanisms underlying the initiation of fibril formation remain largely unknown. Several lines of evidence revealed that short amino-acid segments (hot spots), located in amyloid precursor proteins act as seeds for fibril elongation. Therefore, hot spots are potential targets for diagnostic/therapeutic applications, and a current challenge in bioinformatics is the development of methods to accurately predict hot spots from protein sequences. In this paper, we combined existing methods into a meta-predictor for hot spots prediction, called MetAmyl for METapredictor for AMYLoid proteins. MetAmyl is based on a logistic regression model that aims at weighting predictions from a set of popular algorithms, statistically selected as being the most informative and complementary predictors. We evaluated the performances of MetAmyl through a large scale comparative study based on three independent datasets and thus demonstrated its ability to differentiate between amyloidogenic and non-amyloidogenic polypeptides. Compared to 9 other methods, MetAmyl provides significant improvement in prediction on studied datasets. We further show that MetAmyl is efficient to highlight the effect of point mutations involved in human amyloidosis, so we suggest this program should be a useful complementary tool for the diagnosis of these diseases.
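The abstract above describes a logistic regression model that weights the outputs of several existing predictors. A minimal sketch of that combination step follows; the weights, bias, and per-method scores are hypothetical placeholders, not MetAmyl's fitted coefficients, and the statistical selection of predictors is not shown.

```python
import math


def meta_score(scores, weights, bias):
    """Combine per-method scores into one probability with a logistic model.

    scores  : outputs of the individual predictors for one position
    weights : one coefficient per predictor (hypothetical values here)
    bias    : intercept of the logistic model (hypothetical value here)
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per predictor score")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative numbers only: three predictors scoring the same position.
weights = [1.5, 0.5, 2.0]
bias = -2.0
strong = meta_score([0.9, 0.8, 0.9], weights, bias)
weak = meta_score([0.1, 0.2, 0.1], weights, bias)
```

In practice the coefficients would be fitted on labelled training data, so that more informative predictors receive larger weights.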
Affiliation(s)
- Mathieu Emily
- Agrocampus Ouest - Applied Mathematics Department, Rennes, France
- Institut de Recherche Mathématique de Rennes, UMR6625 CNRS, Rennes, France
- Université Rennes 2, Rennes, France
- Anthony Talvas
- Institut de Recherche Mathématique de Rennes, UMR6625 CNRS, Rennes, France
- Université Rennes 1 - IGDR, UMR6290 CNRS, Rennes, France
|
25
|
Stanislawski J, Kotulska M, Unold O. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides. BMC Bioinformatics 2013; 14:21. [PMID: 23327628 PMCID: PMC3566972 DOI: 10.1186/1471-2105-14-21] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2012] [Accepted: 12/19/2012] [Indexed: 11/17/2022] Open
Abstract
Background Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been experimentally found. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset, ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. Results We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained an area under the ROC curve of 0.96, accuracy of 91%, true positive rate of ca. 78%, and true negative rate of 95%. A few other machine learning methods also achieved good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning).
Conclusions We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset proved representative enough to use simple statistical methods for testing amyloidogenicity based only on six-letter sequences. Statistical machine learning methods such as Alternating Decision Tree and Multilayer Perceptron can replace the energy-based classifier, with the advantages of very significantly reduced computational time and simplicity of analysis. Additionally, a decision tree provides a set of very easily interpretable rules.
Affiliation(s)
- Jerzy Stanislawski
- Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, 50-370 Wroclaw, Poland
|
26
|
Tsolis AC, Papandreou NC, Iconomidou VA, Hamodrakas SJ. A consensus method for the prediction of 'aggregation-prone' peptides in globular proteins. PLoS One 2013; 8:e54175. [PMID: 23326595 PMCID: PMC3542318 DOI: 10.1371/journal.pone.0054175] [Citation(s) in RCA: 227] [Impact Index Per Article: 20.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2012] [Accepted: 12/11/2012] [Indexed: 02/03/2023] Open
Abstract
The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined to produce AMYLPRED2, a web tool freely available to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins).
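A consensus of this kind can be sketched as a per-residue vote across methods. The sketch below is a generic illustration, not the AMYLPRED2 code: the three method outputs are invented, and the vote threshold is a free parameter because the abstract does not state which cut-off the tool uses.

```python
def consensus(predictions, min_votes):
    """Flag a residue when at least min_votes methods flag it.

    predictions : list of per-method lists of 0/1 flags, one flag per residue
    min_votes   : vote threshold (an assumption; not given in the abstract)
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all methods must score the same number of residues")
    votes = [sum(column) for column in zip(*predictions)]
    return [v >= min_votes for v in votes]


# Hypothetical outputs of three methods over a six-residue stretch.
method_a = [0, 1, 1, 1, 0, 0]
method_b = [0, 0, 1, 1, 1, 0]
method_c = [1, 0, 1, 0, 1, 0]
flags = consensus([method_a, method_b, method_c], min_votes=2)
```

Raising `min_votes` trades sensitivity for specificity, which is the usual reason to expose the threshold rather than fix it.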
Affiliation(s)
- Antonios C. Tsolis
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens, Greece
- Nikos C. Papandreou
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens, Greece
- Vassiliki A. Iconomidou
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens, Greece
- Stavros J. Hamodrakas
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens, Greece
|
27
|
Liaw C, Tung CW, Ho SY. Prediction and analysis of antibody amyloidogenesis from sequences. PLoS One 2013; 8:e53235. [PMID: 23308169 PMCID: PMC3538782 DOI: 10.1371/journal.pone.0053235] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2012] [Accepted: 11/27/2012] [Indexed: 11/23/2022] Open
Abstract
Antibody amyloidogenesis is the aggregation of soluble proteins into amyloid fibrils and is one of the major causes of the failure of humanized antibodies. The prediction and prevention of antibody amyloidogenesis are helpful for restoring and enhancing therapeutic effects. Because of the large number of possible germlines, the existing method, which establishes an individual model for each known germline, is not practical for predicting sequences of novel germlines. This study proposes the first automatic, across-germline prediction method (named AbAmyloid) capable of predicting antibody amyloidogenesis from sequences. Since amyloidogenesis is determined by the whole sequence of an antibody rather than by germline-dependent properties such as mutated residues, this study assesses three types of germline-independent sequence features (amino acid composition, dipeptide composition and physicochemical properties). AbAmyloid, using a Random Forests classifier with dipeptide composition, performs well on a data set of 12 germlines. The within- and across-germline prediction accuracies are 83.10% and 83.33% using Jackknife tests, respectively, and the novel-germline prediction accuracy using a leave-one-germline-out test is 72.22%. A thorough analysis of sequence features is conducted to identify informative properties that provide further insight into antibody amyloidogenesis. Some identified informative physicochemical properties are amphiphilicity, hydrophobicity, reverse turn, helical structure, isoelectric point, net charge, mutability, coil, turn, linker, nuclear protein, etc. Additionally, the numbers of ubiquitylation sites in amyloidogenic and non-amyloidogenic antibodies are found to be significantly different. This reveals that antibodies less likely to be ubiquitylated tend to be amyloidogenic.
The AbAmyloid method, capable of automatically predicting antibody amyloidogenesis of novel germlines, is implemented as a publicly available web server at http://iclab.life.nctu.edu.tw/abamyloid.
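Dipeptide composition, the feature type the abstract reports as working well, is straightforward to compute. The following is a minimal sketch under stated assumptions: the 20-letter alphabet, the normalisation by the number of valid dipeptides, and the short example fragment are illustrative choices, not details taken from AbAmyloid.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Overlapping pairs are counted; pairs containing non-standard
    letters are skipped (an assumption made for this sketch).
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy fragment used only to exercise the function.
features = dipeptide_composition("DIQMTQSPSS")
```

The resulting fixed-length vector does not depend on sequence length or germline, which is what makes it usable as input to a classifier such as a random forest.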
Affiliation(s)
- Chyn Liaw
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Chun-Wei Tung
- School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
|
28
|
Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel 2012; 25:507-21. [PMID: 22661385 PMCID: PMC3449398 DOI: 10.1093/protein/gzs024] [Citation(s) in RCA: 169] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2012] [Revised: 04/14/2012] [Accepted: 04/19/2012] [Indexed: 11/12/2022] Open
Abstract
Recent clinical trials using antibodies with low toxicity and high efficiency have raised expectations for the development of next-generation protein therapeutics. However, the process of obtaining therapeutic antibodies remains time consuming and empirical. This review summarizes recent progresses in the field of computer-aided antibody development mainly focusing on antibody modeling, which is divided essentially into two parts: (i) modeling the antigen-binding site, also called the complementarity determining regions (CDRs), and (ii) predicting the relative orientations of the variable heavy (V(H)) and light (V(L)) chains. Among the six CDR loops, the greatest challenge is predicting the conformation of CDR-H3, which is the most important in antigen recognition. Further computational methods could be used in drug development based on crystal structures or homology models, including antibody-antigen dockings and energy calculations with approximate potential functions. These methods should guide experimental studies to improve the affinities and physicochemical properties of antibodies. Finally, several successful examples of in silico structure-based antibody designs are reviewed. We also briefly review structure-based antigen or immunogen design, with application to rational vaccine development.
Affiliation(s)
- Daisuke Kuroda
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, Japan.
|
29
|
Malueka RG, Takaoka Y, Yagi M, Awano H, Lee T, Dwianingsih EK, Nishida A, Takeshima Y, Matsuo M. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers. BMC Genet 2012; 13:23. [PMID: 22462762 PMCID: PMC3350383 DOI: 10.1186/1471-2156-13-23] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/18/2012] [Accepted: 03/31/2012] [Indexed: 12/29/2022]
Abstract
Background: Duchenne muscular dystrophy (DMD), a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing, induced with antisense oligonucleotides (AOs), is attracting much attention as the most plausible way to express dystrophin in DMD. AOs have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results: Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We then tested the tree on a novel cryptic exon (exon 11a) identified in this study; however, the tree mislabeled exon 11a as a true exon. Therefore, we reconstructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions: The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish a strategy for exon-skipping therapy for DMD.
Affiliation(s)
- Rusdy Ghazali Malueka
- Department of Pediatrics, Graduate School of Medicine, Kobe University, Chuo, Kobe 6500017, Japan
|
30
|
Abstract
Protein aggregation underlies the development of an increasing number of conformational human diseases of growing incidence, such as Alzheimer's and Parkinson's diseases. Furthermore, the accumulation of recombinant proteins as intracellular aggregates represents a critical obstacle for the biotechnological production of polypeptides. Also, ordered protein aggregates constitute novel and versatile nanobiomaterials. Consequently, there is an increasing interest in the development of methods able to forecast the aggregation properties of polypeptides in order to modulate their intrinsic solubility. In this context, we have developed AGGRESCAN, a simple and fast algorithm that predicts aggregation-prone segments in protein sequences, compares the aggregation properties of different proteins or protein sets and analyses the effect of mutations on protein aggregation propensities.
|
31
|
Zhang GL, Lin HH, Keskin DB, Reinherz EL, Brusic V. Dana-Farber repository for machine learning in immunology. J Immunol Methods 2011; 374:18-25. [PMID: 21782820 DOI: 10.1016/j.jim.2011.07.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 04/04/2011] [Accepted: 07/06/2011] [Indexed: 11/27/2022]
Abstract
The immune system is characterized by high combinatorial complexity that necessitates the use of specialized computational tools for analysis of immunological data. Machine learning (ML) algorithms are used in combination with classical experimentation for the selection of vaccine targets and in computational simulations that reduce the number of necessary experiments. The development of ML algorithms requires standardized data sets, consistent measurement methods, and uniform scales. To bridge the gap between the immunology community and the ML community, we designed the Dana-Farber Repository for Machine Learning in Immunology (DFRMLI). This repository provides standardized data sets of HLA-binding peptides with all binding affinities mapped onto a common scale. It also provides a list of experimentally validated, naturally processed T cell epitopes derived from tumor or virus antigens. The DFRMLI data were preprocessed to ensure consistency, comparability, detailed descriptions, and statistically meaningful sample sizes for peptides that bind to various HLA molecules. The repository is accessible at http://bio.dfci.harvard.edu/DFRMLI/.
Affiliation(s)
- Guang Lan Zhang
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA
|