1
|
Hasan AH, Shakya S, Hussain FHS, Murugesan S, Chander S, Pratama MRF, Jamil S, Das B, Biswas S, Jamalis J. Design, synthesis, anti-acetylcholinesterase evaluation and molecular modelling studies of novel coumarin-chalcone hybrids. J Biomol Struct Dyn 2023; 41:11450-11462. [PMID: 36591704 DOI: 10.1080/07391102.2022.2162583] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 12/19/2022] [Indexed: 01/03/2023]
Abstract
The major enzyme responsible for the hydrolytic breakdown of the neurotransmitter acetylcholine (ACh) is acetylcholinesterase (AChE). Acetylcholinesterase inhibitors (AChEIs) are the most prescribed class of medications for the treatment of Alzheimer's disease (AD) and dementia. The limitations of available therapy, like side effects, drug tolerance, and inefficacy in halting disease progression, drive the need for better, more efficacious, and safer drugs. In this study, a series of fourteen novel chalcone-coumarin derivatives (8a-n) were designed, synthesized and characterized by spectral techniques like FT-IR, NMR, and HR-MS. Subsequently, the synthesized compounds were tested for their ability to inhibit acetylcholinesterase (AChE) activity by Ellman's method. All tested compounds showed AChE inhibition with IC50 value ranging from 0.201 ± 0.008 to 1.047 ± 0.043 μM. Hybrid 8d having chloro substitution on ring-B of the chalcone scaffold showed relatively better potency, with IC50 value of 0.201 ± 0.008 μM compared to other members of the series. The reference drug, galantamine, exhibited an IC50 at 1.142 ± 0.027 μM. Computational studies revealed that designed compounds bind to the peripheral anionic site (PAS), the catalytic active site (CAS), and the mid-gorge site of AChE. Putative binding modes, ligand-enzyme interactions, and stability of the best active compound are studied using molecular docking, followed by molecular dynamics (MD) simulations. The cytotoxicity of the synthesised derivatives was determined using the MTT test at three concentrations (100 g/mL, 500 g/mL, and 1 mg/mL). None of the chemicals had a significant effect on the body at the highest dose of 1 mg/mL.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Aso Hameed Hasan
- Department of Chemistry, Faculty of Science, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
- Department of Chemistry, College of Science, University of Garmian, Kalar, Kurdistan Region-Iraq, Iraq
| | - Sonam Shakya
- Department of Chemistry, Faculty of Science, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
| | - Faiq H S Hussain
- Department of Medical Analysis, Faculty of Applied Science, Tishk International University, Erbil, Kurdistan Region-Iraq, Iraq
| | - Sankaranarayanan Murugesan
- Medicinal Chemistry Research Laboratory, Birla Institute of Technology & Science Pilani (BITS Pilani), Pilani, Rajasthan, India
| | - Subhash Chander
- Amity Institute of Phytochemistry and Phytomedicine, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India
| | - Mohammad Rizki Fadhil Pratama
- Doctoral Program of Pharmaceutical Sciences, Universitas Airlangga, Surabaya, East Java, Indonesia
- Department of Pharmacy, Universitas Muhammadiyah Palangkaraya, Palangka Raya, Central Kalimantan, Indonesia
| | - Shajarahtunnur Jamil
- Department of Chemistry, Faculty of Science, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
| | - Basundhara Das
- Amity Institute of Molecular Medicine and Stem Cell Research (AIMMSCR), Translational Cancer & Stem Cell Research Laboratory, Noida, Uttar Pradesh, India
| | - Subhrajit Biswas
- Amity Institute of Molecular Medicine and Stem Cell Research (AIMMSCR), Translational Cancer & Stem Cell Research Laboratory, Noida, Uttar Pradesh, India
| | - Joazaizulfazli Jamalis
- Department of Chemistry, Faculty of Science, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
| |
Collapse
|
2
|
An J, Weng X. Collectively encoding protein properties enriches protein language models. BMC Bioinformatics 2022; 23:467. [DOI: 10.1186/s12859-022-05031-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Accepted: 10/31/2022] [Indexed: 11/10/2022] Open
Abstract
AbstractPre-trained natural language processing models on a large natural language corpus can naturally transfer learned knowledge to protein domains by fine-tuning specific in-domain tasks. However, few studies focused on enriching such protein language models by jointly learning protein properties from strongly-correlated protein tasks. Here we elaborately designed a multi-task learning (MTL) architecture, aiming to decipher implicit structural and evolutionary information from three sequence-level classification tasks for protein family, superfamily and fold. Considering the co-existing contextual relevance between human words and protein language, we employed BERT, pre-trained on a large natural language corpus, as our backbone to handle protein sequences. More importantly, the encoded knowledge obtained in the MTL stage can be well transferred to more fine-grained downstream tasks of TAPE. Experiments on structure- or evolution-related applications demonstrate that our approach outperforms many state-of-the-art Transformer-based protein models, especially in remote homology detection.
Collapse
|
3
|
Barger J, Adhikari B. New Labeling Methods for Deep Learning Real-Valued Inter-Residue Distance Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3586-3594. [PMID: 34559660 DOI: 10.1109/tcbb.2021.3115053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
BACKGROUND Much of the recent success in protein structure prediction has been a result of accurate protein contact prediction-a binary classification problem. Dozens of methods, built from various types of machine learning and deep learning algorithms, have been published over the last two decades for predicting contacts. Recently, many groups, including Google DeepMind, have demonstrated that reformulating the problem as a multi-class classification problem is a more promising direction to pursue. As an alternative approach, we recently proposed real-valued distance predictions, formulating the problem as a regression problem. The nuances of protein 3D structures make this formulation appropriate, allowing predictions to reflect inter-residue distances in nature. Despite these promises, the accurate prediction of real-valued distances remains relatively unexplored; possibly due to classification being better suited to machine and deep learning algorithms. METHODS Can regression methods be designed to predict real-valued distances as precise as binary contacts? To investigate this, we propose multiple novel methods of input label engineering, which is different from feature engineering, with the goal of optimizing the distribution of distances to cater to the loss function of the deep-learning model. Since an important utility of predicted contacts or distances is to build three-dimensional models, we also tested if predicted distances can reconstruct more accurate models than contacts. RESULTS Our results demonstrate, for the first time, that deep learning methods for real-valued protein distance prediction can deliver distances as precise as binary classification methods. When using an optimal distance transformation function on the standard PSICOV dataset consisting of 150 representative proteins, the precision of 'top-all' long-range contacts improves from 60.9% to 61.4% when predicting real-valued distances instead of contacts. When building three-dimensional models we observed an average TM-score increase from 0.61 to 0.72, highlighting the advantage of predicting real-valued distances.
Collapse
|
4
|
Potential inhibitory activity of phytoconstituents against black fungus: In silico ADMET, molecular docking and MD simulation studies. COMPUTATIONAL TOXICOLOGY 2022; 24:100247. [PMID: 36193218 PMCID: PMC9508704 DOI: 10.1016/j.comtox.2022.100247] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 09/17/2022] [Accepted: 09/20/2022] [Indexed: 11/30/2022]
Abstract
Mucormycosis or “black fungus” has been currently observed in India, as a secondary infection in COVID-19 infected patients in the post-COVID-stage. Fungus is an uncommon opportunistic infection that affects people who have a weak immune system. In this study, 158 antifungal phytochemicals were screened using molecular docking against glucoamylase enzyme of Rhizopus oryzae to identify potential inhibitors. The docking scores of the selected phytochemicals were compared with Isomaltotriose as a positive control. Most of the compounds showed lower binding energy values than Isomaltotriose (-6.4 kcal/mol). Computational studies also revealed the strongest binding affinity of the screened phytochemicals was Dioscin (-9.4 kcal/mol). Furthermore, the binding interactions of the top ten potential phytochemicals were elucidated and further analyzed. In-silico ADME and toxicity prediction were also evaluated using SwissADME and admetSAR online servers. Compounds Piscisoflavone C, 8-O-methylaverufin and Punicalagin exhibited positive results with the Lipinski filter and drug-likeness and showed mild to moderate of toxicity. Molecular dynamics (MD) simulation (at 300 K for 100 ns) was also employed to the docked ligand-target complex to explore the stability of ligand-target complex, improve docking results, and analyze the molecular mechanisms of protein-target interactions.
Collapse
|
5
|
Gaber A, Alsanie WF, Alhomrani M, Alamri AS, Alyami H, Shakya S, Habeeballah H, Alkhatabi HA, Felimban RI, Alamri A, Alhabeeb AA, Raafat BM, Refat MS. Multispectral and Molecular Docking Studies Reveal Potential Effectiveness of Antidepressant Fluoxetine by Forming π-Acceptor Complexes. Molecules 2022; 27:molecules27185883. [PMID: 36144618 PMCID: PMC9505585 DOI: 10.3390/molecules27185883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Revised: 09/05/2022] [Accepted: 09/06/2022] [Indexed: 11/26/2022] Open
Abstract
Poor mood, lack of pleasure, reduced focus, remorse, unpleasant thoughts, and sleep difficulties are all symptoms of depression. The only approved treatment for children and adolescents with major depressive disorder (MDD) is fluoxetine hydrochloride (FXN), a serotonin selective reuptake inhibitor antidepressant. MDD is the most common cause of disability worldwide. In the present research, picric acid (PA); dinitrobenzene; p-nitro benzoic acid; 2,6-dichloroquinone-4-chloroimide; 2,6-dibromoquinone-4-chloroimide; and 7,7′,8,8′-tetracyanoquinodimethane were used to make 1:1 FXN charge-transfer compounds in solid and liquid forms. The isolated complexes were then characterized by elemental analysis, conductivity, infrared, Raman, and 1H-NMR spectra, thermogravimetric analysis, scanning electron microscopy, and X-ray powder diffraction. Additionally, a molecular docking investigation was conducted on the donor moiety using FXN alone and the resulting charge transfer complex [(FXN)(PA)] as an acceptor to examine the interactions against two protein receptors (serotonin or dopamine). Interestingly, the [(FXN)(PA)] complex binds to both serotonin and dopamine more effectively than the FXN drug alone. Furthermore, [(FXN)(PA)]–serotonin had a greater binding energy than [FXN]–serotonin. Theoretical data were also generated by density functional theory simulations, which aided the molecular geometry investigation and could be beneficial to researchers in the future.
Collapse
Affiliation(s)
- Ahmed Gaber
- Department of Biology, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Correspondence: (A.G.); (M.S.R.)
| | - Walaa F. Alsanie
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Department of Clinical Laboratories Sciences, The Faculty of Applied Medical Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Majid Alhomrani
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Department of Clinical Laboratories Sciences, The Faculty of Applied Medical Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Abdulhakeem S. Alamri
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Department of Clinical Laboratories Sciences, The Faculty of Applied Medical Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Hussain Alyami
- College of Medicine, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Sonam Shakya
- Department of Chemistry, Faculty of Science, Aligarh Muslim University, Aligarh 202002, India
| | - Hamza Habeeballah
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences in Rabigh, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
| | - Heba A. Alkhatabi
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
- King Fahd Medical Research Centre, Hematology Research Unit, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
| | - Raed I. Felimban
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
- Center of Innovation in Personalized Medicine (CIPM), 3D Bioprinting Unit, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
| | - Abdulwahab Alamri
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Hail, P.O. Box 2240, Hail 55476, Saudi Arabia
| | | | - Bassem M. Raafat
- Department of Radiological Sciences, College of Applied Medical Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
| | - Moamen S. Refat
- Department of Chemistry, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Correspondence: (A.G.); (M.S.R.)
| |
Collapse
|
6
|
Alsanie WF, Alhomrani M, Alamri AS, Alyami H, Shakya S, Habeeballah H, Alkhatabi HA, Felimban RI, Alamri A, Alhabeeb AA, Raafat BM, Refat MS, Gaber A. Attempting to Increase the Effectiveness of the Antidepressant Trazodone Hydrochloride Drug Using π-Acceptors. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:11281. [PMID: 36141553 PMCID: PMC9517268 DOI: 10.3390/ijerph191811281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 09/02/2022] [Accepted: 09/05/2022] [Indexed: 06/16/2023]
Abstract
Major depressive disorder is a prevalent mood illness that is mildly heritable. Cases with the highest familial risk had recurrence and onset at a young age. Trazodone hydrochloride is an antidepressant medicine that affects the chemical messengers in the brain known as neurotransmitters, which include acetylcholine, norepinephrine, dopamine, and serotonin. In the present research, in solid and liquid phases, the 1:1 charge-transfer complexes between trazodone hydrochloride (TZD) and six different π-acceptors were synthesized and investigated using different microscopic techniques. The relation of dative ion pairs [TZD+, A-], where A is the acceptor, was inferred via intermolecular charge-transfer complexes. Additionally, a molecular docking examination was utilized to compare the interactions of protein receptors (serotonin-6BQH) with the TZD alone or in combination with the six distinct acceptor charge-transfer complexes. To refine the docking results acquired from AutoDock Vina and to better examine the molecular mechanisms of receptor-ligand interactions, a 100 ns run of molecular dynamics simulation was used. All the results obtained in this study prove that the 2,6-dichloroquinone-4-chloroimide (DCQ)/TZD complex interacts with serotonin receptors more efficiently than reactant donor TZD only and that [(TZD)(DCQ)]-serotonin has the highest binding energy value of all π-acceptor complexes.
Collapse
Affiliation(s)
- Walaa F. Alsanie
- Department of Clinical Laboratories Sciences, The Faculty of Applied Medical Sciences, Taif University, Taif 21944, Saudi Arabia
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, Taif 21944, Saudi Arabia
| | - Majid Alhomrani
- Department of Clinical Laboratories Sciences, The Faculty of Applied Medical Sciences, Taif University, Taif 21944, Saudi Arabia
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, Taif 21944, Saudi Arabia
| | - Abdulhakeem S. Alamri
- Department of Clinical Laboratories Sciences, The Faculty of Applied Medical Sciences, Taif University, Taif 21944, Saudi Arabia
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, Taif 21944, Saudi Arabia
| | - Hussain Alyami
- College of Medicine, Taif University, Taif 21944, Saudi Arabia
| | - Sonam Shakya
- Department of Chemistry, Faculty of Science, Aligarh Muslim University, Aligarh 202002, India
| | - Hamza Habeeballah
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences in Rabigh, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Heba A. Alkhatabi
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- King Fahd Medical Research Centre, Hematology Research Unit, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Raed I. Felimban
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Center of Innovation in Personalized Medicine (CIPM), 3D Bioprinting Unit, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Abdulwahab Alamri
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Hail, Hail 81442, Saudi Arabia
| | | | - Bassem M. Raafat
- Department of Radiological Sciences, College of Applied Medical Sciences, Taif University, Taif 21944, Saudi Arabia
| | - Moamen S. Refat
- Department of Chemistry, College of Science, Taif University, Taif 21944, Saudi Arabia
| | - Ahmed Gaber
- Centre of Biomedical Sciences Research (CBSR), Deanship of Scientific Research, Taif University, Taif 21944, Saudi Arabia
- Department of Biology, College of Science, Taif University, Taif 21944, Saudi Arabia
| |
Collapse
|
7
|
Pearce R, Li Y, Omenn GS, Zhang Y. Fast and accurate Ab Initio Protein structure prediction using deep learning potentials. PLoS Comput Biol 2022; 18:e1010539. [PMID: 36112717 PMCID: PMC9518900 DOI: 10.1371/journal.pcbi.1010539] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Revised: 09/28/2022] [Accepted: 09/03/2022] [Indexed: 01/05/2023] Open
Abstract
Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.
Collapse
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Departments of Internal Medicine and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| |
Collapse
|
8
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Collapse
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | | | - Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| |
Collapse
|
9
|
In silico discovery of multi-targeting inhibitors for the COVID-19 treatment by molecular docking, molecular dynamics simulation studies, and ADMET predictions. Struct Chem 2022. [DOI: 10.1007/s11224-022-01996-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
|
10
|
Jiang Y, Luo J, Huang D, Liu Y, Li DD. Machine Learning Advances in Microbiology: A Review of Methods and Applications. Front Microbiol 2022; 13:925454. [PMID: 35711777 PMCID: PMC9196628 DOI: 10.3389/fmicb.2022.925454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Accepted: 05/09/2022] [Indexed: 12/18/2022] Open
Abstract
Microorganisms play an important role in natural material and elemental cycles. Many common and general biology research techniques rely on microorganisms. Machine learning has been gradually integrated with multiple fields of study. Machine learning, including deep learning, aims to use mathematical insights to optimize variational functions to aid microbiology using various types of available data to help humans organize and apply collective knowledge of various research objects in a systematic and scaled manner. Classification and prediction have become the main achievements in the development of microbial community research in the direction of computational biology. This review summarizes the application and development of machine learning and deep learning in the field of microbiology and shows and compares the advantages and disadvantages of different algorithm tools in four fields: microbiome and taxonomy, microbial ecology, pathogen and epidemiology, and drug discovery.
Collapse
|
11
|
Enhancement of Haloperidol Binding Affinity to Dopamine Receptor Via Forming a Charge-Transfer Complex with Picric Acid and 7,7,8,8-Tetracyanoquinodimethane for Improvement of the Antipsychotic Efficacy. Molecules 2022; 27:molecules27103295. [PMID: 35630772 PMCID: PMC9146347 DOI: 10.3390/molecules27103295] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2022] [Revised: 05/13/2022] [Accepted: 05/18/2022] [Indexed: 02/04/2023] Open
Abstract
Haloperidol (HPL) is a typical antipsychotic drug used to treat acute psychotic conditions, delirium, and schizophrenia. Solid charge transfer (CT) products of HPL with 7,7,8,8-tetracyanoquinodimethane (TCNQ) and picric acid (PA) have not been reported till date. Therefore, we conducted this study to investigate the donor–acceptor CT interactions between HPL (donor) and TCNQ and PA (π-acceptors) in liquid and solid states. The complete spectroscopic and analytical analyses deduced that the stoichiometry of these synthesized complexes was 1:1 molar ratio. Molecular docking calculations were performed for HPL as a donor and the resulting CT complexes with TCNQ and PA as acceptors with two protein receptors, serotonin and dopamine, to study the comparative interactions among them, as they are important neurotransmitters that play a large role in mental health. A molecular dynamics simulation was ran for 100 ns with the output from AutoDock Vina to refine docking results and better examine the molecular processes of receptor–ligand interactions. When compared to the reactant donor, the CT complex [(HPL)(TCNQ)] interacted with serotonin and dopamine more efficiently than HPL only. CT complex [(HPL)(TCNQ)] with dopamine (CTtD) showed the greatest binding energy value among all. Additionally, CTtD complex established more a stable interaction with dopamine than HPL–dopamine.
Collapse
|
12
|
Zhang H, Huang Y, Bei Z, Ju Z, Meng J, Hao M, Zhang J, Zhang H, Xi W. Inter-Residue Distance Prediction From Duet Deep Learning Models. Front Genet 2022; 13:887491. [PMID: 35651930 PMCID: PMC9148999 DOI: 10.3389/fgene.2022.887491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2022] [Accepted: 03/30/2022] [Indexed: 12/04/2022] Open
Abstract
Residue distance prediction from the sequence is critical for many biological applications such as protein structure reconstruction, protein–protein interaction prediction, and protein design. However, prediction of fine-grained distances between residues with long sequence separations still remains challenging. In this study, we propose DuetDis, a method based on duet feature sets and deep residual network with squeeze-and-excitation (SE), for protein inter-residue distance prediction. DuetDis embraces the ability to learn and fuse features directly or indirectly extracted from the whole-genome/metagenomic databases and, therefore, minimize the information loss through ensembling models trained on different feature sets. We evaluate DuetDis and 11 widely used peer methods on a large-scale test set (610 proteins chains). The experimental results suggest that 1) prediction results from different feature sets show obvious differences; 2) ensembling different feature sets can improve the prediction performance; 3) high-quality multiple sequence alignment (MSA) used for both training and testing can greatly improve the prediction performance; and 4) DuetDis is more accurate than peer methods for the overall prediction, more reliable in terms of model prediction score, and more robust against shallow multiple sequence alignment (MSA).
Collapse
Affiliation(s)
- Huiling Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Ying Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhendong Bei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhen Ju
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jintao Meng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
| | - Jingjing Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Haiping Zhang
- University of Chinese Academy of Sciences, Beijing, China
| | - Wenhui Xi
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Wenhui Xi,
| |
Collapse
|
13
|
Newton MAH, Rahman J, Zaman R, Sattar A. Enhancing Protein Contact Map Prediction Accuracy via Ensembles of Inter-Residue Distance Predictors. Comput Biol Chem 2022; 99:107700. [DOI: 10.1016/j.compbiolchem.2022.107700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/03/2022]
|
14
|
Santra S, Jana M. Predicting the evolution of number of native contacts of a small protein by using deep learning approach. Comput Biol Chem 2022; 97:107625. [DOI: 10.1016/j.compbiolchem.2022.107625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Revised: 01/07/2022] [Accepted: 01/09/2022] [Indexed: 11/28/2022]
|
15
|
Enhancing the Antipsychotic Effect of Risperidone by Increasing Its Binding Affinity to Serotonin Receptor via Picric Acid: A Molecular Dynamics Simulation. Pharmaceuticals (Basel) 2022; 15:ph15030285. [PMID: 35337083 PMCID: PMC8952232 DOI: 10.3390/ph15030285] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2022] [Revised: 02/17/2022] [Accepted: 02/23/2022] [Indexed: 02/04/2023] Open
Abstract
The aim of this study was to assess the utility of inexpensive techniques in evaluating the interactions of risperidone (Ris) with different traditional -acceptors, with subsequent application of the findings into a Ris pharmaceutical formulation with improved therapeutic properties. Molecular docking calculations were performed using Ris and its different charge-transfer complexes (CT) with picric acid (PA), 2,3-dichloro-5,6-dicyanop-benzoquinon (DDQ), tetracyanoquinodimethane (TCNQ), tetracyano ethylene (TCNE), tetrabromo-pquinon (BL), and tetrachloro-p-quinon (CL), as donors, and three receptors (serotonin, dopamine, and adrenergic) as acceptors to study the comparative interactions among them. To refine the docking results and further investigate the molecular processes of receptor–ligand interactions, a molecular dynamics simulation was run with output obtained from AutoDock Vina. Among all investigated complexes, the [(Ris) (PA)]-serotonin (CTcS) complex showed the highest binding energy. Molecular dynamics simulation of the 100 ns run revealed that both the Ris-serotonin (RisS) and CTcS complexes had a stable conformation; however, the CTcS complex was more stable.
Collapse
|
16
|
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
|
17
|
Novel thiophene Chalcones-Coumarin as acetylcholinesterase inhibitors: Design, synthesis, biological evaluation, molecular docking, ADMET prediction and molecular dynamics simulation. Bioorg Chem 2021; 119:105572. [PMID: 34971946 DOI: 10.1016/j.bioorg.2021.105572] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2021] [Revised: 12/04/2021] [Accepted: 12/15/2021] [Indexed: 11/24/2022]
Abstract
A series of around eight novel chalcone based coumarin derivatives (23a-h) was designed, subjected to in-silico ADMET prediction, synthesized, characterized by IR, NMR, Mass analytical techniques and evaluated as acetylcholinesterase (AChE) inhibitor for the treatment of Alzheimer's disease (AD). The results of predicted ADMET study demonstrated the drug-likeness properties of the titled compounds with developmental challenges in lipophilicity and solubility parameters. The in vitro assessment of the synthesized compounds revealed that all of them showed significant activity (IC50 ranging from 0.42 to 1.296 µM) towards AChE compared to the standard drug, galantamine (IC50 = 1.142 ± 0.027 µM). Among these, compound 23e displayed the most potent inhibitory activity with IC50 value of 0.42 ± 0.019 µM. Cytotoxicity of all compounds was tested on normal human hepatic (THLE-2) cell lines at three different concentrations using the MTT assay, in which none of the compound showed significant toxicity at the highest concentration of 1000 µg/ml compared to the control group. Based on the docking study against AChE, the most active derivative 23e was orientated towards the active site and occupied both catalytic anionic site (CAS) and peripheral anionic site (PAS) of the target enzyme. In-silico studies revealed tested showed better inhibition activity of AChE compared to Butyrylcholinesterase (BuChE). Molecular dynamics simulation explored the stability and dynamic behavior of 23e- AChE complex.
Collapse
|
18
|
AboElkhair MA, Hasan ME, Mousa A, Moharam I, Sultan H, Malik Y, Sakr MA. In-silico evidence for enhancement of avian influenza virus H9N2 virulence by modulation of its hemagglutinin (HA) antigen function and stability during co-infection with infectious bronchitis virus in chickens. Virusdisease 2021; 32:548-558. [PMID: 34631979 DOI: 10.1007/s13337-021-00688-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Accepted: 04/08/2021] [Indexed: 10/20/2022] Open
Abstract
In the last few decades, frequent incidences of avian influenza (AI) H9N2 outbreaks have caused high mortality in poultry farms resulting in colossal economic losses in several countries. In Egypt, the co-infection of H9N2 with the infectious bronchitis virus (IBV) has been observed extensively during these outbreaks. However, the pathogenicity of H9N2 in these outbreaks remained controversial. The current study reports isolation and characterization of the H9N2 virus recovered from a concurrent IBV infected broiler chicken flock in Egypt during 2011. The genomic RNA was subjected to RT-PCR amplification followed by sequencing and analysis. The deduced amino acid sequences of the eight segments of the current study H9N2 isolate were compared with those of Egyptian H9N2 viruses isolated from healthy and diseased chicken flocks from 2011 to 2013. In the phylogenetic analysis, the current study isolate was found to be closely related to the other Egyptian H9N2 viruses. Notably, no particular molecular characteristic difference was noticed among all the Egyptian H9N2 isolates from apparently healthy, diseased or co-infected with IBV chicken flocks. Nevertheless, in-silico analysis, we noted modulation of stability and motifs structure of Hemagglutinin (HA) antigen among the co-infecting H9N2 AI and the IBV and isolates from the diseased flocks. The findings suggest that the putative factor for enhancement of the H9N2 pathogenicity could be co-infection with other respiratory pathogens such as IBV that might change the HA stability and function. Supplementary Information The online version contains supplementary material available at 10.1007/s13337-021-00688-1.
Collapse
Affiliation(s)
- Mohammed A AboElkhair
- Department of Virology, Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Monufia Egypt
| | - Mohamed E Hasan
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), University of Sadat City, Sadat City, Monufia Egypt
| | - Ahmed Mousa
- Department of Biochemistry and Nutrition, Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Monufia Egypt
| | - Ibrahim Moharam
- Department of Bird and Rabbit Diseases, Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Monufia Egypt
| | - Hesham Sultan
- Department of Bird and Rabbit Diseases, Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Monufia Egypt
| | - Yashpal Malik
- Indian Veterinary Research Institute (IVRI), Izatnagar 243 122, Bareilly, Uttar Pradesh India
| | - Moustafa A Sakr
- Molecular Diagnostics and Therapeutics Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), University of Sadat City, Sadat City, Monufia Egypt
| |
Collapse
|
19
|
Mortuza SM, Zheng W, Zhang C, Li Y, Pearce R, Zhang Y. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nat Commun 2021; 12:5011. [PMID: 34408149 PMCID: PMC8373938 DOI: 10.1038/s41467-021-25316-w] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2021] [Accepted: 08/04/2021] [Indexed: 11/28/2022] Open
Abstract
Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.
Collapse
Affiliation(s)
- S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
| |
Collapse
|
20
|
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022] Open
Abstract
Since Anfinsen demonstrated that the information encoded in a protein's amino acid sequence determines its structure in 1973, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Collapse
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA.
| |
Collapse
|
21
|
Reza MS, Zhang H, Hossain MT, Jin L, Feng S, Wei Y. COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization. MEMBRANES 2021; 11:membranes11070503. [PMID: 34209399 PMCID: PMC8305966 DOI: 10.3390/membranes11070503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/25/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
Protein contact prediction helps reconstruct the tertiary structure that greatly determines a protein’s function; therefore, contact prediction from the sequence is an important problem. Recently there has been exciting progress on this problem, but many of the existing methods are still low quality of prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods, including CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV, by optimizing the number of correctly predicted contacts and achieving a better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested in four independent test sets. For 239 highly non-redundant proteins, the method showed a prediction accuracy of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increased with the increasing number of predicted contacts. For example, COMTOP performed much better for large number of contact predictions (such as top-5L and top-3L) than for small number of contact predictions such as top-L/2 and top-L/5. The results and analysis demonstrate that COMTOP can significantly improve the performance of the individual methods; therefore, COMTOP is more robust against different types of test sets. COMTOP also showed better/comparable predictions when compared with the state-of-the-art predictors.
Collapse
Affiliation(s)
- Md. Selim Reza
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
| | - Huiling Zhang
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
| | - Md. Tofazzal Hossain
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
| | - Langxi Jin
- Department of Computer Science and Technology, School of Computer Science and Technology, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin 150080, China;
| | - Shengzhong Feng
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
| | - Yanjie Wei
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Correspondence:
| |
Collapse
|
22
|
Bahai A, Asgari E, Mofrad MRK, Kloetgen A, McHardy AC. EpitopeVec: Linear Epitope Prediction Using Deep Protein Sequence Embeddings. Bioinformatics 2021; 37:4517-4525. [PMID: 34180989 PMCID: PMC8652027 DOI: 10.1093/bioinformatics/btab467] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2020] [Revised: 05/28/2021] [Accepted: 06/25/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation B-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immuno-diagnostic reagents and antibody production, and thus in infectious disease prevention and diagnostics in general. Experimental methods used to determine BCEs are costly and time-consuming. Therefore, it is essential to develop computational methods for the rapid identification of BCEs. Although several computational methods have been developed for this task, generalizability is still a major concern, where cross-testing of the classifiers trained and tested on different datasets has revealed accuracies of 51–53%. Results We describe a new method called EpitopeVec, which uses a combination of residue properties, modified antigenicity scales, and protein language model-based representations (protein vectors) as features of peptides for linear BCE predictions. Extensive benchmarking of EpitopeVec and other state-of-the-art methods for linear BCE prediction on several large and small datasets, as well as cross-testing, demonstrated an improvement in the performance of EpitopeVec over other methods in terms of accuracy and area under the curve. As the predictive performance depended on the species origin of the respective antigens (viral, bacterial and eukaryotic), we also trained our method on a large viral dataset to create a dedicated linear viral BCE predictor with improved cross-testing performance. Availability and implementation The software is available at https://github.com/hzi-bifo/epitope-prediction. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Akash Bahai
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124 Braunschweig, Germany.,Braunschweig Integrated Center of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig
| | - Ehsaneddin Asgari
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124 Braunschweig, Germany.,Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, 94720, USA
| | - Mohammad R K Mofrad
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, 94720, USA.,Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Lab, Berkeley, CA 94720, USA
| | - Andreas Kloetgen
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124 Braunschweig, Germany
| | - Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124 Braunschweig, Germany.,Braunschweig Integrated Center of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig
| |
Collapse
|
23
|
Zhang H, Bei Z, Xi W, Hao M, Ju Z, Saravanan KM, Zhang H, Guo N, Wei Y. Evaluation of residue-residue contact prediction methods: From retrospective to prospective. PLoS Comput Biol 2021; 17:e1009027. [PMID: 34029314 PMCID: PMC8177648 DOI: 10.1371/journal.pcbi.1009027] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Revised: 06/04/2021] [Accepted: 04/28/2021] [Indexed: 12/31/2022] Open
Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has made tremendous progress for residue contact prediction, thus a comprehensive assessment of current methods based on a large-scale benchmark data set is very needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets according to a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multi-categories of inputs and large training sets are the best choices for low-contact-density proteins such as the intrinsically disordered ones and proteins with shallow multi-sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show the best performance, and methods that rely only on MSA as input can reach comparable achievements as methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods can predict more hydrophobic interactions while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods can detect more secondary structure interactions, while DL methods can accurately excavate more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods must be stated the fact that there is still much room left for further improvement: (1) With shallow MSAs, the performance will be greatly affected. (2) Current methods show lower precisions for inter-domain compared with intra-domain contact predictions, as well as very high imbalances in precisions between intra-domains. (3) Strong prediction similarities between DL methods indicating more feature types and diversified models need to be developed. (4) The runtime of most methods can be further optimized. The amino acid sequence of a protein ultimately determines its tertiary structure, and the tertiary structure determines its function(s) and plays a key role in understanding biological processes and disease pathogenesis. Protein tertiary structure can be determined using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, which are very expensive and time-consuming. As an alternative, researchers are trying to use in silico methods to predict the 3D structures. Residue contact-assisted protein folding paves an avenue for sequence-based protein structure prediction and therefore has become one of the most challenging and promising problems in structural bioinformatics. Over the past years, contact prediction has undergone continuous evolution in techniques. Through a retrospective analysis of traditional machine learning /evolutionary coupling analysis methods/ consensus machine learning methods and a multi-perspective study on recently developed deep learning methods, we explore the most advanced contact predictors, pursue application scenarios for different methods, and seek prospective directions for further improvement. We anticipate that our study will serve as a practical and useful guide for the development of future approaches to contact prediction.
Collapse
Affiliation(s)
- Huiling Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Zhendong Bei
- Cloud Computing Department, Alibaba Group, Hangzhou, China
| | - Wenhui Xi
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
| | - Zhen Ju
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Konda Mani Saravanan
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Haiping Zhang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Ning Guo
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Yanjie Wei
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- * E-mail:
| |
Collapse
|
24
|
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021; 17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022] Open
Abstract
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library. Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate competitive performance of the proposed methods to other top approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.
Collapse
|
25
|
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it is a challenge of how to design an accurate energy function which ensures low-energy conformations close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating the residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of evolutionary algorithm. In TDFO, a similarity model is first designed by using feature information which is extracted from distance profiles by bisecting K-means algorithm. The similarity model-based selection strategy is then developed to guide conformation search, and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is also proposed to strike a trade-off between the exploration and exploitation of the search space. Experimental results of 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of test proteins.
Collapse
|
26
|
Sun D, Gong X. Tetramer protein complex interface residue pairs prediction with LSTM combined with graph representations. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140504. [PMID: 32717382 DOI: 10.1016/j.bbapap.2020.140504] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/28/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 10/23/2022]
Abstract
MOTIVATION Protein-protein interactions are important for many biological processes. Theoretical understanding of the structurally determining factors of interaction sites will help to understand the underlying mechanism of protein-protein interactions. Taking advantage of advanced mathematical methods to correctly predict interaction sites will be useful. Although some previous studies have been devoted to the interaction interface of protein monomer and the interface residues between chains of protein dimers, very few studies about the interface residues prediction of protein multimers, including trimers, tetramer and even more monomers in a large protein complex. As we all know, a large number of proteins function with the form of multibody protein complexes. And the complexity of the protein multimers structure causes the difficulty of interface residues prediction on them. So, we hope to build a method for the prediction of protein tetramer interface residue pairs. RESULTS Here, we developed a new deep network based on LSTM network combining with graph to predict protein tetramers interaction interface residue pairs. On account of the protein structure data is not the same as the image or video data which is well-arranged matrices, namely the Euclidean Structure mentioned in many researches. Because the Non-Euclidean Structure data can't keep the translation invariance, and we hope to extract some spatial features from this kind of data applying on deep learning, an algorithm combining with graph was developed to predict the interface residue pairs of protein interactions based on a topological graph building a relationship between vertexes and edges in graph theory combining multilayer Long Short-Term Memory network. First, selecting the training and test samples from the Protein Data Bank, and then extracting the physicochemical property features and the geometric features of surface residue associated with interfacial properties. Subsequently, we transform the protein multimers data to topological graphs and predict protein interaction interface residue pairs using the model. In addition, different types of evaluation indicators verified its validity.
Collapse
Affiliation(s)
- Daiwen Sun
- Mathematics Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, PR China
| | - Xinqi Gong
- Mathematics Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, PR China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua Univeristy, Beijing 100091, PR China.
| |
Collapse
|
27
|
Sun J, Frishman D. DeepHelicon: Accurate prediction of inter-helical residue contacts in transmembrane proteins by residual neural networks. J Struct Biol 2020; 212:107574. [PMID: 32663598 DOI: 10.1016/j.jsb.2020.107574] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 01/16/2023]
Abstract
Accurate prediction of amino acid residue contacts is an important prerequisite for generating high-quality 3D models of transmembrane (TM) proteins. While a large number of compositional, evolutionary, and structural properties of proteins can be used to train contact prediction methods, recent research suggests that coevolution between residues provides the strongest indication of their spatial proximity. We have developed a deep learning approach, DeepHelicon, to predict inter-helical residue contacts in TM proteins by considering only coevolutionary features. DeepHelicon comprises a two-stage supervised learning process by residual neural networks for a gradual refinement of contact maps, followed by variance reduction by an ensemble of models. We present a benchmark study of 12 contact predictors and conclude that DeepHelicon together with the two other state-of-the-art methods DeepMetaPSICOV and Membrain2 outperforms the 10 remaining algorithms on all datasets and at all settings. On a set of 44 TM proteins with an average length of 388 residues DeepHelicon achieves the best performance among all benchmarked methods in predicting the top L/5 and L/2 inter-helical contacts, with the mean precision of 87.42% and 77.84%, respectively. On a set of 57 relatively small TM proteins with an average length of 298 residues DeepHelicon ranks second best after DeepMetaPSICOV. DeepHelicon produces the most accurate predictions for large proteins with more than 10 transmembrane helices. Coevolutionary features alone allow to predict inter-helical residue contacts with an accuracy sufficient for generating acceptable 3D models for up to 30% of proteins using a fully automated modeling method such as CONFOLD2.
Collapse
Affiliation(s)
- Jianfeng Sun
- Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany
| | - Dmitrij Frishman
- Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, Technische Universität München, 85354 Freising, Germany.
| |
Collapse
|
28
|
Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020; 124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contacts prediction opens up new avenues for the de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e., spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine learning based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages the features of residue-residue contacts, that is, (1) a single conformation outperforms the others in the structural prediction using all the top ranking residue-residue contacts as structural constraints and (2) conformation specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.
Collapse
|
29
|
Chen MC, Li Y, Zhu YH, Ge F, Yu DJ. SSCpred: Single-Sequence-Based Protein Contact Prediction Using Deep Fully Convolutional Network. J Chem Inf Model 2020; 60:3295-3303. [DOI: 10.1021/acs.jcim.9b01207] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Ming-Cai Chen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Washtenaw 100, Ann Arbor, Michigan 48109-2218, United States
| | - Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| |
Collapse
|
30
|
Lemke T, Berg A, Jain A, Peter C. EncoderMap(II): Visualizing Important Molecular Motions with Improved Generation of Protein Conformations. J Chem Inf Model 2019; 59:4550-4560. [PMID: 31647645 DOI: 10.1021/acs.jcim.9b00675] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
Dimensionality reduction can be used to project high-dimensional molecular data into a simplified, low-dimensional map. One feature of our recently introduced dimensionality reduction technique EncoderMap, which relies on the combination of an autoencoder with multidimensional scaling, is its ability to do the reverse. It is able to generate conformations for any selected points in the low-dimensional map. This transfers the simplified, low-dimensional map back into the high-dimensional conformational space. Although the output is again high-dimensional, certain aspects of the simplification are preserved. The generated conformations only mirror the most dominant conformational differences that determine the positions of conformational states in the low-dimensional map. This allows depicting such differences and-in consequence-visualizing molecular motions and gives a unique perspective on high-dimensional conformational data. In our previous work, protein conformations described in backbone dihedral angle space were used as the input for EncoderMap, and conformations were also generated in this space. For large proteins, however, the generation of conformations is inaccurate with this approach due to the local character of backbone dihedral angles. Here, we present an improved variant of EncoderMap which is able to generate large protein conformations that are accurate in short-range and long-range orders. This is achieved by differentiable reconstruction of Cartesian coordinates from the generated dihedrals, which allows adding a contribution to the cost function that monitors the accuracy of all pairwise distances between the Cα-atoms of the generated conformations. The improved capabilities to generate conformations of large, even multidomain, proteins are demonstrated for two examples: diubiquitin and a part of the Ssa1 Hsp70 yeast chaperone. We show that the improved variant of EncoderMap can nicely visualize motions of protein domains relative to each other but is also able to highlight important conformational changes within the individual domains.
Collapse
Affiliation(s)
- Tobias Lemke
- Theoretical Chemistry , University of Konstanz , 78547 Konstanz , Baden-Württemberg , Germany
| | - Andrej Berg
- Theoretical Chemistry , University of Konstanz , 78547 Konstanz , Baden-Württemberg , Germany
| | - Alok Jain
- Theoretical Chemistry , University of Konstanz , 78547 Konstanz , Baden-Württemberg , Germany.,Department of Biotechnology , National Institute of Pharmaceutical Education and Research Ahmedabad , Gandhinagar , Gujarat 382355 , India
| | - Christine Peter
- Theoretical Chemistry , University of Konstanz , 78547 Konstanz , Baden-Württemberg , Germany
| |
Collapse
|
31
|
Hanson J, Paliwal K, Litfin T, Yang Y, Zhou Y. Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 2019; 34:4039-4045. [PMID: 29931279 DOI: 10.1093/bioinformatics/bty481] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2018] [Accepted: 06/13/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation Accurate prediction of a protein contact map depends greatly on capturing as much contextual information as possible from surrounding residues for a target residue pair. Recently, ultra-deep residual convolutional networks were found to be state-of-the-art in the latest Critical Assessment of Structure Prediction techniques (CASP12) for protein contact map prediction by attempting to provide a protein-wide context at each residue pair. Recurrent neural networks have seen great success in recent protein residue classification problems due to their ability to propagate information through long protein sequences, especially Long Short-Term Memory (LSTM) cells. Here, we propose a novel protein contact map prediction method by stacking residual convolutional networks with two-dimensional residual bidirectional recurrent LSTM networks, and using both one-dimensional sequence-based and two-dimensional evolutionary coupling-based information. Results We show that the proposed method achieves a robust performance over validation and independent test sets with the Area Under the receiver operating characteristic Curve (AUC) > 0.95 in all tests. When compared to several state-of-the-art methods for independent testing of 228 proteins, the method yields an AUC value of 0.958, whereas the next-best method obtains an AUC of 0.909. More importantly, the improvement is over contacts at all sequence-position separations. Specifically, a 8.95%, 5.65% and 2.84% increase in precision were observed for the top L∕10 predictions over the next best for short, medium and long-range contacts, respectively. This confirms the usefulness of ResNets to congregate the short-range relations and 2D-BRLSTM to propagate the long-range dependencies throughout the entire protein contact map 'image'. Availability and implementation SPOT-Contact server url: http://sparks-lab.org/jack/server/SPOT-Contact/. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
- School of Data and Computer Science, Sun-Yat Sen University, Guangzhou, Guangdong, China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| |
Collapse
|
32
|
Zheng W, Wuyun Q, Li Y, Mortuza SM, Zhang C, Pearce R, Ruan J, Zhang Y. Detecting distant-homology protein structures by aligning deep neural-network based contact maps. PLoS Comput Biol 2019; 15:e1007411. [PMID: 31622328 PMCID: PMC6818797 DOI: 10.1371/journal.pcbi.1007411] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Revised: 10/29/2019] [Accepted: 09/21/2019] [Indexed: 12/31/2022] Open
Abstract
Accurate prediction of atomic-level protein structure is important for annotating the biological functions of protein molecules and for designing new compounds to regulate the functions. Template-based modeling (TBM), which aims to construct structural models by copying and refining the structural frameworks of other known proteins, remains the most accurate method for protein structure prediction. Due to the difficulty in recognizing distant-homology templates, however, the accuracy of TBM decreases rapidly when the evolutionary relationship between the query and template vanishes. In this study, we propose a new method, CEthreader, which first predicts residue-residue contacts by coupling evolutionary precision matrices with deep residual convolutional neural-networks. The predicted contact maps are then integrated with sequence profile alignments to recognize structural templates from the PDB. The method was tested on two independent benchmark sets consisting collectively of 1,153 non-homologous protein targets, where CEthreader detected 176% or 36% more correct templates with a TM-score >0.5 than the best state-of-the-art profile- or contact-based threading methods, respectively, for the Hard targets that lacked homologous templates. Moreover, CEthreader was able to identify 114% or 20% more correct templates with the same Fold as the query, after excluding structures from the same SCOPe Superfamily, than the best profile- or contact-based threading methods. Detailed analyses show that the major advantage of CEthreader lies in the efficient coupling of contact maps with profile alignments, which helps recognize global fold of protein structures when the homologous relationship between the query and template is weak. These results demonstrate an efficient new strategy to combine ab initio contact map prediction with profile alignments to significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins. Despite decades of effort in computational method development, template-based modeling (TBM) still remains the most reliable approach to high-resolution protein structure prediction. Previous studies have shown that the PDB library is complete for single-domain proteins and TBM is in principle sufficient to solve the structure prediction problem if the most similar structure in the PDB could be reliably identified and used as template for model reconstruction. But in reality, the success of TBM depends on the availability of closely-homologous templates, where its accuracy and reliability decrease sharply when the evolutionary relationship between query and template becomes more distant. We developed a new threading approach, CEthreader, which allows for dynamic programing alignments of predicted contact-maps through eigen-decomposition. The large-scale benchmark tests show that the coupling of contact map with profile and secondary structure alignments through the proposed protocol can significantly improve the accuracy of template recognition for distantly-homologous protein targets.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
- College of Mathematical Sciences and LPMC, Nankai University, Tianjin, PR China
| | - Qiqige Wuyun
- College of Mathematical Sciences and LPMC, Nankai University, Tianjin, PR China
- Computer Science and Engineering Department, Michigan State University, East Lansing, MI, United States of America
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - S. M. Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - Jishou Ruan
- College of Mathematical Sciences and LPMC, Nankai University, Tianjin, PR China
- State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, PR China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, United States of America
- * E-mail:
| |
Collapse
|
33
|
Zheng W, Zhang C, Bell EW, Zhang Y. I-TASSER gateway: A protein structure and function prediction server powered by XSEDE. FUTURE GENERATIONS COMPUTER SYSTEMS : FGCS 2019; 99:73-85. [PMID: 31427836 PMCID: PMC6699767 DOI: 10.1016/j.future.2019.04.011] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts with template recognition from a known structure library, followed by full-length atomic model construction by iterative assembly simulations of the continuous structural fragments excised from the template alignments. Functional insights are then derived from comparative matching of the predicted model with a library of proteins with known function. The I-TASSER pipeline has been recently integrated with the XSEDE Gateway system to accommodate pressing demand from the user community and increasing computing costs. This report summarizes the configuration of the I-TASSER Gateway with the XSEDE-Comet supercomputer cluster, together with an overview of the I-TASSER method and milestones of its development.
Collapse
|
34
|
Wu Q, Peng Z, Anishchenko I, Cong Q, Baker D, Yang J. Protein contact prediction using metagenome sequence data and residual neural networks. Bioinformatics 2019; 36:41-48. [PMID: 31173061 PMCID: PMC8792440 DOI: 10.1093/bioinformatics/btz477] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Revised: 05/30/2019] [Accepted: 06/04/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from the poorly populated families do not have sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from the metagenome sequencing projects. RESULTS Based on the improved MSA constructed from the metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with the residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By considering the symmetry of contact map, we reduced the number of matrices to 231, which makes the training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts and other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, improved feature design in DeepMSA and optimized training by the residual neural networks. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/mappred/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Zhenling Peng
- To whom correspondence should be addressed. E-mail: or
| | - Ivan Anishchenko
- Department of Biochemistry, Seattle, WA 98105, USA,Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Qian Cong
- Department of Biochemistry, Seattle, WA 98105, USA,Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - David Baker
- Department of Biochemistry, Seattle, WA 98105, USA,Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Jianyi Yang
- To whom correspondence should be addressed. E-mail: or
| |
Collapse
|
35
|
El Hefnawi MM, Hasan ME, Mahmoud A, Khidr YA, El Behaidy WH, El-Absawy ESA, Hemeida AA. Prediction and Analysis of Three-Dimensional Structure of the p7- Transactivated Protein1 of Hepatitis C Virus. Infect Disord Drug Targets 2019; 19:55-66. [PMID: 29243584 DOI: 10.2174/1871526518666171215123214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2017] [Revised: 06/07/2017] [Accepted: 06/11/2017] [Indexed: 01/06/2023]
Abstract
BACKGROUND The p7-transactivated protein1 of Hepatitis C virus is a small integral membrane protein of 127 amino acids, which is crucial for assembly and release of infectious virions. Ab initio or comparative modelling, is an essential tool to solve the problem of protein structure prediction and to comprehend the physicochemical fundamental of how proteins fold in nature. RESULTS Only one domain (1-127) of p7-transactivated protein1 has been predicted using the systematic in silico approach, ThreaDom. I-TASSER was ranked as the best server for full-length 3-D protein structural predictions of p7-transactivated protein1 where the benchmarked scoring system such as C-score, TM-score, RMSD and Z-score are used to obtain quantitative assessments of the I-TASSER models. Scanning protein motif databases, along with secondary and surface accessibility predictions integrated with post translational modification sites (PTMs) prediction revealed functional and protein binding motifs. Three protein binding motifs (two Asp/Glutamnse, CTNNB1- bd_N) with high sequence conservation and two PTMs prediction: Camp_phospho_site and Myristyl site were predicted using BLOCKS and PROSITE scan. These motifs and PTMs were related to the function of p7-transactivated protein1 protein in inducing ion channel/pore and release of infectious virions. Using SCOP, only one hit matched protein sequence at 71-120 was classified as small proteins and FYVE/PHD zinc finger superfamily. CONCLUSION Integrating this information about the p7-transactivated protein1 with SCOP and CATH annotations of the templates facilitates the assignment of structure-function/ evolution relationships to the known and the newly determined protein structures.
Collapse
Affiliation(s)
- Mahmoud M El Hefnawi
- Informatics and Systems Department, Division of Engineering Research Sciences, the National Research Centre, Giza, Egypt
| | - Mohamed E Hasan
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
| | - Amal Mahmoud
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt.,Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, Damam, Saudi Arabia
| | - Yehia A Khidr
- Plant Biotechnology Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
| | | | - El-Sayed A El-Absawy
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
| | - Alaa A Hemeida
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
| |
Collapse
|
36
|
Malik A, Afaq S, Gamal BE, Ellatif MA, Hassan WN, Dera A, Noor R, Tarique M. Molecular docking and pharmacokinetic evaluation of natural compounds as targeted inhibitors against Crz1 protein in Rhizoctonia solani. Bioinformation 2019; 15:277-286. [PMID: 31285645 PMCID: PMC6599437 DOI: 10.6026/97320630015277] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Accepted: 03/27/2019] [Indexed: 11/29/2022] Open
Abstract
Crz1p regulates Calcineurin, a serine-threonine-specific protein phosphatase, in Rhizoctonia solani. It has attracted consideration as a novel target of antifungal therapy based on studies in numerous pathogenic fungi, including, Cryptococcus neoformans, Candida albicans and Aspergillus fumigatus. To investigate whether Calcineurin can be a useful target for the treatment of Crz1 protein in R. solani causing wet root rot in Chickpea. The work presented here reports the in-silico studies of Crz1 protein against natural compounds. This study Comprises of quantitative structure-toxicity relationship (QSTR) and quantitative structure-activity relationship (QSAR). All compounds showed high binding energy for Crz1 protein through molecular docking. Further, a pharmacokinetic study revealed that these compounds had minimal side effects. Biological activity spectrum prediction of these compounds showed potential antifungal properties by showing significant interaction with Crz1. Hence, these compounds can be used for the prevention and treatment of wet root rot in Chickpea.
Collapse
Affiliation(s)
- Ajit Malik
- Department of Clinical Biochemistry, College of Medicine, King Khalid University, Abha, Saudi Arabia
| | - Sarah Afaq
- Department of Clinical Biochemistry, College of Medicine, King Khalid University, Abha, Saudi Arabia
| | - Basiouny El Gamal
- Department of Clinical Biochemistry, College of Medicine, King Khalid University, Abha, Saudi Arabia
| | - Mohamed Abd Ellatif
- Department of Clinical Biochemistry, College of Medicine, King Khalid University, Abha, Saudi Arabia
- Department of Medical Biochemistry,Faculty of Medicine, Mansoura University, Mansoura, Egypt
| | - Waleed N Hassan
- Department of Clinical Biochemistry, College of Medicine, King Khalid University, Abha, Saudi Arabia
| | - Ayed Dera
- Departments of Clinical Laboratory Science, College of Applied MedicalScience, King Khalid University, Abha, Saudi Arabia
| | - Rana Noor
- 5Department of Biochemistry, Faculty of Dentistry, Jamia Millia Islamia, New Delhi-110025, India
| | - Mohammed Tarique
- Center for InterdisciplinaryResearch in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi-110025, India
| |
Collapse
|
37
|
Alwabli AS, Alattas SG, Alhebshi AM, Zabermawi NM, Alkenani N, ghmady KA, Qadri I. Molecular docking analysis of netropsin and novobiocin with the viral protein targets HABD, MTD and RCD. Bioinformation 2019; 15:233-239. [PMID: 31285639 PMCID: PMC6599439 DOI: 10.6026/97320630015233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2019] [Revised: 03/03/2019] [Accepted: 03/09/2019] [Indexed: 12/02/2022] Open
Abstract
Dengue, West Nile and Zika virus belongs to the family flaviviridae and genus flavivirus. It is of interest to design and develop inhibitors with improved activity against these diseases. We used the helicases target to screen for potential inhibitors against these viruses using molecular docking analysis. NS3 helicases of flavivirus family of viruses such as Dengue, West Nile and Zika are prime targets for drug development. The computer aided molecular docking analysis of netropsin and novobiocin with the viral protein targets HABD, MTD and RCD is reported for further consideration.
Collapse
Affiliation(s)
- Afaf S Alwabli
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Sana G Alattas
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Alawiah M Alhebshi
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Nidal M Zabermawi
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Naser Alkenani
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Khalid Al ghmady
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Ishtiaq Qadri
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
Collapse
|
38
|
Luttrell J, Liu T, Zhang C, Wang Z. Predicting protein residue-residue contacts using random forests and deep networks. BMC Bioinformatics 2019; 20:100. [PMID: 30871477 PMCID: PMC6419322 DOI: 10.1186/s12859-019-2627-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The ability to predict which pairs of amino acid residues in a protein are in contact with each other offers many advantages for various areas of research that focus on proteins. For example, contact prediction can be used to reduce the computational complexity of predicting the structure of proteins and even to help identify functionally important regions of proteins. These predictions are becoming especially important given the relatively low number of experimentally determined protein structures compared to the amount of available protein sequence data. RESULTS Here we have developed and benchmarked a set of machine learning methods for performing residue-residue contact prediction, including random forests, direct-coupling analysis, support vector machines, and deep networks (stacked denoising autoencoders). These methods are able to predict contacting residue pairs given only the amino acid sequence of a protein. According to our own evaluations performed at a resolution of +/- two residues, the predictors we trained with the random forest algorithm were our top performing methods with average top 10 prediction accuracy scores of 85.13% (short range), 74.49% (medium range), and 54.49% (long range). Our ensemble models (stacked denoising autoencoders combined with support vector machines) were our best performing deep network predictors and achieved top 10 prediction accuracy scores of 75.51% (short range), 60.26% (medium range), and 43.85% (long range) using the same evaluation. These tests were blindly performed on targets from the CASP11 dataset; and the results suggested that our models achieved comparable performance to contact predictors developed by groups that participated in CASP11. CONCLUSIONS Due to the challenging nature of contact prediction, it is beneficial to develop and benchmark a variety of different prediction methods. Our work has produced useful tools with a simple interface that can provide contact predictions to users without requiring a lengthy installation process. In addition to this, we have released our C++ implementation of the direct-coupling analysis method as a standalone software package. Both this tool and our RFcon web server are freely available to the public at http://dna.cs.miami.edu/RFcon /.
Collapse
Affiliation(s)
- Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
| | - Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA.
| |
Collapse
|
39
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:Protein inter-residue contacts prediction play an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction.Objective:We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods.Results:Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, templatebased methods and 3D model-based methods. In this paper, firstly we describe the common definition of protein inter-residue contacts and show the typical application of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories for protein interresidue contacts prediction: correlated mutations methods, machine-learning methods and fusion methods. Besides, we analyze the constraints for each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss performances of these methods in detail.Conclusion:Correlated mutations methods achieve better performances for long-range contacts, while the machine-learning method performs well for short-range contacts. Fusion methods could take advantage of the machine-learning and correlated mutations methods. Employing more effective fusion strategy could be helpful to further improve the performances of fusion methods.
Collapse
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
| | - Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
| | - Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
| |
Collapse
|
40
|
DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019; 9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022] Open
Abstract
The amino acid sequence of a protein encodes the blueprint of its native structure. To predict the corresponding structural fold from the protein’s sequence is one of most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, the significantly improved predictive ability is demonstrated in the large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times the number of “hard” targets (those with poor quality templates) that were previously intractable, viz, a “glass-ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good quality templates). The significantly better performance by DESTINI is largely due to the incorporation of better contact prediction into template modelling. To understand why deep-learning accomplishes more accurate contact prediction, systematic clustering reveals that deep-learning predicts coherent, native-like contact patterns compared to co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.
Collapse
|
41
|
Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019; 19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Indexed: 11/14/2022] Open
Abstract
Sequence-based prediction of residue-residue contact in proteins becomes increasingly more important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a big data set consisting of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, the alignment depth and distribution of contact pairs in a protein structure. We found that: (1) the machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction. The consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) The target difficulty does not have clear influence on the machine learning-based methods, while it does affect the direct-coupling and consensus-based methods significantly. (3) The alignment depth has relatively weak effect on the machine learning-based methods. However, for the direct-coupling-based methods and consensus-based methods, the predicted contacts for targets with deeper alignment tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of protein structure are more prone to be in contact than residues on the surface (22 versus 6%). We believe these are useful results for guiding future development of new approach to contact prediction.
Collapse
Affiliation(s)
- Qiqige Wuyun
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Wei Zheng
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
| |
Collapse
|
42
|
Ding W, Mao W, Shao D, Zhang W, Gong H. DeepConPred2: An Improved Method for the Prediction of Protein Residue Contacts. Comput Struct Biotechnol J 2018; 16:503-510. [PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Revised: 10/16/2018] [Accepted: 10/18/2018] [Indexed: 12/18/2022] Open
Abstract
Information of residue-residue contacts is essential for understanding the mechanism of protein folding, and has been successfully applied as special topological restraints to simplify the conformational sampling in de novo protein structure prediction. Prediction of protein residue contacts has experienced amazingly rapid progresses recently, with prediction accuracy approaching impressively high levels in the past two years. In this work, we introduce a second version of our residue contact predictor, DeepConPred2, which exhibits substantially improved performance and sufficiently reduced running time after model re-optimization and feature updates. When testing on the CASP12 free modeling targets, our program reaches at least the same level of prediction accuracy as the best contact predictors so far and provides information complementary to other state-of-the-art methods in contact-assisted folding.
Collapse
Affiliation(s)
- Wenze Ding
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Di Shao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Wenxuan Zhang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing 100084, China
| |
Collapse
|
43
|
Kc DB. Recent advances in sequence-based protein structure prediction. Brief Bioinform 2018; 18:1021-1032. [PMID: 27562963 DOI: 10.1093/bib/bbw070] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2016] [Indexed: 11/13/2022] Open
Abstract
The most accurate characterizations of the structure of proteins are provided by structural biology experiments. However, because of the high cost and labor-intensive nature of the structural experiments, the gap between the number of protein sequences and solved structures is widening rapidly. Development of computational methods to accurately model protein structures from sequences is becoming increasingly important to the biological community. In this article, we highlight some important progress in the field of protein structure prediction, especially those related to free modeling (FM) methods that generate structure models without using homologous templates. We also provide a short synopsis of some of the recent advances in FM approaches as demonstrated in the recent Computational Assessment of Structure Prediction competition as well as recent trends and outlook for FM approaches in protein structure prediction.
Collapse
|
44
|
Correa L, Borguesan B, Farfan C, Inostroza-Ponta M, Dorn M. A Memetic Algorithm for 3-D Protein Structure Prediction Problem. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:690-704. [PMID: 27925594 DOI: 10.1109/tcbb.2016.2635143] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Memetic Algorithms are population-based metaheuristics intrinsically concerned with exploiting all available knowledge about the problem under study. The incorporation of problem domain knowledge is not an optional mechanism, but a fundamental feature of the Memetic Algorithms. In this paper, we present a Memetic Algorithm to tackle the three-dimensional protein structure prediction problem. The method uses a structured population and incorporates a Simulated Annealing algorithm as a local search strategy, as well as ad-hoc crossover and mutation operators to deal with the problem. It takes advantage of structural knowledge stored in the Protein Data Bank, by using an Angle Probability List that helps to reduce the search space and to guide the search strategy. The proposed algorithm was tested on nineteen protein sequences of amino acid residues, and the results show the ability of the algorithm to find native-like protein structures. Experimental results have revealed that the proposed algorithm can find good solutions regarding root-mean-square deviation and global distance total score test in comparison with the experimental protein structures. We also show that our results are comparable in terms of folding organization with state-of-the-art prediction methods, corroborating the effectiveness of our proposal.
Collapse
|
45
|
He B, Mortuza SM, Wang Y, Shen HB, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2018; 33:2296-2306. [PMID: 28369334 DOI: 10.1093/bioinformatics/btx164] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2016] [Accepted: 03/21/2017] [Indexed: 12/12/2022] Open
Abstract
Motivation Recent CASP experiments have witnessed exciting progress on folding large-size non-humongous proteins with the assistance of co-evolution based contact predictions. The success is however anecdotal due to the requirement of the contact prediction methods for the high volume of sequence homologs that are not available to most of the non-humongous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different type of protein targets is essential to enhance the success rate of the ab initio protein structure prediction. Results We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state of the art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, which improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found particularly important for the proteins that do not have sufficient number of homologous sequences to derive reliable co-evolution profiles. Availiablity and Implementation On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/ . Contact zhng@umich.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Baoji He
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.,School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.,Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.,School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
46
|
Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018; 86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Following up on the encouraging results of residue-residue contact prediction in the CASP11 experiment, we present the analysis of predictions submitted for CASP12. The submissions include predictions of 34 groups for 38 domains classified as free modeling targets which are not accessible to homology-based modeling due to a lack of structural templates. CASP11 saw a rise of coevolution-based methods outperforming other approaches. The improvement of these methods coupled to machine learning and sequence database growth are most likely the main driver for a significant improvement in average precision from 27% in CASP11 to 47% in CASP12. In more than half of the targets, especially those with many homologous sequences accessible, precisions above 90% were achieved with the best predictors reaching a precision of 100% in some cases. We furthermore tested the impact of using these contacts as restraints in ab initio modeling of 14 single-domain free modeling targets using Rosetta. Adding contacts to the Rosetta calculations resulted in improvements of up to 26% in GDT_TS within the top five structures.
Collapse
Affiliation(s)
- Joerg Schaarschmidt
- Faculty of Science ‐ ChemistryComputational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht UniversityUtrechtThe Netherlands
| | | | | | - Alexandre M.J.J. Bonvin
- Faculty of Science ‐ ChemistryComputational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht UniversityUtrechtThe Netherlands
| |
Collapse
|
47
|
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Collapse
Affiliation(s)
- Bian Li
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| | - Michaela Fooksa
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
| | - Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| |
Collapse
|
48
|
Mandalaparthy V, Sanaboyana VR, Rafalia H, Gosavi S. Exploring the effects of sparse restraints on protein structure prediction. Proteins 2017; 86:248-262. [DOI: 10.1002/prot.25438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2017] [Revised: 11/20/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023]
Affiliation(s)
- Varun Mandalaparthy
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
| | - Venkata Ramana Sanaboyana
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
| | - Hitesh Rafalia
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
- Manipal University, Madhav Nagar; Manipal 576104 India
| | - Shachi Gosavi
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
| |
Collapse
|
49
|
Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 2017; 86 Suppl 1:136-151. [PMID: 29082551 DOI: 10.1002/prot.25414] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2017] [Revised: 10/09/2017] [Accepted: 10/27/2017] [Indexed: 12/26/2022]
Abstract
We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role to enhance the quality of models, particularly for FM targets, by the new pipelines. The inclusion of NeBcon predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best in top five predicted models, which is 37% higher than that by the QUARK simulations without contacts. In particular, there are seven targets that are converted from non-foldable to foldable (TM-score >0.5) due to the use of contact restraints in the simulations. Another additional feature in the current pipelines is the local structure quality prediction by ResQ, which provides a robust residue-level modeling error estimation. Despite the success, significant challenges still remain in ab initio modeling of multi-domain proteins and folding of β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements on domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.
Collapse
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.,Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
| |
Collapse
|
50
|
Wang S, Li Z, Yu Y, Xu J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst 2017; 5:202-211.e3. [PMID: 28957654 PMCID: PMC5637520 DOI: 10.1016/j.cels.2017.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2017] [Revised: 06/01/2017] [Accepted: 08/29/2017] [Indexed: 01/02/2023]
Abstract
Computational elucidation of membrane protein (MP) structures is challenging partially due to lack of sufficient solved structures for homology modeling. Here, we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-MPs and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method has contact prediction accuracy at least 0.18 better than existing methods, predicts correct folds for 218 MPs, and generates 3D models with root-mean-square deviation (RMSD) less than 4 and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the continuous automated model evaluation project shows that our method predicted high-resolution 3D models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimated that our method could predict correct folds for 1,345-1,871 reviewed human multi-pass MPs including a few hundred new folds, which shall facilitate the discovery of drugs targeting at MPs.
Collapse
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Zhen Li
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Computer Science, University of Hong Kong, Hong Kong
| | - Yizhou Yu
- Department of Computer Science, University of Hong Kong, Hong Kong
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
| |
Collapse
|