1
Chen L, Li Q, Nasif KFA, Xie Y, Deng B, Niu S, Pouriyeh S, Dai Z, Chen J, Xie CY. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int J Mol Sci 2024; 25:8426. [PMID: 39125995 PMCID: PMC11313475 DOI: 10.3390/ijms25158426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024] Open
Abstract
Protein structure prediction is important for understanding protein function and behavior. This review presents a comprehensive survey of the computational models used to predict protein structure, covering the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper starts with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling discusses homology modeling, ab initio modeling, and threading. The next section, on deep learning-based models, introduces state-of-the-art AI models such as AlphaFold (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, and ProteinBERT, and discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. Model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. The template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are also discussed. The paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the need for further research on dynamic protein behavior, conformational changes, and protein-protein interactions. The application section introduces applications in fields such as drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure prediction. It emphasizes the significant advancements achieved by AI and identifies potential areas for further investigation.
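The two global scores named in this abstract can be illustrated with a minimal Python sketch. It assumes the per-residue distances (in angstroms) between aligned model and reference residues are already available for one fixed superposition; the real TM-score and GDT_TS search over superpositions to maximize the score, which this sketch does not do. The function names and the 0.5 Å floor on d0 are assumptions made for illustration, and lDDT is not covered.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) from the TM-score definition."""
    # The cube-root formula is only meaningful for longer chains; the 0.5 A
    # floor is an assumption made here so that short chains stay well-defined.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score term for ONE given superposition (no search over superpositions)."""
    d0 = tm_score_d0(target_length)
    # Each aligned residue contributes 1 / (1 + (d / d0)^2); unaligned residues
    # contribute nothing, and the sum is normalized by the target length.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt_ts_fixed_superposition(distances, target_length):
    """GDT_TS for ONE given superposition, reported on a 0-100 scale."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of target residues within each distance cutoff, then averaged.
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for a 5-residue toy alignment.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
example_gdt_ts = gdt_ts_fixed_superposition(example_distances, 5)
```

Because both functions score a single superposition, their values are lower bounds on the scores a full implementation would report for the same pair of structures.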
Affiliation(s)
- Lingtao Chen
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Qiaomu Li
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Kazi Fahim Ahmad Nasif
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Ying Xie
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Bobin Deng
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Shuteng Niu
  - Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403, USA
- Seyedamin Pouriyeh
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Zhiyu Dai
  - Division of Pulmonary and Critical Care Medicine, John T. Milliken Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
- Jiawei Chen
  - College of Computing, Data Science and Society, University of California, Berkeley, CA 94720, USA
- Chloe Yixin Xie
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
2
Williams ME. HIV-1 Vif protein sequence variations in South African people living with HIV and their influence on Vif-APOBEC3G interaction. Eur J Clin Microbiol Infect Dis 2024; 43:325-338. [PMID: 38072879 PMCID: PMC10821834 DOI: 10.1007/s10096-023-04728-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/24/2023] [Accepted: 11/28/2023] [Indexed: 01/28/2024]
Abstract
PURPOSE: Despite extensive research, HIV-1 remains a global epidemic with variations in pathogenesis across regions and subtypes. The Viral Infectivity Factor (Vif) protein, which neutralizes the host protein APOBEC3G, has been implicated in differences in clinical outcomes among people living with HIV (PLHIV). Most studies on Vif sequence diversity have focused on subtype B, leaving gaps in understanding Vif variations in HIV-1C regions like South Africa. This study aimed to identify and compare Vif sequence diversity in a cohort of 51 South African PLHIV and in other HIV-1C-prevalent regions.
METHODS: Sanger sequencing was used for Vif analysis in the cohort, and additional sequences were obtained from the Los Alamos database. Molecular modeling and docking techniques were employed to study the influence of subtype-specific variants on Vif-APOBEC3G binding affinity.
RESULTS: The findings showed distinct genetic variations between Vif sequences from India and Uganda, while South African sequences had a wider distribution and were more closely related to both. Specific amino acid substitutions in Vif were associated with geographic groups. Molecular modeling and docking analyses consistently identified specific residues (ARG19, LYS26, TYR30, TYR44, and TRP79) as primary contributors to intermolecular contacts between Vif and APOBEC3G, essential for their interaction. The Indian Vif variant exhibited the highest predicted binding affinity to APOBEC3G among the studied groups.
CONCLUSIONS: These results provide insights into Vif sequence diversity in HIV-1C-prevalent regions and shed light on the differential pathogenesis observed in different geographical areas. The identified Vif amino acid residues warrant further investigation for their diagnostic, prognostic, and therapeutic potential.
3
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network-based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed an average TM-score 19% higher than that of the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and an Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses of complexes highlighted the critical role played by MSA generation, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
4
Li J, Kang G, Wang J, Yuan H, Wu Y, Meng S, Wang P, Zhang M, Wang Y, Feng Y, Huang H, de Marco A. Affinity maturation of antibody fragments: A review encompassing the development from random approaches to computational rational optimization. Int J Biol Macromol 2023; 247:125733. [PMID: 37423452 DOI: 10.1016/j.ijbiomac.2023.125733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 07/04/2023] [Accepted: 07/06/2023] [Indexed: 07/11/2023]
Abstract
Routinely screened antibody fragments usually require further in vitro maturation to achieve the desired biophysical properties. Blind in vitro strategies can produce improved ligands by introducing random mutations into the original sequences and selecting the resulting clones under increasingly stringent conditions. Rational approaches take an alternative perspective: they first identify the specific residues potentially involved in the control of biophysical mechanisms, such as affinity or stability, and then evaluate which mutations could improve those characteristics. Understanding the antigen-antibody interactions is instrumental to this process, whose reliability consequently depends strongly on the quality and completeness of the structural information. Recently, methods based on deep learning have critically improved the speed and accuracy of model building and are promising tools for accelerating the docking step. Here, we review the features of the available bioinformatic instruments and analyze the reports illustrating the results obtained by applying them to optimize antibody fragments, and nanobodies in particular. Finally, the emerging trends and open questions are summarized.
Affiliation(s)
- Jiaqi Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Guangbo Kang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Jiewen Wang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Haibin Yuan
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Yili Wu
  - Zhejiang Provincial Clinical Research Center for Mental Disorders, School of Mental Health and the Affiliated Kangning Hospital, Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Wenzhou Medical University, Oujiang Laboratory, Wenzhou, Zhejiang 325035, China
- Shuxian Meng
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Ping Wang
  - New Technology R&D Department, Tianjin Modern Innovative TCM Technology Company Limited, Tianjin 300392, China
- Miao Zhang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; China Resources Biopharmaceutical Company Limited, Beijing 100029, China
- Yuli Wang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Tianjin Pharmaceutical Da Ren Tang Group Corporation Limited, Traditional Chinese Pharmacy Research Institute, Tianjin Key Laboratory of Quality Control in Chinese Medicine, Tianjin 300457, China; State Key Laboratory of Drug Delivery Technology and Pharmacokinetics, Tianjin Institute of Pharmaceutical Research, Tianjin 300193, China
- Yuanhang Feng
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- He Huang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Ario de Marco
  - Laboratory for Environmental and Life Sciences, University of Nova Gorica, Nova Gorica, Slovenia
5
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 135] [Impact Index Per Article: 67.5] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
6
Lee J, Shamim A, Park J, Jang JH, Kim JH, Kwon JY, Kim JW, Kim KK, Lee J. Functional and Structural Changes in the Membrane-Bound O-Acyltransferase Family Member 7 (MBOAT7) Protein: The Pathomechanism of a Novel MBOAT7 Variant in Patients With Intellectual Disability. Front Neurol 2022; 13:836954. [PMID: 35509994 PMCID: PMC9058081 DOI: 10.3389/fneur.2022.836954] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/17/2021] [Accepted: 03/11/2022] [Indexed: 12/05/2022] Open
Abstract
The membrane-bound O-acyltransferase domain-containing 7 (MBOAT7) gene is associated with intellectual disability, early onset seizures, and autism spectrum disorders. This study aimed to determine the pathogenetic mechanism of the MBOAT7 missense variant via molecular modeling. Three patients from a consanguineous family were found to have a homozygous c.757G>A (p.Glu253Lys) variant of MBOAT7. The patients showed prominent dysfunction in gait, swallowing, vocalization, and fine motor function and had intellectual disabilities. Brain magnetic resonance imaging showed signal changes in the bilateral globus pallidi and cerebellar dentate nucleus, which differed with age. In the molecular model of human MBOAT7, Glu253 in the wild-type protein is located close to the backbone carbonyl oxygens in the loop near the helix, suggesting that the ionic interaction could contribute to the conformational stability of the funnel. Molecular modeling showed that Lys253 in the mutant protein was expected to alter the surface charge distribution, thereby potentially affecting substrate specificity. Changes in conformational stability and substrate specificity through varied ionic interactions are the suggested pathophysiological mechanisms of the MBOAT7 variant found in patients with intellectual disabilities.
Affiliation(s)
- Jiwon Lee
  - Department of Pediatrics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Amen Shamim
  - Department of Computer Science, University of Agriculture, Faisalabad, Pakistan
  - Department of Precision Medicine, Graduate School of Basic Medical Sciences, Sungkyunkwan University School of Medicine, Suwon, South Korea
- Jongho Park
  - Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Ja-Hyun Jang
  - Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Ji Hye Kim
  - Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Jeong-Yi Kwon
  - Department of Physical and Rehabilitation Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Jong-Won Kim
  - Department of Laboratory Medicine and Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Kyeong Kyu Kim
  - Department of Precision Medicine, Graduate School of Basic Medical Sciences, Sungkyunkwan University School of Medicine, Suwon, South Korea
- Jeehun Lee
  - Department of Pediatrics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
7
Vishwakarma P, Vattekatte AM, Shinada N, Diharce J, Martins C, Cadet F, Gardebien F, Etchebest C, Nadaradjane AA, de Brevern AG. VHH Structural Modelling Approaches: A Critical Review. Int J Mol Sci 2022; 23:3721. [PMID: 35409081 PMCID: PMC8998791 DOI: 10.3390/ijms23073721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/20/2022] Open
Abstract
VHHs, i.e., the VH domains of camelid single-chain antibodies, are very promising therapeutic agents due to their significant physicochemical advantages compared to classical mammalian antibodies. The number of experimentally solved VHH structures has increased significantly recently, which is of great help because it makes it possible to work directly on 3D structures to humanise or improve them. Unfortunately, most VHHs do not have a solved 3D structure, so alternative ways of obtaining structural information are needed, and methods that predict structure from the primary amino acid sequence are essential to bypass this limitation. This review presents the most extensive overview of structure prediction methods applied to the 3D modelling of a given VHH sequence (a total of 21). Besides the historical overview, it aims to show how modelling software programs have been shaping the structural predictions of VHHs. A brief explanation of each methodology is supplied, and pertinent examples of their usage are provided. Finally, we present a structure prediction case study of a recently solved VHH structure. According to some recent studies and the present analysis, AlphaFold 2 and NanoNet appear to be the best tools to predict a structural model of a VHH from its sequence.
Affiliation(s)
- Poonam Vishwakarma
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Akhila Melarkode Vattekatte
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Julien Diharce
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- Carla Martins
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Frédéric Cadet
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
  - PEACCEL, Artificial Intelligence Department, Square Albin Cachot, F-75013 Paris, France
- Fabrice Gardebien
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Catherine Etchebest
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- Aravindan Arun Nadaradjane
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Alexandre G. de Brevern
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
8
Zheng W, Li Y, Zhang C, Zhou X, Pearce R, Bell EW, Huang X, Zhang Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins 2021; 89:1734-1751. [PMID: 34331351 PMCID: PMC8616857 DOI: 10.1002/prot.26193] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/22/2021] [Indexed: 11/10/2022]
Abstract
In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric W. Bell
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaoqiang Huang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
9
Ding Y, Tang J, Guo F. Protein Crystallization Identification via Fuzzy Model on Linear Neighborhood Representation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1986-1995. [PMID: 31751248 DOI: 10.1109/tcbb.2019.2954826] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 06/10/2023]
Abstract
X-ray crystallography is the most popular approach for analyzing protein 3D structure. However, the success rate of protein crystallization is very low (2-10 percent). To reduce the cost in time and resources, many computational methods have been developed to predict protein crystallization. Improving the accuracy of these predictions is very important for determining protein structure by X-ray crystallography, and many machine learning methods are currently used for this purpose. In this article, we propose a Fuzzy Support Vector Machine based on Linear Neighborhood Representation (FSVM-LNR) to predict the crystallization propensity of proteins. Proteins are represented by three types of features (PsePSSM, PSSM-DWT, MMI-PS), and these features are serially combined and fed into FSVM-LNR. FSVM-LNR can filter outliers by membership score, which is calculated via reconstruction residuals of the k nearest samples. To evaluate the performance of our predictive model, we test FSVM-LNR on the TRAIN3587, TEST3585 and TEST500 datasets. Our method achieves a better Matthews correlation coefficient (MCC) on TRAIN3587 (MCC: 0.56) and TEST3585 (MCC: 0.58). Although its performance on the independent TEST500 set is not the best, FSVM-LNR still shows a certain predictive ability (MCC: 0.70) in the identification of protein crystallization. The good performance on these datasets proves the effectiveness of our method, and the better performance on large datasets further demonstrates its stability and superiority.
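The Matthews correlation coefficient reported in this abstract is computed from the four cells of a binary confusion matrix. The sketch below is a minimal Python illustration of that standard formula, not the authors' evaluation code; the function name and the example counts are hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC from true/false positive and negative counts of a binary classifier."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any row or column of the confusion matrix is empty, the coefficient is
    # undefined; 0.0 is returned by convention (an assumption of this sketch).
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts: 45 crystallizable proteins found, 5 missed,
# 35 non-crystallizable proteins rejected, 15 wrongly accepted.
example_mcc = matthews_corrcoef_from_counts(tp=45, tn=35, fp=15, fn=5)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which makes it more informative than accuracy when the two classes are unbalanced.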
10
A physiologic rise in cytoplasmic calcium ion signal increases pannexin1 channel activity via a C-terminus phosphorylation by CaMKII. Proc Natl Acad Sci U S A 2021; 118:2108967118. [PMID: 34301850 DOI: 10.1073/pnas.2108967118] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/20/2022] Open
Abstract
Pannexin1 (Panx1) channels are ubiquitously expressed in vertebrate cells and are widely accepted as adenosine triphosphate (ATP)-releasing membrane channels. Activation of Panx1 has been associated with phosphorylation of a specific tyrosine residue or cleavage of its C-terminal domains. In the present work, we identified a residue (S394) as a putative phosphorylation site for Ca2+/calmodulin-dependent kinase II (CaMKII). In HeLa cells transfected with rat Panx1 (rPanx1), membrane stretch (MS)-induced activation, measured by changes in DAPI uptake rate, was drastically reduced by either knockdown of Piezo1 or pharmacological inhibition of calmodulin or CaMKII. By site-directed mutagenesis we generated rPanx1S394A-EGFP (enhanced green fluorescent protein), which lost its sensitivity to MS, and rPanx1S394D-EGFP, mimicking phosphorylation, which shows a high DAPI uptake rate without MS stimulation or cleavage of the C terminus. Using whole-cell patch-clamp and outside-out excised patch configurations, we found that rPanx1-EGFP and rPanx1S394D-EGFP channels showed current at all voltages between ±100 mV, similar single channel currents with outward rectification, and unitary conductance (∼30 to 70 pS). However, using the cell-attached configuration we found that rPanx1S394D-EGFP channels show increased spontaneous unitary events independent of MS stimulation. In silico studies revealed that phosphorylation of S394 caused conformational changes in the selectivity filter and increased the average volume of lateral tunnels, allowing ATP release via these conduits and DAPI uptake directly from the channel mouth to the cytoplasmic space. These results could explain one possible mechanism for activation of rPanx1 upon an increase in cytoplasmic Ca2+ signal elicited by diverse physiological conditions in which the C-terminal domain is not cleaved.
|
11
|
A Peptides Prediction Methodology for Tertiary Structure Based on Simulated Annealing. MATHEMATICAL AND COMPUTATIONAL APPLICATIONS 2021. [DOI: 10.3390/mca26020039] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
The Protein Folding Problem (PFP) is a major challenge that has remained unsolved for more than fifty years. The problem consists of obtaining the tertiary structure, or Native Structure (NS), of a protein from its amino acid sequence. The computational methodologies applied to this problem are classified into two groups: Template-Based Modeling (TBM) and ab initio models. In the latter, only information from the primary structure of the target protein is used. In the literature, Hybrid Simulated Annealing (HSA) algorithms are among the best ab initio algorithms for PFP; Golden Ratio Simulated Annealing (GRSA) is a family of these algorithms designed for peptides. Algorithms designed with TBM, by contrast, use information from the target protein's primary structure together with information from similar or analogous proteins. This paper presents the GRSA-SSP methodology, which uses a secondary structure prediction to build an initial model and refines it with HSA algorithms. Additionally, we compare the performance of the GRSAX-SSP algorithms with that of their corresponding GRSAX counterparts. Finally, our best GRSAX-SSP algorithm is compared with PEP-FOLD3, I-TASSER, QUARK, and Rosetta, showing that it is competitive on small peptides, except when predicting the largest peptides.
|
12
|
Zhang GJ, Xie TY, Zhou XG, Wang LJ, Hu J. Protein Structure Prediction Using Population-Based Algorithm Guided by Information Entropy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:697-707. [PMID: 31180869 DOI: 10.1109/tcbb.2019.2921958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Ab initio protein structure prediction is one of the most challenging problems in computational biology. Multistage algorithms are widely used in ab initio protein structure prediction. It is important to consider that the computational cost of a multistage algorithm differs from protein to protein. In this study, a population-based algorithm guided by information entropy (PAIE), which includes exploration and exploitation stages, is proposed for protein structure prediction. In PAIE, an entropy-based stage switch strategy is designed to switch from the exploration stage to the exploitation stage. Torsion angle statistical information is also deduced from the first stage and employed to enhance the exploitation in the second stage. Results indicate an improvement in the performance of protein structure prediction on a benchmark of 30 proteins and 17 other free modeling targets in CASP.
|
13
|
Wang Y, Ding Y, Tang J, Dai Y, Guo F. CrystalM: A Multi-View Fusion Approach for Protein Crystallization Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:325-335. [PMID: 31027046 DOI: 10.1109/tcbb.2019.2912173] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/09/2023]
Abstract
Improving the accuracy of protein crystallization prediction is very important for protein crystallization projects, since crystallization is a critical step in determining protein structure by X-ray crystallography. At present, many machine learning methods are used to predict protein crystallization. Here, we use a novel feature combination to construct an SVM model for predicting protein crystallization, called CrystalM. In this work, we extract six features to represent protein sequences, namely Average Block-Position specific scoring matrix (AVBlock-PSSM), Average Block-Secondary Structure (AVBlock-SS), Global Encoding (GE), Pseudo-Position specific scoring matrix (PsePSSM), Protscale, and Discrete Wavelet Transform-Position specific scoring matrix (DWT-PSSM). Moreover, we employ two training datasets (TRAIN3587 and TRAIN1500) and their corresponding independent test datasets (TEST3585 and TEST500) to evaluate CrystalM by feeding multi-view features into a Support Vector Machine (SVM) classifier. The two training datasets are used for five-fold cross validation, and the two test datasets are used separately for the corresponding independent tests. Finally, we compare the performance of CrystalM with that of existing methods. For the TRAIN3587 and TEST3585 datasets, CrystalM achieves the best Accuracy (ACC), the best Specificity (SP), and the same Matthews correlation coefficient (MCC) as the previous best-performing methods in the five-fold cross validation. In particular, ACC, SP, and MCC surpass the existing methods in the independent test, which supports the effectiveness of CrystalM. Meanwhile, ACC, SP, and MCC are higher than those of existing methods in the five-fold cross validation for TRAIN1500. Although the independent test performance on TEST500 is not the best, CrystalM still shows some predictive ability for protein crystallization.
In addition, we find that choosing only the first four features can improve prediction performance for TRAIN1500 and TEST500, not only in independent tests but also in five-fold cross validation. This indicates that the latter two features cannot effectively represent the proteins of TRAIN1500 and TEST500. CrystalM is a sequence-based protein crystallization prediction method. The good performance on the datasets supports the effectiveness of CrystalM, and the better performance on large datasets further demonstrates its stability and superiority.
|
14
|
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling, such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term 'Rosetta' appeared for the first time twenty years ago in the literature to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering 'the second half of the genetic code'. Although the focus of Baker's team has expanded to de novo protein design in the past few years, Rosetta's 'fame' is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its development and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass: Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel: Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
|
15
|
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it is challenging to design an accurate energy function that ensures low-energy conformations are close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
|
16
|
Dhingra S, Sowdhamini R, Cadet F, Offmann B. A glance into the evolution of template-free protein structure prediction methodologies. Biochimie 2020; 175:85-92. [DOI: 10.1016/j.biochi.2020.04.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/14/2020] [Revised: 04/24/2020] [Accepted: 04/27/2020] [Indexed: 11/26/2022]
|
17
|
Noncanonical type 2B von Willebrand disease associated with mutations in the VWF D'D3 and D4 domains. Blood Adv 2020; 4:3405-3415. [PMID: 32722784 DOI: 10.1182/bloodadvances.2020002334] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2020] [Accepted: 06/22/2020] [Indexed: 11/20/2022] Open
Abstract
We observed a 55-year-old Italian man who presented with mucosal and cutaneous bleeding. Results of his blood analysis showed low levels of von Willebrand factor (VWF) antigen and VWF activity (both VWF ristocetin cofactor and VWF collagen binding), mild thrombocytopenia, increased ristocetin-induced platelet aggregation, and a deficiency of high-molecular-weight multimers, all typical phenotypic hallmarks of type 2B von Willebrand disease (VWD). Analysis of the VWF gene sequence revealed heterozygous in cis mutations: (1) c.2771G>A and (2) c.6532G>T substitutions in exons 21 and 37, respectively. The first mutation causes the substitution of an Arg residue with a Gln at position 924, in the D'D3 domain. The second mutation causes an Ala to Ser substitution at position 2178 in the D4 domain. The patient's daughter did not carry her father's mutations but showed only the heterozygous polymorphic c.3379C>T mutation in exon 25 of the VWF gene, causing the p.P1127S substitution, inherited from her mother. The in vitro expression of the heterozygous in cis VWF mutant rVWFWT/rVWF924Q-2178S confirmed and recapitulated the ex vivo VWF findings. Molecular modeling showed that these in cis mutations stabilize a partially stretched and open conformation of the VWF monomer. Transmission electron microscopy and atomic force microscopy showed that the heterozygous recombinant form rVWFWT/rVWF924Q-2178S adopts a stretched conformation, forming strings even under static conditions. Thus, the heterozygous in cis mutations 924Q/2178S promote conformational transitions in the VWF molecule, causing a type 2B-like VWD phenotype, despite the absence of typical mutations in the A1 domain of VWF.
|
18
|
Abbass J, Nebel JC. Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure. BMC Bioinformatics 2020; 21:170. [PMID: 32357827 PMCID: PMC7195757 DOI: 10.1186/s12859-020-3491-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/30/2019] [Accepted: 04/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Whenever suitable template structures are not available, fragment-based protein structure prediction becomes the only practical alternative, as pure ab initio techniques require massive computational resources even for very small proteins. However, the inaccuracy of their energy functions and their stochastic nature require the generation of a large number of decoys to explore the solution space adequately, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept to its default value for coil regions, important and dramatic reductions are proposed for beta sheet and alpha helical regions, respectively. RESULTS The evaluation of our fragment selection approach was conducted using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool. It was modified so that the number of fragment candidates used in Rosetta could be adjusted based on the local secondary structure. Compared to Rosetta's standard predictions, our strategy delivered improved first models, +24% and +6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while significantly reducing the number of fragment candidates. Furthermore, our enhanced version of Rosetta is able to deliver with 2000 decoys a performance equivalent to that produced by standard Rosetta using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore conformation spaces satisfactorily.
CONCLUSIONS Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of that information for customising the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experiments conducted using standard Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results. Alternatively, similar results can be achieved using only 2000 decoys. Consequently, we recommend the adoption of this strategy to either significantly improve model quality or reduce processing times by a factor of 10.
Affiliation(s)
- Jad Abbass: Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK; Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel: Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
|
19
|
Discriminative margin-sensitive autoencoder for collective multi-view disease analysis. Neural Netw 2020; 123:94-107. [DOI: 10.1016/j.neunet.2019.11.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/25/2019] [Revised: 08/18/2019] [Accepted: 11/13/2019] [Indexed: 12/18/2022]
|
20
|
Zheng W, Li Y, Zhang C, Pearce R, Mortuza SM, Zhang Y. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins 2019; 87:1149-1164. [PMID: 31365149 PMCID: PMC6851476 DOI: 10.1002/prot.25792] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 04/30/2019] [Revised: 07/14/2019] [Accepted: 07/27/2019] [Indexed: 12/28/2022]
Abstract
We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
Affiliation(s)
- Wei Zheng: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Yang Li: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Chengxin Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Robin Pearce: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- S M Mortuza: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
|
21
|
Wang Y, Shi Q, Yang P, Zhang C, Mortuza SM, Xue Z, Ning K, Zhang Y. Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families. Genome Biol 2019; 20:229. [PMID: 31676016 PMCID: PMC6825341 DOI: 10.1186/s13059-019-1823-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/07/2019] [Accepted: 09/13/2019] [Indexed: 02/01/2023] Open
Abstract
INTRODUCTION The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. RESULTS By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with their frequency of occurrence in the modeled Pfam families, suggesting a significant role of the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Notably, PF15461, most of whose members come from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5. CONCLUSIONS These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.
Affiliation(s)
- Yan Wang: College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Qiang Shi: College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Pengshuo Yang: College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Chengxin Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- S M Mortuza: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Zhidong Xue: College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Kang Ning: College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
|
22
|
Wang Y, Virtanen J, Xue Z, Zhang Y. I-TASSER-MR: automated molecular replacement for distant-homology proteins using iterative fragment assembly and progressive sequence truncation. Nucleic Acids Res 2019; 45:W429-W434. [PMID: 28472524 PMCID: PMC5793832 DOI: 10.1093/nar/gkx349] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/04/2017] [Accepted: 04/20/2017] [Indexed: 11/16/2022] Open
Abstract
Molecular replacement (MR) is one of the most common techniques used for solving the phase problem in X-ray crystal diffraction. The success rate of MR however drops quickly when the sequence identity between query and templates is reduced, while the I-TASSER-MR server is designed to solve the phase problem for proteins that lack close homologous templates. Starting from a sequence, it first generates full-length models using I-TASSER by iterative structural fragment reassembly. A progressive sequence truncation procedure is then used for editing the models based on local variations of the structural assembly simulations. Next, the edited models are submitted to MR-REX to search for optimal placements in the crystal unit-cells through replica-exchange Monte Carlo simulations, with the phasing results used by CNS for final atomic model refinement and selection. The I-TASSER-MR algorithm was tested in large-scale benchmark datasets and solved 36% more targets compared to using the best threading templates. The server takes primary sequence and raw crystal diffraction data as input, with output containing annotated phase information and refined structure models. It also allows users to choose between different methods for setting B-factors and the number of models used for phasing. The online server is freely available at http://zhanglab.ccmb.med.umich.edu/I-TASSER-MR.
Affiliation(s)
- Yan Wang: Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jouko Virtanen: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Zhidong Xue: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
23
|
Wang Y, Wang J, Li R, Shi Q, Xue Z, Zhang Y. ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic Acids Res 2019; 45:W400-W407. [PMID: 28498994 PMCID: PMC5793814 DOI: 10.1093/nar/gkx410] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/03/2017] [Accepted: 04/28/2017] [Indexed: 12/21/2022] Open
Abstract
We develop a hierarchical pipeline, ThreaDomEx, for both continuous domain (CD) and discontinuous domain (DCD) structure predictions. Starting from a query sequence, ThreaDomEx first threads it through the PDB to identify multiple structure templates, from which a profile of domain conservation score (DC-score) is derived for domain-segment assignment. To further detect DCDs that consist of separated segments along the sequence, a boundary-clustering algorithm is used to refine the DCD-linker locations. In cases where the templates do not contain DCDs, a domain-segment assembly process, guided by symmetry comparison, is applied for further DCD detection. ThreaDomEx was tested on a set of 1111 proteins and achieved a normalized domain overlap score of 89.3% compared to experimental data, which is significantly higher than other state-of-the-art methods. It also recalls 26.7% of DCDs with 72.7% precision on the proteins for which threading failed to detect any DCDs. The server provides facilities for users to interactively refine the domain models by adjusting the DC-score threshold, deleting and adding domain linkers, and assembling domain segments, which are particularly helpful for hard targets where current methods have low accuracy and human-expert knowledge and experimental insights can be used to refine the models. The ThreaDomEx server is available at http://zhanglab.ccmb.med.umich.edu/ThreaDomEx.
Affiliation(s)
- Yan Wang: Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jian Wang: Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Ruiming Li: School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Qiang Shi: School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
24
|
Tang L, Yang J, Chen J, Zhang J, Yu H, Shen Z. Design of salt-bridge cyclization peptide tags for stability and activity enhancement of enzymes. Process Biochem 2019. [DOI: 10.1016/j.procbio.2019.03.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
|
25
|
Xu G, Ma T, Wang Q, Ma J. OPUS-SSF: A side-chain-inclusive scoring function for ranking protein structural models. Protein Sci 2019; 28:1157-1162. [PMID: 30919509 DOI: 10.1002/pro.3608] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/26/2018] [Revised: 03/21/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
We introduce a side-chain-inclusive scoring function, named OPUS-SSF, for ranking protein structural models. The method builds a scoring function based on the native distributions of the coordinate components of certain anchoring points in a local molecular system for peptide segments of 5, 7, 9, and 11 residues in length. Differing from our previous OPUS-CSF [Xu et al., Protein Sci. 2018; 27: 286-292], which exclusively uses main chain information, OPUS-SSF employs anchoring points on side chains so that the effect of side chains is taken into account. The performance of OPUS-SSF was tested on 15 decoy sets containing a total of 603 proteins, and 571 of them had their native structures recognized from their decoys. Similar to OPUS-CSF, OPUS-SSF does not employ the Boltzmann formula in constructing scoring functions. The results indicate that OPUS-SSF achieves a significant improvement in decoy recognition and should be a very useful tool for protein structure prediction and modeling.
Affiliation(s)
- Gang Xu: School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
- Tianqi Ma: Applied Physics Program, Rice University, Houston, Texas 77005; Department of Bioengineering, Rice University, Houston, Texas 77005
- Qinghua Wang: Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
- Jianpeng Ma: School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China; Applied Physics Program, Rice University, Houston, Texas 77005; Department of Bioengineering, Rice University, Houston, Texas 77005; Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
|
26
|
Blaszczyk M, Gront D, Kmiecik S, Kurcinski M, Kolinski M, Ciemny MP, Ziolkowska K, Panek M, Kolinski A. Protein Structure Prediction Using Coarse-Grained Models. SPRINGER SERIES ON BIO- AND NEUROSYSTEMS 2019. [DOI: 10.1007/978-3-319-95843-9_2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
|
27
|
Kc DB. Recent advances in sequence-based protein structure prediction. Brief Bioinform 2018; 18:1021-1032. [PMID: 27562963 DOI: 10.1093/bib/bbw070] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/24/2016] [Indexed: 11/13/2022] Open
Abstract
The most accurate characterizations of the structure of proteins are provided by structural biology experiments. However, because of the high cost and labor-intensive nature of the structural experiments, the gap between the number of protein sequences and solved structures is widening rapidly. Development of computational methods to accurately model protein structures from sequences is becoming increasingly important to the biological community. In this article, we highlight some important progress in the field of protein structure prediction, especially progress related to free modeling (FM) methods that generate structure models without using homologous templates. We also provide a short synopsis of some of the recent advances in FM approaches as demonstrated in the recent Critical Assessment of Structure Prediction (CASP) competition as well as recent trends and outlook for FM approaches in protein structure prediction.
|
28
|
Avishek K, Ahuja K, Pradhan D, Gannavaram S, Selvapandiyan A, Nakhasi HL, Salotra P. A Leishmania-specific gene upregulated at the amastigote stage is crucial for parasite survival. Parasitol Res 2018; 117:3215-3228. [DOI: 10.1007/s00436-018-6020-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2017] [Accepted: 07/17/2018] [Indexed: 01/03/2023]
|
29
|
Guzenko D, Strelkov SV. Granular clustering of de novo protein models. Bioinformatics 2018; 33:390-396. [PMID: 28171609 DOI: 10.1093/bioinformatics/btw628] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2016] [Revised: 09/19/2016] [Accepted: 09/27/2016] [Indexed: 11/12/2022] Open
Abstract
Motivation: Modern algorithms for de novo prediction of protein structures typically output multiple full-length models (decoys) rather than a single solution. Subsequent clustering of such decoys is used both to gauge the success of the modelling and to decide on the most native-like conformation. At the same time, partial protein models are sufficient for some applications, such as crystallographic phasing by molecular replacement (MR) in particular, provided these models represent a certain part of the target structure with reasonable accuracy. Results: Here we propose a novel clustering algorithm that natively operates in the space of partial models through an approach known as granular clustering (GC). The algorithm is based on growing local similarities found in a pool of initial decoys. We demonstrate that the resulting clusters of partial models provide substantially more accurate structural detail on the target protein than those obtained upon a global alignment of decoys. As a result, the partial models output by our GC algorithm are also much more effective in the MR procedure than the models produced by existing software. Availability and Implementation: The source code is freely available at https://github.com/biocryst/gc Contact: sergei.strelkov@kuleuven.be Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dmytro Guzenko
- Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
| | - Sergei V Strelkov
- Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
| |
|
30
|
Keasar C, McGuffin LJ, Wallner B, Chopra G, Adhikari B, Bhattacharya D, Blake L, Bortot LO, Cao R, Dhanasekaran BK, Dimas I, Faccioli RA, Faraggi E, Ganzynkowicz R, Ghosh S, Ghosh S, Giełdoń A, Golon L, He Y, Heo L, Hou J, Khan M, Khatib F, Khoury GA, Kieslich C, Kim DE, Krupa P, Lee GR, Li H, Li J, Lipska A, Liwo A, Maghrabi AHA, Mirdita M, Mirzaei S, Mozolewska MA, Onel M, Ovchinnikov S, Shah A, Shah U, Sidi T, Sieradzan AK, Ślusarz M, Ślusarz R, Smadbeck J, Tamamis P, Trieber N, Wirecki T, Yin Y, Zhang Y, Bacardit J, Baranowski M, Chapman N, Cooper S, Defelicibus A, Flatten J, Koepnick B, Popović Z, Zaborowski B, Baker D, Cheng J, Czaplewski C, Delbem ACB, Floudas C, Kloczkowski A, Ołdziej S, Levitt M, Scheraga H, Seok C, Söding J, Vishveshwara S, Xu D, Crivelli SN. An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 2018; 8:9939. [PMID: 29967418 PMCID: PMC6028396 DOI: 10.1038/s41598-018-26812-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2017] [Accepted: 05/17/2018] [Indexed: 01/14/2023] Open
Abstract
Every two years, groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field, but many hurdles remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website, providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines, showing improvements according to the CASP assessment as well as areas that require further adjustments and research.
Affiliation(s)
- Chen Keasar
- Department of Computer Science, Ben Gurion University of the Negev, Be'er sheva, Israel
| | - Liam J McGuffin
- Biomedical Sciences Division, School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry, and Biology, Linköping University, Linköping, Sweden
| | - Gaurav Chopra
- Department of Chemistry, College of Science, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Drug Discovery, Purdue University, West Lafayette, IN, USA
- Purdue Center for Cancer Research, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
| | - Badri Adhikari
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - Debswapna Bhattacharya
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
| | - Lauren Blake
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Leandro Oliveira Bortot
- Laboratory of Biological Physics, Faculty of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, São Paulo, Brazil
| | - Renzhi Cao
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - B K Dhanasekaran
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Itzhel Dimas
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Eshel Faraggi
- Research and Information Systems, LLC, Carmel, IN, USA
- Department of Biochemistry and Molecular Biology, IU School of Medicine, Indianapolis, IN, USA
- Batelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
| | | | - Sambit Ghosh
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Soma Ghosh
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Lukasz Golon
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Yi He
- School of Engineering, University of California, Merced, CA, USA
| | - Lim Heo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - Main Khan
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - Firas Khatib
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - George A Khoury
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
| | - Chris Kieslich
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, USA
| | - David E Kim
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Pawel Krupa
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Hongbo Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- School of Computer Science and Information Technology, NorthEast Normal University, Changchun, China
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Jilong Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | | | - Adam Liwo
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Ali Hassan A Maghrabi
- Biomedical Sciences Division, School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Milot Mirdita
- Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Shokoufeh Mirzaei
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- California State Polytechnic University, Pomona, CA, USA
| | | | - Melis Onel
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | - Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Anand Shah
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - Utkarsh Shah
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | - Tomer Sidi
- Department of Computer Science, Ben Gurion University of the Negev, Be'er sheva, Israel
| | | | | | - Rafal Ślusarz
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - James Smadbeck
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
| | - Phanourios Tamamis
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, USA
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
| | - Nicholas Trieber
- Department of Computer and Information Science, University of Massachusetts Dartmouth, MA, USA
| | - Tomasz Wirecki
- Faculty of Chemistry, University of Gdansk, Gdańsk, Poland
| | - Yanping Yin
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Jaume Bacardit
- Interdisciplinary Computing and Complex BioSystems (ICOS) research group, School of Computing, Newcastle University, Newcastle-upon-Tyne, UK
| | - Maciej Baranowski
- Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Gdańsk, Poland
| | - Nicholas Chapman
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Seth Cooper
- College of Computer and Information Science, Northeastern University, Boston, MA, USA
| | - Alexandre Defelicibus
- Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, Brazil
| | - Jeff Flatten
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Brian Koepnick
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Zoran Popović
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | | | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Center for Game Science, Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | | | | | | | | | - Stanislaw Ołdziej
- Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Gdańsk, Poland
| | - Michael Levitt
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Harold Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Johannes Söding
- Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Saraswathi Vishveshwara
- Molecular Biophysics Unit and IISC Mathematics Initiative, Indian Institute of Science, Bangalore, India
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Silvia N Crivelli
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- Department of Computer Science, University of California, Davis, CA, USA.
| |
|
31
|
Xie M, Muchero W, Bryan AC, Yee K, Guo HB, Zhang J, Tschaplinski TJ, Singan VR, Lindquist E, Payyavula RS, Barros-Rios J, Dixon R, Engle N, Sykes RW, Davis M, Jawdy SS, Gunter LE, Thompson O, DiFazio SP, Evans LM, Winkeler K, Collins C, Schmutz J, Guo H, Kalluri U, Rodriguez M, Feng K, Chen JG, Tuskan GA. A 5-Enolpyruvylshikimate 3-Phosphate Synthase Functions as a Transcriptional Repressor in Populus. THE PLANT CELL 2018; 30:1645-1660. [PMID: 29891568 PMCID: PMC6096593 DOI: 10.1105/tpc.18.00168] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/01/2018] [Revised: 04/17/2018] [Accepted: 06/05/2018] [Indexed: 05/21/2023]
Abstract
Long-lived perennial plants, with distinctive habits of inter-annual growth, defense, and physiology, are of great economic and ecological importance. However, some biological mechanisms resulting from genome duplication and functional divergence of genes in these systems remain poorly studied. Here, we discovered an association between a poplar (Populus trichocarpa) 5-enolpyruvylshikimate 3-phosphate synthase gene (PtrEPSP) and lignin biosynthesis. Functional characterization of PtrEPSP revealed that this isoform possesses a helix-turn-helix motif in the N terminus and can function as a transcriptional repressor that regulates expression of genes in the phenylpropanoid pathway in addition to performing its canonical biosynthesis function in the shikimate pathway. We demonstrated that this isoform can localize in the nucleus and specifically binds to the promoter and represses the expression of a SLEEPER-like transcriptional regulator, which itself specifically binds to the promoter and represses the expression of PtrMYB021 (known as MYB46 in Arabidopsis thaliana), a master regulator of the phenylpropanoid pathway and lignin biosynthesis. Analyses of overexpression and RNAi lines targeting PtrEPSP confirmed the predicted changes in PtrMYB021 expression patterns. These results demonstrate that PtrEPSP in its regulatory form and PtrhAT form a transcriptional hierarchy regulating phenylpropanoid pathway and lignin biosynthesis in Populus.
Affiliation(s)
- Meng Xie
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Wellington Muchero
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Anthony C Bryan
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Kelsey Yee
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Hao-Bo Guo
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996
| | - Jin Zhang
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Timothy J Tschaplinski
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Vasanth R Singan
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598
| | - Erika Lindquist
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598
| | - Raja S Payyavula
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Jaime Barros-Rios
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, Denton, Texas 76203
| | - Richard Dixon
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, Denton, Texas 76203
| | - Nancy Engle
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Robert W Sykes
- Bioscience Center, National Renewable Energy Laboratory, Golden, Colorado 80401
| | - Mark Davis
- Bioscience Center, National Renewable Energy Laboratory, Golden, Colorado 80401
| | - Sara S Jawdy
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Lee E Gunter
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Olivia Thompson
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Stephen P DiFazio
- Department of Biology, West Virginia University, Morgantown, West Virginia 26506
| | - Luke M Evans
- Department of Biology, West Virginia University, Morgantown, West Virginia 26506
| | | | | | - Jeremy Schmutz
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806
| | - Hong Guo
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996
| | - Udaya Kalluri
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Miguel Rodriguez
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Kai Feng
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Jin-Gui Chen
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
| | - Gerald A Tuskan
- BioEnergy Science Center and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598
| |
|
32
|
Gao S, Song S, Cheng J, Todo Y, Zhou M. Incorporation of Solvent Effect into Multi-Objective Evolutionary Algorithm for Improved Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1365-1378. [PMID: 28534784 DOI: 10.1109/tcbb.2017.2705094] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The problem of predicting the three-dimensional (3-D) structure of a protein from its one-dimensional sequence has been called the "holy grail of molecular biology", and it has become an important part of structural genomics projects. Despite the rapid developments in computer technology and computational intelligence, it remains challenging and fascinating. In this paper, we propose a multi-objective evolutionary algorithm to solve it. We decompose the protein energy function of the Chemistry at HARvard Macromolecular Mechanics (CHARMM) force fields into bond and non-bond energies as the first and second objectives. Considering the effect of solvent, we adopt the solvent-accessible surface area as the third objective. We use 66 benchmark proteins to verify the proposed method and obtain better or competitive results in comparison with the existing methods. The results suggest the necessity of incorporating the effect of solvent into a multi-objective evolutionary algorithm to improve protein structure prediction in terms of accuracy and efficiency.
|
33
|
Kozic M, Fox SJ, Thomas JM, Verma CS, Rigden DJ. Large scale ab initio modeling of structurally uncharacterized antimicrobial peptides reveals known and novel folds. Proteins 2018; 86:548-565. [DOI: 10.1002/prot.25473] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2017] [Revised: 01/16/2018] [Accepted: 01/29/2018] [Indexed: 12/20/2022]
Affiliation(s)
- Mara Kozic
- Institute of Integrative Biology, University of Liverpool; Liverpool L69 7ZB U.K
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute; Singapore
| | - Stephen J. Fox
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute; Singapore
| | - Jens M. Thomas
- Institute of Integrative Biology, University of Liverpool; Liverpool L69 7ZB U.K
| | - Chandra S. Verma
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute; Singapore
- Department of Biological Sciences; National University of Singapore; Singapore
- School of Biological Sciences; Nanyang Technological University; Singapore
| | - Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool; Liverpool L69 7ZB U.K
| |
|
34
|
Usability as the Key Factor to the Design of a Web Server for the CReF Protein Structure Predictor: The wCReF. INFORMATION 2018. [DOI: 10.3390/info9010020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
|
35
|
Yerabham ASK, Müller-Schiffmann A, Ziehm T, Stadler A, Köber S, Indurkhya X, Marreiros R, Trossbach SV, Bradshaw NJ, Prikulis I, Willbold D, Weiergräber OH, Korth C. Biophysical insights from a single chain camelid antibody directed against the Disrupted-in-Schizophrenia 1 protein. PLoS One 2018; 13:e0191162. [PMID: 29324815 PMCID: PMC5764400 DOI: 10.1371/journal.pone.0191162] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2017] [Accepted: 12/31/2017] [Indexed: 01/17/2023] Open
Abstract
Accumulating evidence suggests an important role for the Disrupted-in-Schizophrenia 1 (DISC1) protein in neurodevelopment and chronic mental illness. In particular, the C-terminal 300 amino acids of DISC1 have been found to mediate important protein-protein interactions and to harbor functionally important phosphorylation sites and disease-associated polymorphisms. However, long disordered regions and oligomer-forming subdomains have so far impeded structural analysis. VHH domains derived from camelid heavy-chain-only antibodies are minimal antigen-binding modules with appreciable solubility and stability, which makes them well suited for stabilizing proteins prior to structural investigation. Here, we report on the generation of a VHH domain derived from an immunized Lama glama, displaying high affinity for the human DISC1 C region (aa 691-836), and its characterization by surface plasmon resonance, size exclusion chromatography and immunological techniques. The VHH-DISC1 (C region) complex was also used for structural investigation by small angle X-ray scattering analysis. In combination with molecular modeling, these data support predictions regarding the three-dimensional fold of this DISC1 segment as well as its steric arrangement in complex with our VHH antibody.
Affiliation(s)
- Antony S. K. Yerabham
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | | | - Tamar Ziehm
- Institute of Complex Systems (ICS-6: Structural Biochemistry), Forschungszentrum Jülich, Jülich, Germany
| | - Andreas Stadler
- Jülich Centre for Neutron Science JCNS and Institute for Complex Systems ICS, Forschungszentrum Jülich, Jülich, Germany
| | - Sabrina Köber
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Xela Indurkhya
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Rita Marreiros
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Svenja V. Trossbach
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Nicholas J. Bradshaw
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Ingrid Prikulis
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Dieter Willbold
- Institute of Complex Systems (ICS-6: Structural Biochemistry), Forschungszentrum Jülich, Jülich, Germany
- Institute for Physical Biology and BMFZ, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Oliver H. Weiergräber
- Institute of Complex Systems (ICS-6: Structural Biochemistry), Forschungszentrum Jülich, Jülich, Germany
| | - Carsten Korth
- Department of Neuropathology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| |
|
36
|
Barradas-Bautista D, Rosell M, Pallara C, Fernández-Recio J. Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems. PROTEIN-PROTEIN INTERACTIONS IN HUMAN DISEASE, PART A 2018; 110:203-249. [DOI: 10.1016/bs.apcsb.2017.06.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
37
|
Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 2017; 86 Suppl 1:136-151. [PMID: 29082551 DOI: 10.1002/prot.25414] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2017] [Revised: 10/09/2017] [Accepted: 10/27/2017] [Indexed: 12/26/2022]
Abstract
We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role to enhance the quality of models, particularly for FM targets, by the new pipelines. The inclusion of NeBcon predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best in top five predicted models, which is 37% higher than that by the QUARK simulations without contacts. In particular, there are seven targets that are converted from non-foldable to foldable (TM-score >0.5) due to the use of contact restraints in the simulations. Another additional feature in the current pipelines is the local structure quality prediction by ResQ, which provides a robust residue-level modeling error estimation. Despite the success, significant challenges still remain in ab initio modeling of multi-domain proteins and folding of β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements on domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.
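The TM-score threshold quoted in this abstract (TM-score >0.5 for a "foldable" target) can be made concrete with a minimal Python sketch of the standard TM-score sum. It assumes the model has already been superimposed on the native structure and that the per-residue distances of the aligned pairs are supplied in ångströms; the function names `tm_d0` and `tm_score` are hypothetical, the 0.5 Å floor on the distance scale is an assumption made here for very short chains, and the real TM-score additionally maximizes over superpositions, which this sketch does not do.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by TM-score."""
    if target_length <= 15:
        # Assumed floor for very short chains, where the formula is not meaningful.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances, target_length):
    """TM-score for one fixed superposition.

    aligned_distances: distance (angstroms) of each aligned residue pair.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length
```

Because the sum is normalized by the target length, a model that matches only half of the native residues perfectly scores 0.5, which is where the foldable cutoff used in the abstract sits.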
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.,Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.,Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
| |
|
38
|
Zhang GJ, Zhou XG, Yu XF, Hao XH, Yu L. Enhancing Protein Conformational Space Sampling Using Distance Profile-Guided Differential Evolution. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1288-1301. [PMID: 28113726 DOI: 10.1109/tcbb.2016.2566617] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
De novo protein structure prediction aims to search for low-energy conformations, as it follows the thermodynamic hypothesis that places native conformations at the global minimum of the protein energy surface. However, the native conformation is not necessarily located in the lowest-energy regions owing to the inaccuracies of the energy model. This study presents a differential evolution algorithm using a distance profile-based selection strategy to sample conformations with reasonable structure effectively. In the proposed algorithm, besides energy, the residue-residue distance is considered another measure of the conformation. The average distance errors of decoys between the distance of each residue pair and the corresponding distance in the distance profiles are first calculated when the trial conformation yields a larger energy value than that of the target. Then, the distance acceptance probability of the trial conformation is designed based on distance profiles if the trial conformation obtains a lower average distance error compared with that of the target conformation. The trial conformation is accepted to the next generation in accordance with its distance acceptance probability. By using the dual constraints of energy and distance in guiding sampling, the algorithm can sample conformations with lower energies and more reasonable structures. Experimental results on 28 benchmark proteins show that the proposed algorithm can effectively predict near-native protein structures.
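The selection step described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the distance acceptance probability, so the expression `1 - trial_err / target_err` is assumed here, and the names `mean_distance_error` and `accept_trial`, as well as the dictionary layout of residue-pair distances, are hypothetical.

```python
import random


def mean_distance_error(pair_distances, profile):
    """Average absolute gap between a conformation's residue-pair distances
    and the corresponding distances in the distance profile."""
    return sum(abs(pair_distances[pair] - profile[pair]) for pair in profile) / len(profile)


def accept_trial(trial_energy, target_energy, trial_dists, target_dists,
                 profile, rng=random.random):
    """Decide whether the trial conformation replaces the target conformation."""
    # Energy criterion: a trial that does not raise the energy is accepted outright.
    if trial_energy <= target_energy:
        return True
    # The trial has higher energy, so fall back to the distance criterion.
    trial_err = mean_distance_error(trial_dists, profile)
    target_err = mean_distance_error(target_dists, profile)
    if trial_err >= target_err:
        return False
    # Assumed probability form: the larger the relative improvement in
    # distance error, the more likely the trial is kept.
    probability = 1.0 - trial_err / target_err
    return rng() < probability
```

Passing `rng` as a parameter keeps the rule deterministic under test. In a full differential evolution loop this check would run once per population member after mutation and crossover.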
|
39
|
Kato K, Nakayoshi T, Fukuyoshi S, Kurimoto E, Oda A. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins. Molecules 2017; 22:1716. [PMID: 29023395 PMCID: PMC6151455 DOI: 10.3390/molecules22101716] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/20/2017] [Revised: 10/05/2017] [Accepted: 10/10/2017] [Indexed: 12/14/2022] Open
Abstract
Although various higher-order protein structure prediction methods have been developed, almost all of them are based on the three-dimensional (3D) structure information of known proteins. Here we predicted the structures of short proteins by molecular dynamics (MD) simulations in which only Newton’s equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulations to predict protein structures, we simulated seven short test proteins (10–46 residues) starting from the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure after a 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10–34 residues. Simulations by replica exchange MD were also performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and that 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins (approximately 20 residues). Structure prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.
Affiliation(s)
- Koichi Kato
- Graduate School of Pharmacy, Meijo University, 150 Yagotoyama, Tempaku-ku, Nagoya, Aichi 468-8503, Japan.
- Department of Pharmacy, Kinjo Gakuin University, 2-1723 Omori, Moriyama-ku, Nagoya, Aichi 463-8521, Japan.
| | - Tomoki Nakayoshi
- Graduate School of Pharmacy, Meijo University, 150 Yagotoyama, Tempaku-ku, Nagoya, Aichi 468-8503, Japan.
- Institute of Medical, Pharmaceutical and Health Sciences, Kanazawa University, Kakuma-machi, Kanazawa, Ishikawa 920-1192, Japan.
| | - Shuichi Fukuyoshi
- Institute of Medical, Pharmaceutical and Health Sciences, Kanazawa University, Kakuma-machi, Kanazawa, Ishikawa 920-1192, Japan.
| | - Eiji Kurimoto
- Graduate School of Pharmacy, Meijo University, 150 Yagotoyama, Tempaku-ku, Nagoya, Aichi 468-8503, Japan.
| | - Akifumi Oda
- Graduate School of Pharmacy, Meijo University, 150 Yagotoyama, Tempaku-ku, Nagoya, Aichi 468-8503, Japan.
- Institute of Medical, Pharmaceutical and Health Sciences, Kanazawa University, Kakuma-machi, Kanazawa, Ishikawa 920-1192, Japan.
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan.
| |
|
40
|
Hao XH, Zhang GJ, Zhou XG. Conformational Space Sampling Method Using Multi-Subpopulation Differential Evolution for De novo Protein Structure Prediction. IEEE Trans Nanobioscience 2017; 16:618-633. [DOI: 10.1109/tnb.2017.2749243] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
|
41
|
Jalily Hasani H, Ahmed M, Barakat K. A comprehensive structural model for the human KCNQ1/KCNE1 ion channel. J Mol Graph Model 2017; 78:26-47. [PMID: 28992529 DOI: 10.1016/j.jmgm.2017.09.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/10/2017] [Revised: 09/25/2017] [Accepted: 09/26/2017] [Indexed: 10/18/2022]
Abstract
The voltage-gated KCNQ1/KCNE1 potassium ion channel complex forms the slow delayed rectifier (IKs) current in the heart, which plays an important role in heart signaling. The importance of the KCNQ1/KCNE1 channel's function is further underscored by the linkage between loss-of-function and gain-of-function mutations in KCNQ1 or KCNE1 and long QT syndromes, congenital atrial fibrillation, and short QT syndrome. In addition, KCNQ1/KCNE1 channels are an off-target for many non-cardiovascular drugs, leading to fatal cardiac irregularities. One way to study these aspects of the KCNQ1/KCNE1 channel is through structural studies using a validated and accurate model. To that end, in this study we used several state-of-the-art modeling approaches to build a structural model for the open state of the KCNQ1 protein that is both accurate and compatible with available experimental data. Next, we included the KCNE1 protein components using data-driven protein-protein docking simulations, encompassing a 4:2 stoichiometry, to complete the picture of the channel complex formed by these two proteins. All the protein systems generated through these processes were refined by long molecular dynamics simulations. The refined models were analyzed extensively to infer data about the interaction of the KCNQ1 channel with its accessory KCNE1 beta subunits.
Affiliation(s)
- Horia Jalily Hasani
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada
| | - Marawan Ahmed
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada
| | - Khaled Barakat
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada; Li Ka Shing Institute of Virology, University of Alberta, Edmonton, Alberta, Canada; Li Ka Shing Applied Virology Institute, University of Alberta, Edmonton, Alberta, Canada.
| |
|
42
|
Mackenzie CO, Grigoryan G. Protein structural motifs in prediction and design. Curr Opin Struct Biol 2017; 44:161-167. [PMID: 28460216 PMCID: PMC5513761 DOI: 10.1016/j.sbi.2017.03.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/08/2016] [Revised: 03/18/2017] [Accepted: 03/28/2017] [Indexed: 01/11/2023]
Abstract
The Protein Data Bank (PDB) has been an integral resource for shaping our fundamental understanding of protein structure and for the advancement of such applications as protein design and structure prediction. Over the years, information from the PDB has been used to generate models ranging from specific structural mechanisms to general statistical potentials. With accumulating structural data, it has become possible to mine for more complete and complex structural observations, deducing more accurate generalizations. Motif libraries, which capture recurring structural features along with their sequence preferences, have exposed modularity in the structural universe and found successful application in various problems of structural biology. Here we summarize recent achievements in this arena, focusing on subdomain level structural patterns and their applications to protein design and structure prediction, and suggest promising future directions as the structural database continues to grow.
Affiliation(s)
- Craig O Mackenzie
- Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, United States
| | - Gevorg Grigoryan
- Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, United States; Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States.
| |
|
43
|
Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks. Methods Mol Biol 2017; 1558:415-436. [PMID: 28150250 DOI: 10.1007/978-1-4939-6783-4_20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
|
44
|
Liu D, Liu J, Wang W, Xia L, Yang J, Sun S, Zhang F. Computational and Experimental Investigation of the Antimicrobial Peptide Cecropin XJ and its Ligands as the Impact Factors of Antibacterial Activity. FOOD BIOPHYS 2016. [DOI: 10.1007/s11483-016-9445-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
|
45
|
Zhang S, Cui FC, Cao Y, Li YQ. Sequence identification, structure prediction and validation of tannase from Aspergillus niger N5-5. CHINESE CHEM LETT 2016. [DOI: 10.1016/j.cclet.2016.04.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
|
46
|
Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A. Coarse-Grained Protein Models and Their Applications. Chem Rev 2016; 116:7898-936. [DOI: 10.1021/acs.chemrev.6b00163] [Citation(s) in RCA: 555] [Impact Index Per Article: 69.4] [Indexed: 02/07/2023]
Affiliation(s)
- Sebastian Kmiecik
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Dominik Gront
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Michal Kolinski
- Bioinformatics Laboratory, Mossakowski Medical Research Center of the Polish Academy of Sciences, Pawinskiego 5, 02-106 Warsaw, Poland
| | - Lukasz Wieteska
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Department of Medical Biochemistry, Medical University of Lodz, Mazowiecka 6/8, 92-215 Lodz, Poland
| | | | - Andrzej Kolinski
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| |
|
47
|
Modi V, Dunbrack RL. Assessment of refinement of template-based models in CASP11. Proteins 2016; 84 Suppl 1:260-81. [PMID: 27081793 DOI: 10.1002/prot.25048] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 12/24/2015] [Revised: 03/13/2016] [Accepted: 04/11/2016] [Indexed: 12/26/2022]
Abstract
CASP11 (the 11th Meeting on the Critical Assessment of Protein Structure Prediction) ran a blind experiment in the refinement of protein structure predictions, the fourth such experiment since CASP8. As with the previous experiments, the predictors were provided with one starting structure from the server models of each of a selected set of template-based modeling targets and asked to refine the coordinates of the starting structure toward native. We assessed the refined structures with the Z-scores of the standard CASP measures, which compare the model-target similarities of the models from all the predictors. Furthermore, we assessed the refined structures with "relative measures," which compare the improvement in accuracy of each model with respect to the starting structure. The latter provides an assessment of the extent to which each predictor group is able to improve the starting structures toward native. We utilized heat maps to display improvements in the Calpha-Calpha distance matrix for each model. The heat maps labeled with each element of secondary structure helped us to identify regions of refinement toward native in each model. Most positively scoring models show modest improvements in multiple regions of the structure, while in some models we were able to identify significant repositioning of N/C-terminal segments and internal elements of secondary structure. The best groups were able to improve more than 70% of the targets from the starting models, and by an average of 3-5% in the standard CASP measures. Proteins 2016; 84(Suppl 1):260-281. © 2016 Wiley Periodicals, Inc.
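The two kinds of assessment mentioned in this abstract, Z-scores across predictors and "relative measures" against the starting structure, can be illustrated with a minimal sketch. It assumes a plain population Z-score and a simple score difference. The actual CASP assessment procedure is not specified in the abstract and may differ, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def relative_improvement(refined_score, starting_score):
    """Relative measure: change in a score (e.g. GDT_TS) of the refined
    model with respect to the starting structure; positive is better."""
    return refined_score - starting_score


def z_scores(values):
    """Plain population Z-scores of one measure across all predictors
    (an assumption; no outlier handling is applied here)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def fraction_improved(deltas):
    """Share of targets whose refined model beats the starting model."""
    return sum(1 for d in deltas if d > 0) / len(deltas)
```

For example, `fraction_improved([2.5, -1.0, 0.5, 3.0])` gives 0.75, the kind of per-group figure summarized in the abstract as "more than 70% of the targets".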
Affiliation(s)
- Vivek Modi
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | | |
|
48
|
Hu J, Han K, Li Y, Yang JY, Shen HB, Yu DJ. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM. Amino Acids 2016; 48:2533-2547. [DOI: 10.1007/s00726-016-2274-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 07/30/2015] [Accepted: 06/07/2016] [Indexed: 12/12/2022]
|
49
|
Bhattacharya D, Cao R, Cheng J. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics 2016; 32:2791-9. [PMID: 27259540 PMCID: PMC5018369 DOI: 10.1093/bioinformatics/btw316] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 03/13/2016] [Accepted: 05/15/2016] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling in continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly, as opposed to the experimentally suggested stepwise sampling. RESULTS: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of the backbone and side-chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics- and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower-energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP.
AVAILABILITY AND IMPLEMENTATION Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Jianlin Cheng
- Department of Computer Science Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| |
|
50
|
Figueroa M, Sleutel M, Vandevenne M, Parvizi G, Attout S, Jacquin O, Vandenameele J, Fischer AW, Damblon C, Goormaghtigh E, Valerio-Lepiniec M, Urvoas A, Durand D, Pardon E, Steyaert J, Minard P, Maes D, Meiler J, Matagne A, Martial JA, Van de Weerdt C. The unexpected structure of the designed protein Octarellin V.1 forms a challenge for protein structure prediction tools. J Struct Biol 2016; 195:19-30. [PMID: 27181418 DOI: 10.1016/j.jsb.2016.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/15/2016] [Revised: 04/19/2016] [Accepted: 05/12/2016] [Indexed: 12/26/2022]
Abstract
Despite impressive successes in protein design, designing a well-folded protein of more than 100 amino acids de novo remains a formidable challenge. Exploiting the promising biophysical features of the artificial protein Octarellin V, we improved this protein by directed evolution, thus creating a more stable and soluble protein: Octarellin V.1. Next, we obtained crystals of Octarellin V.1 in complex with crystallization chaperones and determined the tertiary structure. The experimental structure of Octarellin V.1 differs from its in silico design: the (αβα) sandwich architecture bears some resemblance to a Rossmann-like fold instead of the intended TIM-barrel fold. This surprising result gave us a unique and attractive opportunity to test the state of the art in protein structure prediction, using this artificial protein free of any natural selection. We tested 13 automated webservers for protein structure prediction and found that none of them predicted the actual structure. More than 50% of them predicted a TIM-barrel fold, i.e. the structure we set out to design more than 10 years ago. In addition, human-operated local software runs can sample a structure similar to the experimental one but fail to select it, suggesting that the scoring and ranking functions should be improved. We propose that artificial proteins could be used as tools to test the accuracy of protein structure prediction algorithms, because of their lack of evolutionary pressure and their unique sequence features.
Affiliation(s)
- Maximiliano Figueroa
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium.
| | - Mike Sleutel
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
| | - Marylene Vandevenne
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium
| | - Gregory Parvizi
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium
| | - Sophie Attout
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium
| | - Olivier Jacquin
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium
| | - Julie Vandenameele
- Laboratoire d'Enzymologie et Repliement des Protéines, Centre for Protein Engineering, University of Liège, Liège, Belgium
| | - Axel W Fischer
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
| | | | - Erik Goormaghtigh
- Laboratory for the Structure and Function of Biological Membranes, Center for Structural Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
| | - Marie Valerio-Lepiniec
- Institute for Integrative Biology of the Cell (I2BC), UMT 9198, CEA, CNRS, Université Paris-Sud, Orsay, France
| | - Agathe Urvoas
- Institute for Integrative Biology of the Cell (I2BC), UMT 9198, CEA, CNRS, Université Paris-Sud, Orsay, France
| | - Dominique Durand
- Institute for Integrative Biology of the Cell (I2BC), UMT 9198, CEA, CNRS, Université Paris-Sud, Orsay, France
| | - Els Pardon
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Structural Biology Research Center, VIB, Pleinlaan 2, 1050 Brussels, Belgium
| | - Jan Steyaert
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Structural Biology Research Center, VIB, Pleinlaan 2, 1050 Brussels, Belgium
| | - Philippe Minard
- Institute for Integrative Biology of the Cell (I2BC), UMT 9198, CEA, CNRS, Université Paris-Sud, Orsay, France
| | - Dominique Maes
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
| | - Jens Meiler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
| | - André Matagne
- Laboratoire d'Enzymologie et Repliement des Protéines, Centre for Protein Engineering, University of Liège, Liège, Belgium
| | - Joseph A Martial
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium
| | - Cécile Van de Weerdt
- GIGA-Research, Molecular Biomimetics and Protein Engineering, University of Liège, Liège, Belgium.
| |
|