Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kim DE, Dimaio F, Yu-Ruei Wang R, Song Y, Baker D. One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 2013;82 Suppl 2:208-18. [PMID: 23900763 DOI: 10.1002/prot.24374] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2013] [Revised: 06/12/2013] [Accepted: 06/21/2013] [Indexed: 12/19/2022]

Cited by Other Article(s)

Si Y, Zou J, Gao Y, Chuai G, Liu Q, Chen L. Foundation models in molecular biology. BIOPHYSICS REPORTS 2024;10:135-151. [PMID: 39027316 PMCID: PMC11252241 DOI: 10.52601/bpr.2024.240006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Accepted: 03/04/2024] [Indexed: 07/20/2024] Open

Affiliation(s)

Yunda Si Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
Jiawei Zou Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
Yicheng Gao Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Guohui Chuai Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Qi Liu Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Luonan Chen Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China

Fongang B, Wadop YN, Zhu Y, Wagner EJ, Kudlicki A, Rowicka M. Coevolution combined with molecular dynamics simulations provides structural and mechanistic insights into the interactions between the integrator complex subunits. Comput Struct Biotechnol J 2023;21:5686-5697. [PMID: 38074468 PMCID: PMC10700540 DOI: 10.1016/j.csbj.2023.11.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 11/10/2023] [Accepted: 11/10/2023] [Indexed: 01/18/2024] Open

Affiliation(s)

Bernard Fongang Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States Department of Biochemistry and Structural Biology, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States Department of Population Health Sciences, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
Yannick N. Wadop Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
Yingjie Zhu Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
Eric J. Wagner Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States Department of Biochemistry and Biophysics, The University of Rochester Medical Center, Rochester, NY, United States Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
Andrzej Kudlicki Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States Informatics Service Center, The University of Texas Medical Branch, Galveston, TX, United States
Maga Rowicka Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States

Uzoeto HO, Cosmas S, Ajima JN, Arazu AV, Didiugwu CM, Ekpo DE, Ibiang GO, Durojaye OA. Computer-aided molecular modeling and structural analysis of the human centromere protein–HIKM complex. BENI-SUEF UNIVERSITY JOURNAL OF BASIC AND APPLIED SCIENCES 2022. [DOI: 10.1186/s43088-022-00285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein–peptide and protein–protein interactions play an essential role in different functional and structural cellular organizational aspects. While Cryo-EM and X-ray crystallography generate the most complete structural characterization, most biological interactions exist in biomolecular complexes that are neither compliant nor responsive to direct experimental analysis. The development of computational docking approaches is therefore necessary. This starts from component protein structures to the prediction of their complexes, preferentially with precision close to complex structures generated by X-ray crystallography.

RESULTS

To guarantee faithful chromosomal segregation, there must be a proper assembling of the kinetochore (a protein complex with multiple subunits) at the centromere during the process of cell division. As an important member of the inner kinetochore, defects in any of the subunits making up the CENP-HIKM complex lead to kinetochore dysfunction and an eventual chromosomal mis-segregation and cell death. Previous studies in an attempt to understand the assembly and mechanism devised by the CENP-HIKM in promoting the functionality of the kinetochore have reconstituted the protein complex from different organisms including fungi and yeast. Here, we present a detailed computational model of the physical interactions that exist between each component of the human CENP-HIKM, while validating each modeled structure using orthologs with existing crystal structures from the protein data bank.

CONCLUSIONS

Results from this study substantiate the existing hypothesis that the human CENP-HIK complex shares a similar architecture with its fungal and yeast orthologs, and likewise validate the binding mode of CENP-M to the C-terminus of the human CENP-I based on existing experimental reports.

Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022;23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open

Kazan IC, Sharma P, Rahman MI, Bobkov A, Fromme R, Ghirlanda G, Ozkan SB. Design of novel cyanovirin-N variants by modulation of binding dynamics through distal mutations. eLife 2022;11:67474. [PMID: 36472898 PMCID: PMC9725752 DOI: 10.7554/elife.67474] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2021] [Accepted: 11/28/2022] [Indexed: 12/07/2022] Open

Heo L, Janson G, Feig M. Physics-based protein structure refinement in the era of artificial intelligence. Proteins 2021;89:1870-1887. [PMID: 34156124 PMCID: PMC8616793 DOI: 10.1002/prot.26161] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2021] [Revised: 05/31/2021] [Accepted: 06/08/2021] [Indexed: 12/21/2022]

Gaalswyk K, Liu Z, Vogel HJ, MacCallum JL. An Integrative Approach to Determine 3D Protein Structures Using Sparse Paramagnetic NMR Data and Physical Modeling. Front Mol Biosci 2021;8:676268. [PMID: 34476238 PMCID: PMC8407082 DOI: 10.3389/fmolb.2021.676268] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 07/29/2021] [Indexed: 11/13/2022] Open

Reza MS, Zhang H, Hossain MT, Jin L, Feng S, Wei Y. COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization. MEMBRANES 2021;11:membranes11070503. [PMID: 34209399 PMCID: PMC8305966 DOI: 10.3390/membranes11070503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/25/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]

Abstract

Protein contact prediction helps reconstruct the tertiary structure that greatly determines a protein’s function; therefore, contact prediction from the sequence is an important problem. Recently there has been exciting progress on this problem, but many existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods, including CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV, by optimizing the number of correctly predicted contacts and achieving a better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed a prediction accuracy of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increased with the increasing number of predicted contacts. For example, COMTOP performed much better for large numbers of predicted contacts (such as top-5L and top-3L) than for small numbers of predicted contacts (such as top-L/2 and top-L/5). The results and analysis demonstrate that COMTOP can significantly improve the performance of the individual methods; therefore, COMTOP is more robust against different types of test sets. COMTOP also showed better or comparable predictions when compared with the state-of-the-art predictors.
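The top-L/k accuracies quoted in the abstract above follow a common evaluation convention: rank the predicted residue pairs by score, keep the top L·f pairs for a sequence of length L, and report the fraction that are true contacts. The sketch below illustrates that convention only. It is not the COMTOP code; the function name, the data layout, and the minimum sequence separation `min_sep` are assumptions made for illustration.

```python
def top_k_precision(scores, true_contacts, seq_len, k_fraction, min_sep=6):
    """Precision of the top (seq_len * k_fraction) predicted contacts.

    scores        -- dict mapping (i, j) with i < j to a predicted score
    true_contacts -- set of (i, j) pairs that are real contacts
    k_fraction    -- 5, 3, 2, 1, 0.5 or 0.2 for top-5L ... top-L/5
    min_sep       -- hypothetical minimum sequence separation |i - j|
    """
    n_top = max(1, int(seq_len * k_fraction))
    # Drop pairs that are too close along the chain to be informative.
    candidates = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    # Divide by the number of pairs actually selected.
    return hits / len(ranked)


# Toy example for a 10-residue sequence.
scores = {(0, 8): 0.9, (1, 9): 0.8, (2, 9): 0.7, (0, 7): 0.6, (3, 4): 0.95}
true_contacts = {(0, 8), (1, 9), (2, 9)}
print(top_k_precision(scores, true_contacts, 10, 0.2))  # top-L/5 -> 1.0
print(top_k_precision(scores, true_contacts, 10, 0.5))  # top-L/2 -> 0.75
```

Because fewer, higher-confidence pairs are kept as the cutoff shrinks, precision usually rises from top-5L to top-L/5, which matches the ordering of the figures reported above.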

Affiliation(s)

Md. Selim Reza School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Huiling Zhang School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Md. Tofazzal Hossain School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Langxi Jin Department of Computer Science and Technology, School of Computer Science and Technology, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin 150080, China
Shengzhong Feng Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Yanjie Wei School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Bottino GF, Ferrari AJR, Gozzo FC, Martínez L. Structural discrimination analysis for constraint selection in protein modeling. Bioinformatics 2021;37:3766-3773. [PMID: 34086840 DOI: 10.1093/bioinformatics/btab425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/03/2021] [Indexed: 11/12/2022] Open

Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021;22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023] Open

Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020;60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]

Abstract

Although massive data is quickly accumulating on protein sequence and structure, there is a small and limited number of protein architectural types (or structural folds). This study is addressing the following question: how well could one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response to the question, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design qualities, we build our models on conditional Wasserstein GAN (WGAN) that uses Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one not even part of basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign), through generating design seeds and tailoring design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data. 
Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.

Shao D, Mao W, Xing Y, Gong H. RDb2C2: an improved method to identify the residue-residue pairing in β strands. BMC Bioinformatics 2020;21:133. [PMID: 32245403 PMCID: PMC7126467 DOI: 10.1186/s12859-020-3476-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2019] [Accepted: 03/31/2020] [Indexed: 11/17/2022] Open

Zhang Q, Zhu J, Ju F, Kong L, Sun S, Zheng WM, Bu D. ISSEC: inferring contacts among protein secondary structure elements using deep object detection. BMC Bioinformatics 2020;21:503. [PMID: 33153432 PMCID: PMC7643357 DOI: 10.1186/s12859-020-03793-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2020] [Accepted: 09/30/2020] [Indexed: 11/12/2022] Open

Abstract

BACKGROUND

The formation of contacts among protein secondary structure elements (SSEs) is an important step in protein folding as it determines the topology of the protein tertiary structure; hence, inferring inter-SSE contacts is crucial to protein structure prediction. One existing strategy infers inter-SSE contacts directly from the predicted probabilities of inter-residue contacts without any preprocessing, and thus suffers from the excessive noise in the predicted inter-residue contacts. Another strategy first defines SSEs based on protein secondary structure prediction, and then judges whether each candidate SSE pair could form a contact. However, it is difficult to determine the boundaries of SSEs accurately because of errors in secondary structure prediction. The incorrectly deduced SSEs hinder subsequent prediction of the contacts among them.

RESULTS

We here report an accurate approach to infer inter-SSE contacts (hence the name ISSEC) using the deep object detection technique. The design of ISSEC is based on the observation that, in the inter-residue contact map, contacting SSEs usually form rectangular regions with characteristic patterns. Therefore, ISSEC infers inter-SSE contacts by detecting such rectangular regions. Unlike the existing approach that directly uses the predicted probabilities of inter-residue contacts, ISSEC applies deep convolution to extract high-level features from the inter-residue contacts. More importantly, ISSEC does not rely on pre-defined SSEs. Instead, ISSEC enumerates multiple candidate rectangular regions in the predicted inter-residue contact map and, for each region, calculates a confidence score measuring whether it has the characteristic patterns. ISSEC employs a greedy strategy to select non-overlapping regions with high confidence scores, and finally infers inter-SSE contacts from these regions.

CONCLUSIONS

Comprehensive experimental results suggested that ISSEC outperformed the state-of-the-art approaches in predicting inter-SSE contacts. We further demonstrated the successful application of ISSEC to improve prediction of both inter-residue contacts and tertiary structure.
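The final selection step described in the abstract above, a greedy choice of high-confidence regions that do not overlap, can be sketched as follows. This is an illustration of the general greedy technique (similar in spirit to non-maximum suppression with a zero-overlap tolerance), not the ISSEC implementation. The rectangle encoding, the function names, and the `min_score` threshold are assumptions.

```python
def overlaps(a, b):
    """True if two rectangles share any cell of the contact map.

    Each rectangle is (row_start, row_end, col_start, col_end),
    with the end indices exclusive.
    """
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def greedy_select(regions, min_score=0.5):
    """Pick non-overlapping regions in order of decreasing confidence.

    regions   -- list of (rectangle, confidence_score) candidates
    min_score -- hypothetical cutoff below which candidates are ignored
    """
    chosen = []
    for rect, score in sorted(regions, key=lambda r: r[1], reverse=True):
        if score < min_score:
            break  # remaining candidates all score lower
        if all(not overlaps(rect, kept) for kept, _ in chosen):
            chosen.append((rect, score))
    return chosen


candidates = [
    ((0, 5, 10, 15), 0.9),
    ((3, 8, 12, 18), 0.8),    # overlaps the first region, so it is skipped
    ((20, 25, 30, 35), 0.7),
    ((40, 45, 50, 55), 0.3),  # below the confidence cutoff
]
print(greedy_select(candidates))
```

The greedy choice is simple and fast, but it is not guaranteed to maximize the total confidence of the selected set; it only guarantees that the selected regions are mutually disjoint.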

Li Y, Mohanty S, Nilsson D, Hansson B, Mao K, Irbäck A. When a foreign gene meets its native counterpart: computational biophysics analysis of two PgiC loci in the grass Festuca ovina. Sci Rep 2020;10:18752. [PMID: 33127989 PMCID: PMC7599235 DOI: 10.1038/s41598-020-75650-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Accepted: 10/16/2020] [Indexed: 11/14/2022] Open

Farrell DP, Anishchenko I, Shakeel S, Lauko A, Passmore LA, Baker D, DiMaio F. Deep learning enables the atomic structure determination of the Fanconi Anemia core complex from cryoEM. IUCRJ 2020;7:881-892. [PMID: 32939280 PMCID: PMC7467173 DOI: 10.1107/s2052252520009306] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Accepted: 07/07/2020] [Indexed: 06/11/2023]

Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020;18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open

Jing X, Zeng H, Wang S, Xu J. A Web-Based Protocol for Interprotein Contact Prediction by Deep Learning. Methods Mol Biol 2020;2074:67-80. [PMID: 31583631 DOI: 10.1007/978-1-4939-9873-9_6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Rao R, Bhattacharya N, Thomas N, Duan Y, Chen X, Canny J, Abbeel P, Song YS. Evaluating Protein Transfer Learning with TAPE. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2019;32:9689-9701. [PMID: 33390682 PMCID: PMC7774645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]

Heo L, Feig M. High-accuracy protein structures by combining machine-learning with physics-based refinement. Proteins 2019;88:637-642. [PMID: 31693199 DOI: 10.1002/prot.25847] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2019] [Revised: 10/05/2019] [Accepted: 11/03/2019] [Indexed: 12/16/2022]

Shi C, Chen J, Kang X, Zhao G, Lao X, Zheng H. Deep Learning in the Study of Protein-Related Interactions. Protein Pept Lett 2019;27:359-369. [PMID: 31538879 DOI: 10.2174/0929866526666190723114142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2019] [Revised: 03/13/2019] [Accepted: 04/05/2019] [Indexed: 11/22/2022]

Mack EA, Xiao YP, Allred DR. Knockout of Babesia bovis rad51 ortholog and its complementation by expression from the BbACc3 artificial chromosome platform. PLoS One 2019;14:e0215882. [PMID: 31386669 PMCID: PMC6684078 DOI: 10.1371/journal.pone.0215882] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2019] [Accepted: 07/21/2019] [Indexed: 11/18/2022] Open

Abstract

Babesia bovis establishes persistent infections of long duration in cattle, despite the development of effective anti-disease immunity. One mechanism used by the parasite to achieve persistence is rapid antigenic variation of the VESA1 cytoadhesion ligand through segmental gene conversion (SGC), a phenomenon thought to be a form of homologous recombination (HR). To begin investigation of the enzymatic basis for SGC we initially identified and knocked out the Bbrad51 gene encoding the B. bovis Rad51 ortholog. BbRad51 was found to be non-essential for in vitro growth of asexual-stage parasites. However, its loss resulted in hypersensitivity to methylmethane sulfonate (MMS) and an apparent defect in HR. This defect rendered attempts to complement the knockout phenotype by reinsertion of the Bbrad51 gene into the genome unsuccessful. To circumvent this difficulty, we constructed an artificial chromosome, BbACc3, into which the complete Bbrad51 locus was inserted, for expression of BbRad51 under regulation by autologous elements. Maintenance of BbACc3 makes use of centromeric sequences from chromosome 3 and telomeric ends from chromosome 1 of the B. bovis C9.1 line. A selection cassette employing human dihydrofolate reductase enables recovery of transformants by selection with pyrimethamine. We demonstrate that the BbACc3 platform is stably maintained once established, assembles nucleosomes to form native chromatin, and expands in telomere length over time. Significantly, the MMS-sensitivity phenotype observed in the absence of Bbrad51 was successfully complemented at essentially normal levels. We provide cautionary evidence, however, that in HR-competent parasites BbACc3 can recombine with native chromosomes, potentially resulting in crossover. 
We propose that, under certain circumstances this platform can provide a useful alternative for the genetic manipulation of this group of parasites, particularly when regulated gene expression under the control of autologous elements may be important.

Fongang B, Cunningham KA, Rowicka M, Kudlicki A. Coevolution of Residues Provides Evidence of a Functional Heterodimer of 5-HT_2AR and 5-HT_2CR Involving Both Intracellular and Extracellular Domains. Neuroscience 2019;412:48-59. [PMID: 31158438 PMCID: PMC7299066 DOI: 10.1016/j.neuroscience.2019.05.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2019] [Revised: 05/02/2019] [Accepted: 05/07/2019] [Indexed: 10/26/2022]

Marks C, Deane CM. Increasing the accuracy of protein loop structure prediction with evolutionary constraints. Bioinformatics 2019;35:2585-2592. [PMID: 30535347 DOI: 10.1093/bioinformatics/bty996] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2018] [Revised: 09/28/2018] [Accepted: 12/07/2018] [Indexed: 11/12/2022] Open

Kuenze G, Meiler J. Protein structure prediction using sparse NOE and RDC restraints with Rosetta in CASP13. Proteins 2019;87:1341-1350. [PMID: 31292988 DOI: 10.1002/prot.25769] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2019] [Revised: 05/25/2019] [Accepted: 07/06/2019] [Indexed: 12/30/2022]

Heo L, Arbour CF, Feig M. Driven to near-experimental accuracy by refinement via molecular dynamics simulations. Proteins 2019;87:1263-1275. [PMID: 31197841 DOI: 10.1002/prot.25759] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2019] [Revised: 06/01/2019] [Accepted: 06/07/2019] [Indexed: 12/17/2022]

Wu Q, Peng Z, Anishchenko I, Cong Q, Baker D, Yang J. Protein contact prediction using metagenome sequence data and residual neural networks. Bioinformatics 2019;36:41-48. [PMID: 31173061 PMCID: PMC8792440 DOI: 10.1093/bioinformatics/btz477] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Revised: 05/30/2019] [Accepted: 06/04/2019] [Indexed: 01/31/2023] Open

Machine learning-assisted directed protein evolution with combinatorial libraries. Proc Natl Acad Sci U S A 2019;116:8852-8858. [PMID: 30979809 DOI: 10.1073/pnas.1901979116] [Citation(s) in RCA: 287] [Impact Index Per Article: 57.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019;9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022] Open

Malinverni D, Barducci A. Coevolutionary Analysis of Protein Sequences for Molecular Modeling. Methods Mol Biol 2019;2022:379-397. [PMID: 31396912 DOI: 10.1007/978-1-4939-9608-7_16] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]

Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018;14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022] Open

Abstract

Conformational dynamics is rarely incorporated into methods that predict the impact of genetic mutations, owing to the paucity of three-dimensional protein structures compared with the vast number of available sequences. Until now, a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.

Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
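
As a rough illustration of the Gaussian network model step described in the abstract above, the following Python sketch builds a Kirchhoff (connectivity) matrix from a list of residue contacts and reads relative fluctuations off the diagonal of its pseudo-inverse, to which GNM B-factors are proportional. It is not the authors' Seq-GNM code: the function names are hypothetical, the physical scale factor is omitted, and linking sequential neighbours (i, i+1) so that the network stays connected is an assumption of this sketch.

```python
def invert(matrix):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; is the contact network connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def relative_fluctuations(n_residues, contacts):
    """Return the diagonal of the Kirchhoff pseudo-inverse (relative B-factors).

    contacts: iterable of 0-based residue index pairs, e.g. coevolving pairs.
    Sequential neighbours are always linked (assumption of this sketch).
    """
    n = n_residues
    pairs = {(i, i + 1) for i in range(n - 1)}
    pairs.update((min(i, j), max(i, j)) for i, j in contacts if i != j)

    # Kirchhoff matrix: -1 for each contact, contact count on the diagonal.
    kirchhoff = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        kirchhoff[i][j] = kirchhoff[j][i] = -1.0
        kirchhoff[i][i] += 1.0
        kirchhoff[j][j] += 1.0

    # For a connected network, pinv(K) = inv(K + J/n) - J/n, with J the all-ones matrix.
    shift = 1.0 / n
    shifted = [[v + shift for v in row] for row in kirchhoff]
    inverse = invert(shifted)
    return [inverse[i][i] - shift for i in range(n)]
```

For a three-residue chain with no extra contacts, `relative_fluctuations(3, [])` gives larger values for the two terminal residues than for the central one; adding the contact `(0, 2)` makes all three values equal.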

Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 2018;34:3308-3315. [PMID: 29718112 PMCID: PMC6157083 DOI: 10.1093/bioinformatics/bty341] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7] [Received: 10/12/2017] [Revised: 03/06/2018] [Accepted: 04/25/2018] [Indexed: 12/22/2022] Open

Abstract

Motivation

In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation.

Results

Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions.

Availability and implementation

DeepCov is freely available at https://github.com/psipred/DeepCov.

Supplementary information

Supplementary data are available at Bioinformatics online.
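
To make the "amino-acid pair frequency or covariance data" mentioned in the abstract above concrete, here is a minimal Python sketch that computes, for two alignment columns, the covariance between every pair of residue types as the joint frequency minus the product of the single-column frequencies. It is an illustration only, not the DeepCov implementation: the function name is hypothetical, sequences are unweighted, and no pseudocounts are applied.

```python
from collections import Counter


def pair_covariance(alignment, i, j):
    """Covariance of residue types at columns i and j of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Returns a dict mapping (a, b) to f_ij(a, b) - f_i(a) * f_j(b).
    """
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)              # counts at column i
    freq_j = Counter(seq[j] for seq in alignment)              # counts at column j
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)   # joint counts
    return {
        (a, b): freq_ij[(a, b)] / n - (freq_i[a] / n) * (freq_j[b] / n)
        for a in freq_i
        for b in freq_j
    }
```

Columns that vary together give non-zero entries (positive for pairs seen together more often than chance), whereas statistically independent columns give entries of zero.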

Kc DB. Recent advances in sequence-based protein structure prediction. Brief Bioinform 2018;18:1021-1032. [PMID: 27562963 DOI: 10.1093/bib/bbw070] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/24/2016] [Indexed: 11/13/2022] Open

de Oliveira SHP, Shi J, Deane CM. Comparing co-evolution methods and their application to template-free protein structure prediction. Bioinformatics 2018;33:373-381. [PMID: 28171606 PMCID: PMC5860252 DOI: 10.1093/bioinformatics/btw618] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/28/2016] [Revised: 09/19/2016] [Accepted: 09/22/2016] [Indexed: 02/01/2023] Open

Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018;13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022] Open

Mao W, Wang T, Zhang W, Gong H. Identification of residue pairing in interacting β-strands from a predicted residue contact map. BMC Bioinformatics 2018;19:146. [PMID: 29673311 PMCID: PMC5907701 DOI: 10.1186/s12859-018-2150-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/04/2017] [Accepted: 04/09/2018] [Indexed: 12/04/2022] Open

Abstract

Background

Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information on residue pairing in β strands can be extracted from a noisy contact map, owing to the characteristic contact patterns of β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map.

Results

Our algorithm RDb₂C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb₂C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb₂C achieves considerably higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score is 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb₂C.

Conclusion

Our method can significantly improve the prediction of β-β contacts from any predicted residue contact map. The prediction results of our algorithm can be applied directly to facilitate the practical structure prediction of mainly β proteins.

Availability

All source data and code are available at http://166.111.152.91/Downloads.html or on GitHub at https://github.com/wzmao/RDb2C.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2150-1) contains supplementary material, which is available to authorized users.

Gaalswyk K, Muniyat MI, MacCallum JL. The emerging role of physical modeling in the future of structure determination. Curr Opin Struct Biol 2018;49:145-153. [DOI: 10.1016/j.sbi.2018.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 12/02/2017] [Revised: 03/04/2018] [Accepted: 03/05/2018] [Indexed: 10/17/2022]

de Oliveira SHP, Law EC, Shi J, Deane CM. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction. Bioinformatics 2018;34:1132-1140. [PMID: 29136098 PMCID: PMC6030820 DOI: 10.1093/bioinformatics/btx722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/17/2017] [Revised: 09/22/2017] [Accepted: 11/04/2017] [Indexed: 01/12/2023] Open

Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018;86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]

Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018;53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]

Liu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Syst 2018;6:65-74.e3. [PMID: 29275173 PMCID: PMC5808454 DOI: 10.1016/j.cels.2017.11.014] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 06/21/2017] [Revised: 10/04/2017] [Accepted: 11/22/2017] [Indexed: 12/21/2022]

Mandalaparthy V, Sanaboyana VR, Rafalia H, Gosavi S. Exploring the effects of sparse restraints on protein structure prediction. Proteins 2017;86:248-262. [DOI: 10.1002/prot.25438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/24/2017] [Revised: 11/20/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023]

Lee GR, Heo L, Seok C. Simultaneous refinement of inaccurate local regions and overall structure in the CASP12 protein model refinement experiment. Proteins 2017;86 Suppl 1:168-176. [PMID: 29044810 DOI: 10.1002/prot.25404] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/03/2017] [Revised: 10/09/2017] [Accepted: 10/11/2017] [Indexed: 12/15/2022]

Higgins SA, Savage DF. Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry. Biochemistry 2017;57:38-46. [DOI: 10.1021/acs.biochem.7b00886] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]

Wang S, Sun S, Xu J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 2017;86 Suppl 1:67-77. [PMID: 28845538 DOI: 10.1002/prot.25377] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Received: 06/14/2017] [Revised: 08/18/2017] [Accepted: 08/25/2017] [Indexed: 11/08/2022]

Zhu J, Zhang H, Li SC, Wang C, Kong L, Sun S, Zheng WM, Bu D. Improving protein fold recognition by extracting fold-specific features from predicted residue–residue contacts. Bioinformatics 2017;33:3749-3757. [DOI: 10.1093/bioinformatics/btx514] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 01/23/2017] [Accepted: 08/09/2017] [Indexed: 01/05/2023] Open

Ovchinnikov S, Park H, Varghese N, Huang PS, Pavlopoulos GA, Kim DE, Kamisetty H, Kyrpides NC, Baker D. Protein structure determination using metagenome sequence data. Science 2017;355:294-298. [PMID: 28104891 PMCID: PMC5493203 DOI: 10.1126/science.aah4043] [Citation(s) in RCA: 331] [Impact Index Per Article: 47.3] [Received: 06/22/2016] [Accepted: 11/22/2016] [Indexed: 01/30/2023]

Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 2017;13:e1005324. [PMID: 28056090 PMCID: PMC5249242 DOI: 10.1371/journal.pcbi.1005324] [Citation(s) in RCA: 559] [Impact Index Per Article: 79.9] [Received: 09/14/2016] [Revised: 01/20/2017] [Accepted: 12/20/2016] [Indexed: 12/02/2022] Open

Abstract

Motivation

Protein contacts contain key information for the understanding of protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction.

Method

This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question.

Results

Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then.

Availability

http://raptorx.uchicago.edu/ContactMap/

Protein contact prediction and contact-assisted folding have made good progress owing to direct evolutionary coupling analysis (DCA). However, DCA is effective only on proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the game of Go. Our deep learning method can model complex sequence-structure relationships and high-order correlations (i.e., contact occurrence patterns) and thus greatly improve contact prediction accuracy. Our test results show that our method greatly outperforms the state-of-the-art methods regardless of how many sequence homologs are available for the protein in question. Ab initio folding guided by our predicted contacts may fold many more test proteins than folding guided by the other contact predictors. Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that, even though it was trained mostly on soluble proteins, our method performs very well on membrane proteins. A recent blind CAMEO test confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.
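
The "top L" and "top L/10" long-range accuracies quoted in the Results above can be read as the fraction of true contacts among the highest-scoring predicted pairs. The Python sketch below shows one way such a figure might be computed. The function name and argument layout are hypothetical, and treating "long-range" as a sequence separation of at least 24 residues follows the common CASP convention, which is an assumption here rather than something stated in the abstract.

```python
def top_precision(scores, true_contacts, length, divisor=1, min_separation=24):
    """Precision of the top (length // divisor) predicted long-range contacts.

    scores: dict mapping residue index pairs (i, j) to predicted contact scores.
    true_contacts: iterable of (i, j) pairs that are contacts in the native structure.
    length: sequence length L; divisor=10 gives the top L/10 measure.
    """
    # Keep only pairs far enough apart in sequence, best score first.
    ranked = sorted(
        ((score, i, j) for (i, j), score in scores.items()
         if abs(i - j) >= min_separation),
        reverse=True,
    )
    top = ranked[:max(1, length // divisor)]
    if not top:
        return 0.0
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}
    hits = sum((min(i, j), max(i, j)) in truth for _, i, j in top)
    # Divide by the number of pairs actually evaluated (conventions differ
    # when fewer than L/divisor predictions are available).
    return hits / len(top)
```

Calling the function with `divisor=1` and `divisor=10` corresponds to the two measures reported; short-range pairs are ignored no matter how high they score.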

DiMaio F. Rosetta Structure Prediction as a Tool for Solving Difficult Molecular Replacement Problems. Methods Mol Biol 2017;1607:455-466. [PMID: 28573585 DOI: 10.1007/978-1-4939-7000-1_19] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 06/07/2023]

Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J. ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinformatics 2016;17:517. [PMID: 27923350 PMCID: PMC5142288 DOI: 10.1186/s12859-016-1404-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 07/01/2016] [Accepted: 12/01/2016] [Indexed: 12/31/2022] Open