Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Vendruscolo M, Kussell E, Domany E. Recovery of protein structure from contact maps. Fold Des 1998;2:295-306. [PMID: 9377713 DOI: 10.1016/s1359-0278(97)00041-2] [Citation(s) in RCA: 196] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Number

Cited by Other Article(s)

Gupta A, Ma H, Ramanathan A, Zerze GH. A Deep Learning-Driven Sampling Technique to Explore the Phase Space of an RNA Stem-Loop. J Chem Theory Comput 2024. [PMID: 39374435 DOI: 10.1021/acs.jctc.4c00669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/09/2024]

Abstract

The folding and unfolding of RNA stem-loops are critical biological processes; however, their computational studies are often hampered by the ruggedness of their folding landscape, necessitating long simulation times at the atomistic scale. Here, we adapted DeepDriveMD (DDMD), an advanced deep learning-driven sampling technique originally developed for protein folding, to address the challenges of RNA stem-loop folding. Although tempering- and order parameter-based techniques are commonly used for similar rare-event problems, the computational costs or the need for a priori knowledge about the system often present a challenge in their effective use. DDMD overcomes these challenges by adaptively learning from an ensemble of running MD simulations using generic contact maps as the raw input. DeepDriveMD enables on-the-fly learning of a low-dimensional latent representation and guides the simulation toward the undersampled regions while optimizing the resources to explore the relevant parts of the phase space. We showed that DDMD estimates the free energy landscape of the RNA stem-loop reasonably well at room temperature. Our simulation framework runs at a constant temperature without external biasing potential, hence preserving the information on transition rates, with a computational cost much lower than that of the simulations performed with external biasing potentials. We also introduced a reweighting strategy for obtaining unbiased free energy surfaces and presented a qualitative analysis of the latent space. This analysis showed that the latent space captures the relevant slow degrees of freedom for the RNA folding problem of interest. Finally, throughout the manuscript, we outlined how different parameters are selected and optimized to adapt DDMD for this system. We believe this compendium of decision-making processes will help new users adapt this technique for the rare-event sampling problems of their interest.

Collapse

Fakhoury Z, Sosso GC, Habershon S. Contact-Map-Driven Exploration of Heterogeneous Protein-Folding Paths. J Chem Theory Comput 2024;20. [PMID: 39228261 PMCID: PMC11428170 DOI: 10.1021/acs.jctc.4c00878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2024] [Revised: 08/19/2024] [Accepted: 08/22/2024] [Indexed: 09/05/2024]

Yu Z, Yu J, Wang H, Zhang S, Zhao L, Shi S. PhosAF: An integrated deep learning architecture for predicting protein phosphorylation sites with AlphaFold2 predicted structures. Anal Biochem 2024;690:115510. [PMID: 38513769 DOI: 10.1016/j.ab.2024.115510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2023] [Revised: 03/14/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024]

Chu H, Liu T. Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models. Int J Mol Sci 2024;25:4507. [PMID: 38674091 PMCID: PMC11049818 DOI: 10.3390/ijms25084507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 04/15/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024] Open

Busia A, Listgarten J. MBE: model-based enrichment estimation and prediction for differential sequencing data. Genome Biol 2023;24:218. [PMID: 37784130 PMCID: PMC10544408 DOI: 10.1186/s13059-023-03058-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Accepted: 09/14/2023] [Indexed: 10/04/2023] Open

Ponlachantra K, Suginta W, Robinson RC, Kitaoku Y. AlphaFold2: A versatile tool to predict the appearance of functional adaptations in evolution: Profilin interactions in uncultured Asgard archaea: Profilin interactions in uncultured Asgard archaea. Bioessays 2023;45:e2200119. [PMID: 36461738 DOI: 10.1002/bies.202200119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2022] [Revised: 11/07/2022] [Accepted: 11/09/2022] [Indexed: 12/05/2022]

Liu H, Chen Q. Computational protein design with data‐driven approaches: Recent developments and perspectives. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Sánchez IE, Galpern EA, Garibaldi MM, Ferreiro DU. Molecular Information Theory Meets Protein Folding. J Phys Chem B 2022;126:8655-8668. [PMID: 36282961 DOI: 10.1021/acs.jpcb.2c04532] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]

Chou HH, Hsu CT, Hsu CW, Yao KH, Wang HC, Hsieh SY. Novel Algorithm for Improved Protein Classification Using Graph Similarity. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:3135-3143. [PMID: 34748498 DOI: 10.1109/tcbb.2021.3125836] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Tichkule S, Myung Y, Naung MT, Ansell BRE, Guy AJ, Srivastava N, Mehra S, Cacciò SM, Mueller I, Barry AE, van Oosterhout C, Pope B, Ascher DB, Jex AR. VIVID: a web application for variant interpretation and visualisation in multidimensional analyses. Mol Biol Evol 2022;39:6697981. [PMID: 36103257 PMCID: PMC9514033 DOI: 10.1093/molbev/msac196] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Affiliation(s)

Swapnil Tichkule Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research , Melbourne , Australia Department of Medical Biology, University of Melbourne , Melbourne , Australia
Yoochan Myung Systems and Computational Biology, Bio21 Institute, University of Melbourne , Melbourne , Australia Computational Biology and Clinical Informatics, Baker Heart and Diabetes , Melbourne , Australia
Myo T Naung Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research , Melbourne , Australia Department of Medical Biology, University of Melbourne , Melbourne , Australia
Brendan R E Ansell Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research , Melbourne , Australia
Andrew J Guy School of Science, RMIT University , Melbourne , Australia
Namrata Srivastava Department of Data Science and AI, Monash University , Melbourne , Australia
Somya Mehra Life Sciences Discipline, Burnet Institute , Melbourne , Australia
Simone M Cacciò Department of Infectious Disease, Istituto Superiore di Sanità , Rome , Italy
Ivo Mueller Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research , Melbourne , Australia
Alyssa E Barry Life Sciences Discipline, Burnet Institute , Melbourne , Australia Institute of Mental and Physical Health and Clinical Translation (IMPACT) and School of Medicine, Deakin University , Geelong , Australia
Cock van Oosterhout School of Environmental Sciences, University of East Anglia, Norwich Research Park , Norwich , UK
Bernard Pope Melbourne Bioinformatics, University of Melbourne , Melbourne , Australia Australian BioCommons , Sydney , Australia Department of Clinical Pathology, University of Melbourne , Melbourne , Australia Department of Surgery (Royal Melbourne Hospital), University of Melbourne , Melbourne , Australia
David B Ascher Systems and Computational Biology, Bio21 Institute, University of Melbourne , Melbourne , Australia Computational Biology and Clinical Informatics, Baker Heart and Diabetes , Melbourne , Australia
Aaron R Jex Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research , Melbourne , Australia Faculty of Veterinary and Agricultural Sciences, University of Melbourne , Melbourne , Australia

Collapse

Erman B. Gaussian network model revisited: Effects of mutation and ligand binding on protein behavior. Phys Biol 2022;19. [PMID: 35105836 DOI: 10.1088/1478-3975/ac50ba] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2021] [Accepted: 02/01/2022] [Indexed: 11/12/2022]

Ovchinnikov S, Huang PS. Structure-based protein design with deep learning. Curr Opin Chem Biol 2021;65:136-144. [PMID: 34547592 PMCID: PMC8671290 DOI: 10.1016/j.cbpa.2021.08.004] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]

Guo X, Du Y, Tadepalli S, Zhao L, Shehu A. Generating tertiary protein structures via interpretable graph variational autoencoders. BIOINFORMATICS ADVANCES 2021;1:vbab036. [PMID: 36700110 PMCID: PMC9710582 DOI: 10.1093/bioadv/vbab036] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Revised: 11/07/2021] [Accepted: 11/17/2021] [Indexed: 01/28/2023]

Abstract

Motivation

Modeling the structural plasticity of protein molecules remains challenging. Most research has focused on obtaining one biologically active structure. This includes the recent AlphaFold2 that has been hailed as a breakthrough for protein modeling. Computing one structure does not suffice to understand how proteins modulate their interactions and even evade our immune system. Revealing the structure space available to a protein remains challenging. Data-driven approaches that learn to generate tertiary structures are increasingly garnering attention. These approaches exploit the ability to represent tertiary structures as contact or distance maps and make direct analogies with images to harness convolution-based generative adversarial frameworks from computer vision. Since such opportunistic analogies do not allow capturing highly structured data, current deep models struggle to generate physically realistic tertiary structures.

Results

We present novel deep generative models that build upon the graph variational autoencoder framework. In contrast to existing literature, we represent tertiary structures as 'contact' graphs, which allow us to leverage graph-generative deep learning. Our models are able to capture rich, local and distal constraints and additionally compute disentangled latent representations that reveal the impact of individual latent factors. This elucidates what the factors control and makes our models more interpretable. Rigorous comparative evaluation along various metrics shows that the models, we propose advance the state-of-the-art. While there is still much ground to cover, the work presented here is an important first step, and graph-generative frameworks promise to get us to our goal of unraveling the exquisite structural complexity of protein molecules.

Availability and implementation

Code is available at https://github.com/anonymous1025/CO-VAE.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.

Collapse

Zheng W, Zhang C, Li Y, Pearce R, Bell EW, Zhang Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. CELL REPORTS METHODS 2021;1:100014. [PMID: 34355210 PMCID: PMC8336924 DOI: 10.1016/j.crmeth.2021.100014] [Citation(s) in RCA: 259] [Impact Index Per Article: 86.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 05/03/2021] [Indexed: 12/23/2022]

Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021;22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022] Open

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021;17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022] Open

Abstract

The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.

Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate competitive performance of the proposed methods to other top approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.

Collapse

Hu J, Yang W, Dong R, Li Y, Li X, Li S, Siriwardane EMD. Contact map based crystal structure prediction using global optimization. CrystEngComm 2021. [DOI: 10.1039/d0ce01714k] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Hu J, Yang W, Dilanga Siriwardane EM. Distance Matrix-Based Crystal Structure Prediction Using Evolutionary Algorithms. J Phys Chem A 2020;124:10909-10919. [DOI: 10.1021/acs.jpca.0c08775] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Xia CQ, Pan X, Shen HB. Protein-ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data. Bioinformatics 2020;36:3018-3027. [PMID: 32091580 DOI: 10.1093/bioinformatics/btaa110] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2019] [Revised: 01/19/2020] [Accepted: 02/18/2020] [Indexed: 01/02/2023] Open

Acharya V, Chakraborty HJ, Rout AK, Balabantaray S, Behera BK, Das BK. Structural Characterization of Open Reading Frame-Encoded Functional Genes from Tilapia Lake Virus (TiLV). Mol Biotechnol 2020;61:945-957. [PMID: 31664705 DOI: 10.1007/s12033-019-00217-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020;18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open

Wozniak PP, Pelc J, Skrzypecki M, Vriend G, Kotulska M. Bio-knowledge-based filters improve residue-residue contact prediction accuracy. Bioinformatics 2019;34:3675-3683. [PMID: 29850768 DOI: 10.1093/bioinformatics/bty416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2017] [Accepted: 05/19/2018] [Indexed: 11/13/2022] Open

Wu Q, Peng Z, Anishchenko I, Cong Q, Baker D, Yang J. Protein contact prediction using metagenome sequence data and residual neural networks. Bioinformatics 2019;36:41-48. [PMID: 31173061 PMCID: PMC8792440 DOI: 10.1093/bioinformatics/btz477] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Revised: 05/30/2019] [Accepted: 06/04/2019] [Indexed: 01/31/2023] Open

Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019;19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Indexed: 11/14/2022] Open

Bhowmik D, Gao S, Young MT, Ramanathan A. Deep clustering of protein folding simulations. BMC Bioinformatics 2018;19:484. [PMID: 30577777 PMCID: PMC6302667 DOI: 10.1186/s12859-018-2507-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open

Amala A, Emerson IA. Understanding contact patterns of protein structures from protein contact map and investigation of unique patterns in the globin-like folded domains. J Cell Biochem 2018;120:9877-9886. [PMID: 30525229 DOI: 10.1002/jcb.28270] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2018] [Accepted: 10/24/2018] [Indexed: 11/06/2022]

Baldi P. Deep Learning in Biomedical Data Science. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013343] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Kurczynska M, Kotulska M. Automated method to differentiate between native and mirror protein models obtained from contact maps. PLoS One 2018;13:e0196993. [PMID: 29787567 PMCID: PMC5963800 DOI: 10.1371/journal.pone.0196993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2017] [Accepted: 04/24/2018] [Indexed: 11/23/2022] Open

Abstract

Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow to obtain appropriate clusters–the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying two most differentiating ETs for each class allowed to obtain satisfying results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.

Collapse

He B, Mortuza SM, Wang Y, Shen HB, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2018;33:2296-2306. [PMID: 28369334 DOI: 10.1093/bioinformatics/btx164] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2016] [Accepted: 03/21/2017] [Indexed: 12/12/2022] Open

Abstract

Motivation

Recent CASP experiments have witnessed exciting progress on folding large-size non-humongous proteins with the assistance of co-evolution based contact predictions. The success is however anecdotal due to the requirement of the contact prediction methods for the high volume of sequence homologs that are not available to most of the non-humongous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different type of protein targets is essential to enhance the success rate of the ab initio protein structure prediction.

Results

We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state of the art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, which improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found particularly important for the proteins that do not have sufficient number of homologous sequences to derive reliable co-evolution profiles.

Availiablity and Implementation

On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/ .

Contact

zhng@umich.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Guo QH, Sun LH. Combinatorics of Contacts in Protein Contact Maps. Bull Math Biol 2017;80:385-403. [PMID: 29230703 DOI: 10.1007/s11538-017-0380-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2016] [Accepted: 12/05/2017] [Indexed: 10/18/2022]

Wozniak PP, Konopka BM, Xu J, Vriend G, Kotulska M. Forecasting residue-residue contact prediction accuracy. Bioinformatics 2017;33:3405-3414. [PMID: 29036497 DOI: 10.1093/bioinformatics/btx416] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2017] [Accepted: 06/22/2017] [Indexed: 11/14/2022] Open

Adhikari B, Cheng J. Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts. BMC Bioinformatics 2017;18:380. [PMID: 28851269 PMCID: PMC5576353 DOI: 10.1186/s12859-017-1807-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2017] [Accepted: 08/22/2017] [Indexed: 11/12/2022] Open

Abstract

Background

Residue-residue contacts are key features for accurate de novo protein structure prediction. For the optimal utilization of these predicted contacts in folding proteins accurately, it is important to study the challenges of reconstructing protein structures using true contacts. Because contact-guided protein modeling approach is valuable for predicting the folds of proteins that do not have structural templates, it is necessary for reconstruction studies to focus on hard-to-predict protein structures.

Results

Using a data set consisting of 496 structural domains released in recent CASP experiments and a dataset of 150 representative protein structures, in this work, we discuss three techniques to improve the reconstruction accuracy using true contacts – adding secondary structures, increasing contact distance thresholds, and adding non-contacts. We find that reconstruction using secondary structures and contacts can deliver accuracy higher than using full contact maps. Similarly, we demonstrate that non-contacts can improve reconstruction accuracy not only when the used non-contacts are true but also when they are predicted. On the dataset consisting of 150 proteins, we find that by simply using low ranked predicted contacts as non-contacts and adding them as additional restraints, can increase the reconstruction accuracy by 5% when the reconstructed models are evaluated using TM-score.

Conclusions

Our findings suggest that secondary structures are invaluable companions of contacts for accurate reconstruction. Confirming some earlier findings, we also find that larger distance thresholds are useful for folding many protein structures which cannot be folded using the standard definition of contacts. Our findings also suggest that for more accurate reconstruction using predicted contacts it is useful to predict contacts at higher distance thresholds (beyond 8 Å) and predict non-contacts.

Electronic supplementary material

The online version of this article (10.1186/s12859-017-1807-5) contains supplementary material, which is available to authorized users.

Collapse

Ochoa-Montaño B, Blundell TL. XSuLT: a web server for structural annotation and representation of sequence-structure alignments. Nucleic Acids Res 2017;45:W381-W387. [PMID: 28510698 PMCID: PMC5793734 DOI: 10.1093/nar/gkx421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2017] [Accepted: 05/04/2017] [Indexed: 12/16/2022] Open

Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017;18:303. [PMID: 28623886 PMCID: PMC5474060 DOI: 10.1186/s12859-017-1713-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2016] [Accepted: 05/30/2017] [Indexed: 01/12/2023] Open

Abstract

BACKGROUND

Accurately predicted contacts allow to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are combated with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks.

RESULTS

On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for top- L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, enabling a significant increase in training data for machine learning, contributing significantly to the observed improvements.

CONCLUSIONS

Exploiting as much and diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/.

Collapse

Chu JW, Yang H. Identifying the structural and kinetic elements in protein large-amplitude conformational motions. INT REV PHYS CHEM 2017. [DOI: 10.1080/0144235x.2017.1283885] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

Wozniak PP, Vriend G, Kotulska M. Correlated mutations select misfolded from properly folded proteins. Bioinformatics 2017;33:1497-1504. [PMID: 28203707 DOI: 10.1093/bioinformatics/btx013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2016] [Accepted: 01/11/2017] [Indexed: 11/14/2022] Open

Regular Simple Queues of Protein Contact Maps. Bull Math Biol 2016;79:21-35. [PMID: 27844300 DOI: 10.1007/s11538-016-0212-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2016] [Accepted: 09/21/2016] [Indexed: 10/20/2022]

Simkovic F, Thomas JMH, Keegan RM, Winn MD, Mayans O, Rigden DJ. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds. IUCRJ 2016;3:259-70. [PMID: 27437113 PMCID: PMC4937781 DOI: 10.1107/s2052252516008113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2016] [Accepted: 05/18/2016] [Indexed: 05/05/2023]

Guo QH, Sun LH, Wang J. Enumeration of Extended m-Regular Linear Stacks. J Comput Biol 2016;23:943-956. [PMID: 27308919 DOI: 10.1089/cmb.2016.0041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Morlot JB, Mozziconacci J, Lesne A. Network concepts for analyzing 3D genome structure from chromosomal contact maps. ACTA ACUST UNITED AC 2016. [DOI: 10.1140/epjnbp/s40366-016-0029-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Kurczynska M, Kania E, Konopka BM, Kotulska M. Applying PyRosetta molecular energies to separate properly oriented protein models from mirror models, obtained from contact maps. J Mol Model 2016;22:111. [PMID: 27107578 PMCID: PMC4842210 DOI: 10.1007/s00894-016-2975-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2015] [Accepted: 04/05/2016] [Indexed: 11/30/2022]

Kandathil SM, Handl J, Lovell SC. Toward a detailed understanding of search trajectories in fragment assembly approaches to protein structure prediction. Proteins 2016;84:411-26. [PMID: 26799916 PMCID: PMC4982100 DOI: 10.1002/prot.24987] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2015] [Revised: 12/03/2015] [Accepted: 12/31/2015] [Indexed: 11/30/2022]

Sawle L, Ghosh K. Convergence of Molecular Dynamics Simulation of Protein Native States: Feasibility vs Self-Consistency Dilemma. J Chem Theory Comput 2016;12:861-9. [DOI: 10.1021/acs.jctc.5b00999] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Zhang H, Huang Q, Bei Z, Wei Y, Floudas CA. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming. Proteins 2016;84:332-48. [PMID: 26756402 DOI: 10.1002/prot.24979] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2015] [Revised: 11/19/2015] [Accepted: 12/10/2015] [Indexed: 12/28/2022]

Sneha P, Doss CGP. Molecular Dynamics: New Frontier in Personalized Medicine. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2015;102:181-224. [PMID: 26827606 DOI: 10.1016/bs.apcsb.2015.09.004] [Citation(s) in RCA: 109] [Impact Index Per Article: 12.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Abstract

The field of drug discovery has witnessed infinite development over the last decade with the demand for discovery of novel efficient lead compounds. Although the development of novel compounds in this field has seen large failure, a breakthrough in this area might be the establishment of personalized medicine. The trend of personalized medicine has shown stupendous growth being a hot topic after the successful completion of Human Genome Project and 1000 genomes pilot project. Genomic variant such as SNPs play a vital role with respect to inter individual's disease susceptibility and drug response. Hence, identification of such genetic variants has to be performed before administration of a drug. This process requires high-end techniques to understand the complexity of the molecules which might bring an insight to understand the compounds at their molecular level. To sustenance this, field of bioinformatics plays a crucial role in revealing the molecular mechanism of the mutation and thereby designing a drug for an individual in fast and affordable manner. High-end computational methods, such as molecular dynamics (MD) simulation has proved to be a constitutive approach to detecting the minor changes associated with an SNP for better understanding of the structural and functional relationship. The parameters used in molecular dynamic simulation elucidate different properties of a macromolecule, such as protein stability and flexibility. MD along with docking analysis can reveal the synergetic effect of an SNP in protein-ligand interaction and provides a foundation for designing a particular drug molecule for an individual. This compelling application of computational power and the advent of other technologies have paved a promising way toward personalized medicine. In this in-depth review, we tried to highlight the different wings of MD toward personalized medicine.

Collapse

Nowotny J, Ahmed S, Xu L, Oluwadare O, Chen H, Hensley N, Trieu T, Cao R, Cheng J. Iterative reconstruction of three-dimensional models of human chromosomes from chromosomal contact data. BMC Bioinformatics 2015;16:338. [PMID: 26493399 PMCID: PMC4619219 DOI: 10.1186/s12859-015-0772-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2015] [Accepted: 10/13/2015] [Indexed: 11/10/2022] Open

Abstract

Background

The entire collection of genetic information resides within the chromosomes, which themselves reside within almost every cell nucleus of eukaryotic organisms. Each individual chromosome is found to have its own preferred three-dimensional (3D) structure independent of the other chromosomes. The structure of each chromosome plays vital roles in controlling certain genome operations, including gene interaction and gene regulation. As a result, knowing the structure of chromosomes assists in the understanding of how the genome functions. Fortunately, the 3D structure of chromosomes proves possible to construct through computational methods via contact data recorded from the chromosome. We developed a unique computational approach based on optimization procedures known as adaptation, simulated annealing, and genetic algorithm to construct 3D models of human chromosomes, using chromosomal contact data.

Results

Our models were evaluated using a percentage-based scoring function. Analysis of the scores of the final 3D models demonstrated their effective construction from our computational approach. Specifically, the models resulting from our approach yielded an average score of 80.41 %, with a high of 91 %, across models for all chromosomes of a normal human B-cell. Comparisons made with other methods affirmed the effectiveness of our strategy. Particularly, juxtaposition with models generated through the publicly available method Markov chain Monte Carlo 5C (MCMC5C) illustrated the outperformance of our approach, as seen through a higher average score for all chromosomes. Our methodology was further validated using two consistency checking techniques known as convergence testing and robustness checking, which both proved successful.

Conclusions

The pursuit of constructing accurate 3D chromosomal structures is fueled by the benefits revealed by the findings as well as any possible future areas of study that arise. This motivation has led to the development of our computational methodology. The implementation of our approach proved effective in constructing 3D chromosome models and proved consistent with, and more effective than, some other methods thereby achieving our goal of creating a tool to help advance certain research efforts. The source code, test data, test results, and documentation of our method, Gen3D, are available at our sourceforge site at: http://sourceforge.net/projects/gen3d/.

Collapse

Dabrowski-Tumanski P, Jarmolinska AI, Sulkowska JI. Prediction of the optimal set of contacts to fold the smallest knotted protein. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2015;27:354109. [PMID: 26291339 DOI: 10.1088/0953-8984/27/35/354109] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Adhikari B, Bhattacharya D, Cao R, Cheng J. CONFOLD: Residue-residue contact-guided ab initio protein folding. Proteins 2015;83:1436-49. [PMID: 25974172 PMCID: PMC4509844 DOI: 10.1002/prot.24829] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Revised: 04/11/2015] [Accepted: 05/02/2015] [Indexed: 12/20/2022]

Pietal MJ, Bujnicki JM, Kozlowski LP. GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function. Bioinformatics 2015;31:3499-505. [PMID: 26130575 DOI: 10.1093/bioinformatics/btv390] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2014] [Accepted: 06/23/2015] [Indexed: 11/13/2022] Open