1
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023]
Abstract
MOTIVATION: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures.
RESULTS: Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups.
AVAILABILITY AND IMPLEMENTATION: DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
Affiliation(s)
- Jae-Won Lee
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jong-Hyun Won
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Seonggwang Jeon
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Yujin Choo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Yubin Yeon
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jin-Seon Oh
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Minsoo Kim
  - Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- SeonHwa Kim
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Cheongjae Jang
  - Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
- Sung Jong Lee
  - Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
- Tae Hyun Kim
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Kyong Hwan Jin
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Giltae Song
  - School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
- Eun-Sol Kim
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Jejoong Yoo
  - Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Yung-Kyun Noh
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Keehyoung Joo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
2
Rajapaksa S, Sumanaweera D, Lesk AM, Allison L, Stuckey PJ, Garcia de la Banda M, Abramson D, Konagurthu AS. OUP accepted manuscript. Bioinformatics 2022; 38:i255-i263. [PMID: 35758808 PMCID: PMC9235515 DOI: 10.1093/bioinformatics/btac247] [Accepted: 04/09/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments.
RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sandun Rajapaksa
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Arthur M Lesk
  - Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Lloyd Allison
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Peter J Stuckey
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Maria Garcia de la Banda
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- David Abramson
  - Research Computing Center, University of Queensland, St Lucia, QLD 4067, Australia
3
Makarova KS, Wolf YI, Shmakov SA, Liu Y, Li M, Koonin EV. Unprecedented Diversity of Unique CRISPR-Cas-Related Systems and Cas1 Homologs in Asgard Archaea. CRISPR J 2021; 3:156-163. [PMID: 33555973 DOI: 10.1089/crispr.2020.0012] [Indexed: 12/27/2022]
Abstract
The principal function of archaeal and bacterial CRISPR-Cas systems is antivirus adaptive immunity. However, recent genome analyses identified a variety of derived CRISPR-Cas variants, at least some of which appear to perform different functions. Here, we describe a unique repertoire of CRISPR-Cas-related systems that we discovered by searching archaeal metagenome-assembled genomes of the Asgard superphylum. Several of these variants contain extremely diverged homologs of Cas1, the integrase involved in CRISPR adaptation as well as casposon transposition. Strikingly, the diversity of Cas1 in Asgard archaea alone is greater than that detected so far among the rest of archaea and bacteria. The Asgard CRISPR-Cas derivatives also encode distinct forms of Cas4, Cas5, and Cas7 proteins, and/or additional nucleases. Some of these systems are predicted to perform defense functions, but possibly not programmable ones, whereas others are likely to represent previously unknown mobile genetic elements.
Affiliation(s)
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Sergey A Shmakov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Yang Liu
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P.R. China
- Meng Li
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P.R. China
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
4
Runthala A, Chowdhury S. Refined template selection and combination algorithm significantly improves template-based modeling accuracy. J Bioinform Comput Biol 2020; 17:1950006. [PMID: 31057073 DOI: 10.1142/s0219720019500069] [Indexed: 12/16/2022]
Abstract
In contrast to ab-initio protein modeling methodologies, comparative modeling is considered the most popular and reliable approach to modeling protein structure. However, selecting the best set of templates remains a major challenge. An effective template-ranking algorithm is developed to select only the reliable hits for predicting protein structures. The algorithm uses pairwise as well as multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures key sequence and structural information from the template hits and converts it into scores to rank them effectively. The selected set of templates is then used to model a target. The modeling accuracy of the algorithm is tested and evaluated on CASP8, CASP9 and CASP10 targets containing TBM-HA domains. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM_Score by 3.531, 4.814 and 0.022, respectively. Further, including structurally similar templates with ample conformational diversity is shown to be crucial for the modeling algorithm to span the target sequence maximally and reliably and to construct its near-native model. Optimal model sampling is also key to predicting the best possible target structure.
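For readers unfamiliar with the GDT-TS and GDT-HA figures quoted in this abstract, the following is a minimal sketch of how such scores are commonly defined: the average, over four distance cutoffs, of the percentage of Cα atoms in the model lying within the cutoff of the reference. It assumes the two coordinate sets are already superposed and have the same residue order. The full GDT procedure searches over many superpositions to maximize each fraction, so this simplified version can only underestimate the true score. The function name `gdt_score` and the toy coordinates are illustrative and do not come from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

# Standard cutoff sets in angstroms: GDT-TS uses 1, 2, 4, 8; GDT-HA uses 0.5, 1, 2, 4.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt_score(model: Sequence[Point], reference: Sequence[Point],
              cutoffs: Sequence[float] = GDT_TS_CUTOFFS) -> float:
    """Return a GDT-style score (0 to 100) for pre-superposed C-alpha coordinates.

    Assumption: no superposition search is performed here, unlike the full GDT algorithm.
    """
    if len(model) != len(reference) or not reference:
        raise ValueError("model and reference must be non-empty and equally long")
    # Per-residue deviation between corresponding C-alpha atoms.
    distances = [math.dist(p, q) for p, q in zip(model, reference)]
    # Fraction of residues within each cutoff, averaged over all cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]
ts = gdt_score(model, reference)                  # 56.25
ha = gdt_score(model, reference, GDT_HA_CUTOFFS)  # 43.75
```

Because GDT-HA uses tighter cutoffs than GDT-TS, the same model scores lower on it, which is why the two gains reported in the abstract are not directly comparable.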
Affiliation(s)
- Ashish Runthala
  - Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
- Shibasish Chowdhury
  - Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
5
Franzoi M, Sturlese M, Bellanda M, Mammi S. A molecular dynamics strategy for CSαβ peptides disulfide-assisted model refinement. J Biomol Struct Dyn 2017; 35:2736-2744. [PMID: 27581488 DOI: 10.1080/07391102.2016.1231081] [Indexed: 10/21/2022]
Abstract
Many cysteine-stabilized antimicrobial peptides from a variety of living organisms could be good candidates for the development of anti-infective agents. In the absence of experimentally obtained structural data, peptide modeling is an essential tool for understanding structure-activity relationships and for optimizing the bioactive moieties. Focusing on cysteine-rich peptide structures, we reproduced the case of structure predictions in the so-called midnight zone. We developed our protocol on a training set derived by clustering the available cysteine-stabilized αβ (CSαβ) structures in nine different representative families and tested it on peptides randomly selected from each family. Starting from draft models, we tested a structure-based disulfide predictor and we used cysteine distances as constraints during molecular dynamics. Finally, we proposed an analysis for final structure selection. Accordingly, we obtained a mean root mean square deviation improvement of 21% for the test set. Our findings demonstrate that it is possible to predict the network of disulfide bridges in cysteine-stabilized peptides and to use this result to improve the accuracy of structural predictions. Finally, we applied the methods to predict the structure of royalisin, a cysteine-rich peptide with unknown structure.
Affiliation(s)
- Marco Franzoi
  - Department of Biology, University of Padova, Via Ugo Bassi 58/B, Padova 35131, Italy
- Mattia Sturlese
  - Molecular Modeling Section (MMS), Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Via Marzolo 5, Padova 35131, Italy
- Massimo Bellanda
  - Department of Chemical Sciences, University of Padova, Via Marzolo 1, Padova 35131, Italy
- Stefano Mammi
  - Department of Chemical Sciences, University of Padova, Via Marzolo 1, Padova 35131, Italy
6
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Indexed: 10/21/2022]
7
Kuznetsov IB, McDuffie M. PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids. BMC Res Notes 2015; 8:187. [PMID: 25947299 PMCID: PMC4477417 DOI: 10.1186/s13104-015-1152-6] [Received: 05/06/2014] [Accepted: 04/24/2015] [Indexed: 12/04/2022]
Abstract
Background: Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suited to a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities.
Findings: PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on the comparison of the profiles of biochemical properties of these sequences. Unlike the conventional sequence alignment methods that use fixed 20×20 amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal-distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property. The higher the weight for a given property, the more this property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better quality pair-wise alignments than the conventional matrix-based approach.
Conclusions: PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix. To the best of the authors’ knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.
Electronic supplementary material: The online version of this article (doi:10.1186/s13104-015-1152-6) contains supplementary material, which is available to authorized users.
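The idea described in this abstract, scoring aligned residues by a weighted distance over biochemical properties and then finding a minimal-distance global alignment, can be sketched as below. This is not PR2ALIGN's code or scoring function: the absolute-difference distance, the linear gap penalty, the choice of two properties (Kyte-Doolittle hydropathy and a simple formal charge), and the names `residue_distance` and `align_min_distance` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative property 1: Kyte-Doolittle hydropathy index.
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Illustrative property 2: simplified formal side-chain charge.
CHARGE: Dict[str, float] = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def residue_distance(a: str, b: str, properties: Sequence[Dict[str, float]],
                     weights: Sequence[float]) -> float:
    """Weighted sum of absolute property differences (assumed distance form)."""
    return sum(w * abs(p[a] - p[b]) for p, w in zip(properties, weights))


def align_min_distance(s: str, t: str, properties: Sequence[Dict[str, float]],
                       weights: Sequence[float], gap: float = 2.0) -> Tuple[float, str, str]:
    """Global alignment minimizing total distance, with a linear gap penalty (assumed)."""
    n, m = len(s), len(t)
    cost: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    move: List[List[str]] = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = residue_distance(s[i - 1], t[j - 1], properties, weights)
            # min() on (cost, label) tuples; ties resolve alphabetically by label.
            cost[i][j], move[i][j] = min(
                (cost[i - 1][j - 1] + pair, "diag"),
                (cost[i - 1][j] + gap, "up"),
                (cost[i][j - 1] + gap, "left"),
            )
    # Trace back from the bottom-right corner to recover the aligned strings.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif step == "up":
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Charge-only scoring with a cheap gap: K is left unpaired rather than matched to D.
result = align_min_distance("AKD", "AD", [CHARGE], [1.0], gap=0.5)
```

Raising the weight of one property makes mismatches in that property more expensive, which is the behavior the abstract describes: the higher the weight, the more that property shapes the final alignment.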
Affiliation(s)
- Igor B Kuznetsov
  - Cancer Research Center and Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY, 12144, USA.
- Michael McDuffie
  - Cancer Research Center and Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY, 12144, USA.