1. Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287 DOI: 10.1016/j.compbiomed.2024.108815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where global scientists converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We have also introduced revolutionary end-to-end and all-atom diffusion-based techniques that have transformed protein structure predictions. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although the effectiveness is contingent upon the quality of input features derived from natural bio-physicochemical attributes and Multiple Sequence Alignments (MSA). Hence, the generation of high-quality MSA data holds paramount importance in harnessing informative input features for enhanced prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy; however, they are not yet sufficient for the purposes structural knowledge is intended to serve, which implies a need for development in other aspects of prediction. In this regard, scientists have opened other frontiers for protein structural prediction. The utilization of subsampling in multiple sequence alignment (MSA) and protein language modeling appears to be particularly promising in enhancing the accuracy and efficiency of predictions, ultimately aiding in drug discovery efforts. 
The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and design therapeutics that are more effective. In this article, we have discussed the vicissitudes that the scientists have gone through to improve prediction accuracy, and examined the effective policies in predicting from different aspects, including the construction of high quality MSA, providing informative input features, and progresses in deep learning approaches. We have also briefly touched upon transitioning from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
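The three prediction targets named in this abstract (contact map, binned distance, real-valued distance) can be illustrated with a minimal Python sketch. This is not the reviewed authors' code: the 8 Å contact threshold is a commonly used convention, and the bin edges and toy coordinates are assumptions chosen only for illustration.

```python
import math

def distance_matrix(coords):
    """Real-valued representation: pairwise Euclidean distances in angstroms."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dist, threshold=8.0):
    """Binary representation: 1 where two residues are closer than the threshold.
    The 8 A default is a common convention, assumed here."""
    return [[1 if d < threshold else 0 for d in row] for row in dist]

def binned_distances(dist, edges=(4.0, 8.0, 12.0, 16.0)):
    """Binned representation: index of the distance bin for each residue pair.
    The bin edges are hypothetical; the last bin is open-ended."""
    def bin_index(d):
        for i, edge in enumerate(edges):
            if d < edge:
                return i
        return len(edges)
    return [[bin_index(d) for d in row] for row in dist]

# Toy coordinates for four residues (made-up values, not real protein data).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
dist = distance_matrix(coords)
contacts = contact_map(dist)
bins = binned_distances(dist)
```

The contact map keeps the least information per residue pair and the real-valued matrix the most, which is one way to read the abstract's remark that each target has its own strengths and limitations.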
Affiliation(s)
- Faezeh Rahimzadeh
  - Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Pedram Salehpoor
  - Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Faegheh Golabi
  - Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Shahin PourBahrami
  - Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
2. Saharkhiz S, Mostafavi M, Birashk A, Karimian S, Khalilollah S, Jaferian S, Yazdani Y, Alipourfard I, Huh YS, Farani MR, Akhavan-Sigari R. The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction. Top Curr Chem (Cham) 2024; 382:23. [PMID: 38965117 PMCID: PMC11224075 DOI: 10.1007/s41061-024-00469-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 06/09/2024] [Indexed: 07/06/2024]
Abstract
In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.
Affiliation(s)
- Saber Saharkhiz
  - Division of Neuroscience, Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Mehrnaz Mostafavi
  - Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Birashk
  - Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
- Shiva Karimian
  - Electrical and Computer Research Center, Sanandaj Azad University, Sanandaj, Iran
- Shayan Khalilollah
  - Department of Neurosurgery, Faculty of Medicine, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Sohrab Jaferian
  - Goergen Institute for Data Science, University of Rochester, Rochester, NY, USA
- Yalda Yazdani
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Iraj Alipourfard
  - Institute of Physical Chemistry, Polish Academy of Sciences, Marcina Kasprzaka 44/52, 01-224, Warsaw, Poland
- Yun Suk Huh
  - Department of Biological Engineering, Inha University, Incheon, Republic of Korea
3. Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China
4. Lin B, Luo X, Liu Y, Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform 2024; 25:bbae289. [PMID: 39003530 PMCID: PMC11246557 DOI: 10.1093/bib/bbae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/18/2024] [Indexed: 07/15/2024] Open
Abstract
Protein function prediction is critical for understanding the cellular physiological and biochemical processes, and it opens up new possibilities for advancements in fields such as disease research and drug discovery. During the past decades, with the exponential growth of protein sequence data, many computational methods for predicting protein function have been proposed. Therefore, a systematic review and comparison of these methods are necessary. In this study, we divide these methods into four different categories, including sequence-based methods, 3D structure-based methods, PPI network-based methods and hybrid information-based methods. Furthermore, their advantages and disadvantages are discussed, and then their performance is comprehensively evaluated and compared. Finally, we discuss the challenges and opportunities present in this field.
Affiliation(s)
- Baohui Lin
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Xiaoling Luo
  - Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Shenzhen, Guangdong, China
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518061, China
- Yumeng Liu
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Xiaopeng Jin
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
5. Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
Affiliation(s)
- Balasubramanian Harihar
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
6. Baker K, Hughes N, Bhattacharya S. An interactive visualization tool for educational outreach in protein contact map overlap analysis. Front Bioinform 2024; 4:1358550. [PMID: 38562910 PMCID: PMC10982686 DOI: 10.3389/fbinf.2024.1358550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] Open
Abstract
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the gap in accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to perform contact map overlap (CMO) problems for better template selection. Unlike existing tools that cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the CMO problem. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets: PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse qualities of contact maps and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, thereby making it accessible to the general public for promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
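As a rough illustration of the quantity at the heart of the contact map overlap (CMO) problem, the Python sketch below counts the contacts shared by two proteins under one fixed residue alignment. It is not GoFold's or map_align's algorithm: the full CMO problem searches for the alignment that maximizes this count, whereas here the alignment, function name, and toy contact sets are all assumptions supplied by hand.

```python
def contact_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    contacts_a, contacts_b: sets of (i, j) residue-index pairs with i < j.
    alignment: dict mapping residue indices of A to residue indices of B.
    """
    shared = 0
    for i, j in contacts_a:
        # Only contacts whose two residues are both aligned can be shared.
        if i in alignment and j in alignment:
            p, q = sorted((alignment[i], alignment[j]))
            if (p, q) in contacts_b:
                shared += 1
    return shared

# Toy example (made-up contacts): A is aligned to B with an offset of one residue.
contacts_a = {(0, 2), (1, 3), (0, 3)}
contacts_b = {(1, 3), (2, 4)}
alignment = {0: 1, 1: 2, 2: 3, 3: 4}
overlap = contact_overlap(contacts_a, contacts_b, alignment)
```

A higher overlap for a candidate template suggests a more similar fold, which is why the abstract ties CMO to template selection.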
Affiliation(s)
- Kevan Baker
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Nathaniel Hughes
  - Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
- Sutanu Bhattacharya
  - Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
7. Darden C, Donkor JE, Korolkova O, Barozai MYK, Chaudhuri M. Distinct structural motifs are necessary for targeting and import of Tim17 in Trypanosoma brucei mitochondrion. mSphere 2024; 9:e0055823. [PMID: 38193679 PMCID: PMC10871166 DOI: 10.1128/msphere.00558-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 11/28/2023] [Indexed: 01/10/2024] Open
Abstract
Nuclear-encoded mitochondrial proteins are correctly translocated to their proper sub-mitochondrial destination using location-specific mitochondrial targeting signals and via multi-protein import machineries (translocases) in the outer and inner mitochondrial membranes (TOM and TIMs, respectively). However, targeting signals of multi-pass Tims are less defined. Here, we report the characterization of the targeting signals of Trypanosoma brucei Tim17 (TbTim17), an essential component of the most divergent TIM complex. TbTim17 possesses a characteristic secondary structure including four predicted transmembrane (TM) domains in the center with hydrophilic N- and C-termini. After examining mitochondrial localization of various deletion and site-directed mutants of TbTim17 in T. brucei using subcellular fractionation and confocal microscopy, we located at least two internal targeting signals (ITS): (i) within TM1 (31-50 AAs) and (ii) TM4 + loop 3 (120-136 AAs). Both signals are required for proper targeting and integration of TbTim17 in the membrane. Furthermore, a positively charged residue (K122) is critical for mitochondrial localization of TbTim17. This is the first report characterizing the ITS for a multipass inner membrane protein in a divergent eukaryote such as T. brucei. IMPORTANCE: African trypanosomiasis (AT) is a deadly disease in humans and domestic animals, caused by the parasitic protozoan Trypanosoma brucei. Therefore, AT is not only a concern for human health but also for economic development in the vast area of sub-Saharan Africa. T. brucei possesses a single mitochondrion per cell that imports hundreds of nuclear-encoded mitochondrial proteins for its functions. T. brucei Tim17 (TbTim17), an essential component of the TbTIM17 complex, is a nuclear-encoded protein; thus, it must be imported from the cytosol to form the TbTIM17 complex. 
Here, we demonstrated that the internal targeting signals within the transmembrane 1 (TM1) and TM4 with loop 3, and residue K122 are required collectively for import and integration of TbTim17 in the T. brucei mitochondrion. This information could be utilized to block TbTim17 function and parasite growth.
Affiliation(s)
- Chauncey Darden
  - Department of Biochemistry, Cancer Biology, Neuroscience, and Pharmacology, Meharry Medical College, Nashville, Tennessee, USA
- Joseph E. Donkor
  - Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, Tennessee, USA
- Olga Korolkova
  - The Consolidated Research Instrumentation, Informatics, Statistics, and Learning Integration Suite (CRISALIS), Meharry Medical College, Nashville, Tennessee, USA
- Minu Chaudhuri
  - Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, Tennessee, USA
8. Min J, Rong X, Zhang J, Su R, Wang Y, Qi W. Computational Design of Peptide Assemblies. J Chem Theory Comput 2024; 20:532-550. [PMID: 38206800 DOI: 10.1021/acs.jctc.3c01054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
Abstract
With the ongoing development of peptide self-assembling materials, there is growing interest in exploring novel functional peptide sequences. From short peptides to long polypeptides, as the functionality increases, the sequence space is also expanding exponentially. Consequently, attempting to explore all functional sequences comprehensively through experience and experiments alone has become impractical. By utilizing computational methods, especially artificial intelligence enhanced molecular dynamics (MD) simulation and de novo peptide design, there has been a significant expansion in the exploration of sequence space. Through these methods, a variety of supramolecular functional materials, including fibers, two-dimensional arrays, nanocages, etc., have been designed by meticulously controlling the inter- and intramolecular interactions. In this review, we first provide a brief overview of the current main computational methods and then focus on the computational design methods for various self-assembled peptide materials. Additionally, we introduce some representative protein self-assemblies to offer guidance for the design of self-assembling peptides.
Affiliation(s)
- Jiwei Min
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
- Xi Rong
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
- Jiaxing Zhang
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
- Rongxin Su
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
  - Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, P. R. China
  - Tianjin Key Laboratory of Membrane Science and Desalination Technology, Tianjin 300072, P. R. China
- Yuefei Wang
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
  - Tianjin Key Laboratory of Membrane Science and Desalination Technology, Tianjin 300072, P. R. China
- Wei Qi
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
  - Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, P. R. China
  - Tianjin Key Laboratory of Membrane Science and Desalination Technology, Tianjin 300072, P. R. China
9. Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.
Affiliation(s)
- Chun-Xiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Fang Liang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
10. Li J, Wang L, Zhu Z, Song C. Exploring the Alternative Conformation of a Known Protein Structure Based on Contact Map Prediction. J Chem Inf Model 2024; 64:301-315. [PMID: 38117138 PMCID: PMC10777399 DOI: 10.1021/acs.jcim.3c01381] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/29/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
The rapid development of deep learning-based methods has considerably advanced the field of protein structure prediction. The accuracy of predicting the 3D structures of simple proteins is comparable to that of experimentally determined structures, providing broad possibilities for structure-based biological studies. Another critical question is whether and how multistate structures can be predicted from a given protein sequence. In this study, analysis of tens of two-state proteins demonstrated that deep learning-based contact map predictions contain structural information on both states, which suggests that it is probably appropriate to change the target of deep learning-based protein structure prediction from one specific structure to multiple likely structures. Furthermore, by combining deep learning- and physics-based computational methods, we developed a protocol for exploring alternative conformations from a known structure of a given protein, by which we successfully approached the holo-state conformations of multiple representative proteins from their apo-state structures.
Affiliation(s)
- Jiaxuan Li
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Lei Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Zefeng Zhu
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chen Song
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
11. Zhang Y, Zhang Z, Kagaya Y, Terashi G, Zhao B, Xiong Y, Kihara D. Distance-AF: Modifying Predicted Protein Structure Models by Alphafold2 with User-Specified Distance Constraints. bioRxiv 2023:2023.12.01.569498. [PMID: 38106200 PMCID: PMC10723377 DOI: 10.1101/2023.12.01.569498] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/19/2023]
Abstract
The three-dimensional structure of a protein plays a fundamental role in determining its function and has an essential impact on understanding biological processes. Despite significant progress in protein structure prediction, such as AlphaFold2, challenges remain for hard targets on which AlphaFold2 often does not perform well, owing to the complex folding of the protein and the large number of possible conformations. Here we present a modified version of AlphaFold2, called Distance-AF, which aims to improve the performance of AlphaFold2 by including distance constraints as input information. Distance-AF uses AlphaFold2's predicted structure as a starting point and incorporates distance constraints between amino acids to adjust the folding of the protein structure until it meets the constraints. Distance-AF can correct the domain orientation on challenging targets, leading to more accurate structures with a lower root mean square deviation (RMSD). This ability of Distance-AF is also useful in fitting protein structures into cryo-electron microscopy maps.
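Two quantities mentioned in this abstract, RMSD and user-specified distance constraints, can be made concrete with a short Python sketch. It does not reproduce Distance-AF: the RMSD here assumes the two coordinate sets are already superimposed, and the constraint format, the tolerance, and all coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized, pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def constraint_violations(coords, constraints, tolerance=1.0):
    """Return (i, j, deviation) for each constraint the model misses.

    constraints: list of (i, j, target_distance) tuples (hypothetical format).
    tolerance: allowed absolute deviation in angstroms (assumed value).
    """
    violations = []
    for i, j, target in constraints:
        deviation = math.dist(coords[i], coords[j]) - target
        if abs(deviation) > tolerance:
            violations.append((i, j, deviation))
    return violations

# Toy three-residue model and reference (made-up values).
model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 4.0, 0.0)]
model_rmsd = rmsd(model, reference)
violations = constraint_violations(model, [(0, 1, 3.0), (0, 2, 10.0)])
```

In this toy case the second constraint is violated by 4 Å, the kind of signal a constraint-driven refinement would try to reduce.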
Affiliation(s)
- Yuanyuan Zhang
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
- Zicong Zhang
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
- Bowen Zhao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
12. Zhu HT, Xia YH, Zhang GJ. E2EDA: Protein Domain Assembly Based on End-to-End Deep Learning. J Chem Inf Model 2023; 63:6451-6461. [PMID: 37788318 DOI: 10.1021/acs.jcim.3c01387] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/05/2023]
Abstract
With the development of deep learning, almost all single-domain proteins can be predicted at experimental resolution. However, the structure prediction of multi-domain proteins remains a challenge. Achieving end-to-end protein domain assembly and further improving the accuracy of the full-chain modeling by accurately predicting inter-domain orientation while improving the assembly efficiency will provide significant insights into structure-based drug discovery. In this work, we propose an End-to-End Domain Assembly method based on deep learning, named E2EDA. We first develop RMNet, an EfficientNetV2-based deep learning model that fuses multiple features using an attention mechanism to predict inter-domain rigid motion. Then, the predicted rigid motions are transformed into inter-domain spatial transformations to directly assemble the full-chain model. Finally, the scoring strategy RMscore is designed to select the best model from multiple assembled models. The experimental results show that the average TM-score of the model assembled by E2EDA on the benchmark set (282) is 0.827, which is better than those of other domain assembly methods SADA (0.792) and DEMO (0.730). Meanwhile, on our constructed multi-domain data set from AlphaFold DB, the model reassembled by E2EDA is 7.0% higher in TM-score compared to the full-chain model predicted by AlphaFold2, indicating that E2EDA can capture more accurate inter-domain orientations to improve the quality of the model predicted by AlphaFold2. Furthermore, compared to SADA and AlphaFold2, E2EDA reduced the average runtime on the benchmark by 64.7% and 19.2%, respectively, indicating that E2EDA can significantly improve assembly efficiency through an end-to-end approach. The online server is available at http://zhanglab-bioinf.com/E2EDA.
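The step in which predicted rigid motions become inter-domain spatial transformations can be pictured with a small Python sketch that applies a rotation and a translation to the coordinates of one domain. This is not E2EDA's or RMNet's implementation: the rotation-matrix parameterization, the function names, and the toy coordinates are assumptions for illustration only.

```python
import math

def rotation_about_z(angle_deg):
    """3x3 rotation matrix for a counterclockwise rotation about the z axis."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid_motion(coords, rotation, translation):
    """Rotate, then translate, every point: x' = R x + t."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[r][0] * x + rotation[r][1] * y + rotation[r][2] * z + translation[r]
            for r in range(3)
        ))
    return moved

# Toy second domain (made-up coordinates) placed relative to a fixed first domain.
domain_two = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
placed = apply_rigid_motion(domain_two, rotation_about_z(90.0), (10.0, 0.0, 0.0))
```

Because a rigid motion preserves all distances within the domain, only the relative orientation and position of the domains change during assembly.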
Affiliation(s)
- Hai-Tao Zhu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
| | - Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
| |
|
13
|
Sosnick TR. AlphaFold developers Demis Hassabis and John Jumper share the 2023 Albert Lasker Basic Medical Research Award. J Clin Invest 2023:e174915. [PMID: 37731359 DOI: 10.1172/jci174915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023] Open
|
14
|
Kao HW, Lu WL, Ho MR, Lin YF, Hsieh YJ, Ko TP, Danny Hsu ST, Wu KP. Robust Design of Effective Allosteric Activators for Rsp5 E3 Ligase Using the Machine Learning Tool ProteinMPNN. ACS Synth Biol 2023; 12:2310-2319. [PMID: 37556858 DOI: 10.1021/acssynbio.3c00042] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/11/2023]
Abstract
We used the deep learning tool ProteinMPNN to redesign ubiquitin (Ub) as a specific and functionally stimulating/enhancing binder of the Rsp5 E3 ligase. We generated 20 extensively mutated (up to 37 of 76 residues) recombinant Ub variants (UbVs), named R1 to R20, displaying well-folded structures and high thermal stabilities. These UbVs can also form stable complexes with Rsp5, as predicted using AlphaFold2. Three of the UbVs bound to Rsp5 with low micromolar affinity, with R4 and R12 effectively enhancing the Rsp5 activity sixfold. AlphaFold2 predicts that R4 and R12 bind to Rsp5's exosite in an identical manner to the Rsp5-Ub template, thereby allosterically activating Rsp5-Ub thioester formation. Thus, we present a virtual solution for rapidly and cost-effectively designing UbVs as functional modulators of Ub-related enzymes.
Affiliation(s)
- Hsi-Wen Kao
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
| | - Wei-Lin Lu
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
| | - Meng-Ru Ho
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
| | - Yu-Fong Lin
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
- Institute of Biochemical Science, National Taiwan University, Taipei 106, Taiwan
| | - Yun-Jung Hsieh
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
- Institute of Biochemical Science, National Taiwan University, Taipei 106, Taiwan
| | - Tzu-Ping Ko
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
| | - Shang-Te Danny Hsu
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
- Institute of Biochemical Science, National Taiwan University, Taipei 106, Taiwan
- International Institute for Sustainability with Knotted Chiral Meta Matter, Hiroshima University, Higashihiroshima 739-8527, Japan
| | - Kuen-Phon Wu
- Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
- Institute of Biochemical Science, National Taiwan University, Taipei 106, Taiwan
| |
|
15
|
Darden C, Donkor J, Korolkova O, Khan Barozai MY, Chaudhuri M. Distinct structural motifs are necessary for targeting and import of Tim17 in Trypanosoma brucei mitochondrion. bioRxiv 2023:2023.07.07.548172. [PMID: 37461662 PMCID: PMC10350046 DOI: 10.1101/2023.07.07.548172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2024]
Abstract
Nuclear-encoded mitochondrial proteins are correctly translocated to their proper sub-mitochondrial destination using location-specific mitochondrial targeting signals (MTSs) and via multi-protein import machineries (translocases) in the outer and inner mitochondrial membranes (TOM and TIMs, respectively). However, MTSs of multi-pass Tims are less defined. Here we report the characterization of the MTSs of Trypanosoma brucei Tim17 (TbTim17), an essential component of the most divergent TIM complex. TbTim17 possesses a characteristic secondary structure including four predicted transmembrane (TM) domains in the center with hydrophilic N- and C-termini. After examining mitochondrial localization of various deletion and site-directed mutants of TbTim17 in T. brucei using subcellular fractionation and confocal microscopy, we located at least two internal signals, 1) within TM1 (31-50 AAs) and 2) TM4 + Loop 3 (120-136 AAs). Both signals are required for proper targeting and integration of TbTim17 in the membrane. Furthermore, a positively charged residue (K122) is critical for mitochondrial localization of TbTim17. This is the first report of characterizing the internal mitochondrial targeting signals (ITS) for a multipass inner membrane protein in a divergent eukaryote, like T. brucei. Summary: Internal targeting signals within the TM1, TM4 with Loop 3, and residue K122 are required collectively for import and integration of TbTim17 in the T. brucei mitochondrion. This information could be utilized to block parasite growth.
|
16
|
Meng Q, Guo F, Tang J. Improved structure-related prediction for insufficient homologous proteins using MSA enhancement and pre-trained language model. Brief Bioinform 2023:bbad217. [PMID: 37321965 DOI: 10.1093/bib/bbad217] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/12/2023] [Revised: 04/18/2023] [Accepted: 05/21/2023] [Indexed: 06/17/2023] Open
Abstract
In recent years, protein structure problems have become a hotspot for understanding protein folding and function mechanisms. It has been observed that most protein structure work relies on and benefits from co-evolutionary information obtained by multiple sequence alignment (MSA). As an example, AlphaFold2 (AF2) is a typical MSA-based protein structure tool which is famous for its high accuracy. As a consequence, these MSA-based methods are limited by the quality of the MSAs. Especially for orphan proteins that have no homologous sequence, AlphaFold2 performs unsatisfactorily as MSA depth decreases, which may pose a barrier to its widespread application in protein mutation and design problems in which there are no rich homologous sequences and rapid prediction is needed. In this paper, we constructed two standard datasets for orphan and de novo proteins which have insufficient or no homology information, called Orphan62 and Design204, respectively, to fairly evaluate the performance of the various methods in this case. Then, depending on whether scarce MSA information is used, we summarized two approaches, MSA-enhanced and MSA-free methods, to effectively solve the issue without sufficient MSAs. The MSA-enhanced model aims to improve poor MSA quality from the data source by knowledge distillation and generation models. The MSA-free model directly learns the relationship between residues on enormous protein sequences from pre-trained models, bypassing the step of extracting the residue pair representation from MSA. Next, we evaluated the performance of four MSA-free methods (trRosettaX-Single, TRFold, ESMFold and ProtT5) and an MSA-enhanced method (Bagging MSA) compared with a traditional MSA-based method, AlphaFold2, in two protein structure-related prediction tasks, respectively. Comparison analyses show that trRosettaX-Single and ESMFold, which are MSA-free methods, can achieve fast prediction ($\sim\!40$ s) and comparable performance compared with AF2 in tertiary structure prediction, especially for short peptides, $\alpha$-helical segments and targets with few homologous sequences. Bagging MSA utilizing MSA enhancement improves the accuracy of our trained base model, which is an MSA-based method, when poor homology information exists in secondary structure prediction. Our study provides biologists with insight into how to select rapid and appropriate prediction tools for enzyme engineering and peptide drug development. Contact: guofei@csu.edu.cn, jj.tang@siat.ac.cn.
Affiliation(s)
- Qiaozhen Meng
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518000, China
| |
|
17
|
Comparative genomics and interactomics of polyadenylation factors for the prediction of new parasite targets: Entamoeba histolytica as a working model. Biosci Rep 2023; 43:232462. [PMID: 36651565 PMCID: PMC9912109 DOI: 10.1042/bsr20221911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 01/05/2023] [Accepted: 01/13/2023] [Indexed: 01/19/2023] Open
Abstract
Protein-protein interactions (PPI) play a key role in predicting the function of a target protein and drug ability to affect an entire biological system. Prediction of PPI networks greatly contributes to determine a target protein and signal pathways related to its function. Polyadenylation of mRNA 3'-end is essential for gene expression regulation and several polyadenylation factors have been shown as valuable targets for controlling protozoan parasites that affect human health. Here, by using a computational strategy based on sequence-based prediction approaches, phylogenetic analyses, and computational prediction of PPI networks, we compared interactomes of polyadenylation factors in relevant protozoan parasites and the human host, to identify key proteins and define potential targets for pathogen control. Then, we used Entamoeba histolytica as a working model to validate our computational results. RT-qPCR assays confirmed the coordinated modulation of connected proteins in the PPI network and evidenced that silencing of the bottleneck protein EhCFIm25 affects the expression of interacting proteins. In addition, molecular dynamics simulations and docking approaches allowed to characterize the relationships between EhCFIm25 and Ehnopp34, two connected bottleneck proteins. Interestingly, the experimental identification of EhCFIm25 interactome confirmed the close relationships among proteins involved in gene expression regulation and evidenced new links with moonlight proteins in E. histolytica, suggesting a connection between RNA biology and metabolism as described in other organisms. Altogether, our results strengthened the relevance of comparative genomics and interactomics of polyadenylation factors for the prediction of new targets for the control of these human pathogens.
|
18
|
Liu J, Tang X, Guan X. Grain protein function prediction based on self-attention mechanism and bidirectional LSTM. Brief Bioinform 2023; 24:6886418. [PMID: 36567619 DOI: 10.1093/bib/bbac493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 10/13/2022] [Accepted: 10/18/2022] [Indexed: 12/27/2022] Open
Abstract
With the development of genome sequencing technology, using computing technology to predict grain protein function has become one of the important tasks of bioinformatics. The protein data of four grains, soybean, maize, indica and japonica are selected in this experimental dataset. In this paper, a novel neural network algorithm Chemical-SA-BiLSTM is proposed for grain protein function prediction. The Chemical-SA-BiLSTM algorithm fuses the chemical properties of proteins on the basis of amino acid sequences, and combines the self-attention mechanism with the bidirectional Long Short-Term Memory network. The experimental results show that the Chemical-SA-BiLSTM algorithm is superior to other classical neural network algorithms, and can more accurately predict the protein function, which proves the effectiveness of the Chemical-SA-BiLSTM algorithm in the prediction of grain protein function. The source code of our method is available at https://github.com/HwaTong/Chemical-SA-BiLSTM.
Affiliation(s)
- Jing Liu
- College of Information Engineering, Shanghai Maritime University, 201306, Shanghai, China
| | - Xinghua Tang
- College of Information Engineering, Shanghai Maritime University, 201306, Shanghai, China
| | - Xiao Guan
- School of Health Science and Engineering, University of Shanghai for Science and Technology, 200093, Shanghai, China
| |
|
19
|
Newman KE, Tindall SN, Mader SL, Khalid S, Thomas GH, Van Der Woude MW. A novel fold for acyltransferase-3 (AT3) proteins provides a framework for transmembrane acyl-group transfer. eLife 2023; 12:e81547. [PMID: 36630168 PMCID: PMC9833829 DOI: 10.7554/elife.81547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 12/04/2022] [Indexed: 01/12/2023] Open
Abstract
Acylation of diverse carbohydrates occurs across all domains of life and can be catalysed by proteins with a membrane-bound acyltransferase-3 (AT3) domain (PF01757). In bacteria, these proteins are essential in processes including symbiosis, resistance to viruses and antimicrobials, and biosynthesis of antibiotics, yet their structure and mechanism are largely unknown. In this study, evolutionary co-variance analysis was used to build a computational model of the structure of a bacterial O-antigen modifying acetyltransferase, OafB. The resulting structure exhibited a novel fold for the AT3 domain, which molecular dynamics simulations demonstrated is stable in the membrane. The AT3 domain contains 10 transmembrane helices arranged to form a large cytoplasmic cavity lined by residues known to be essential for function. Further molecular dynamics simulations support a model where the acyl-coA donor spans the membrane through accessing a pore created by movement of an important loop capping the inner cavity, enabling OafB to present the acetyl group close to the likely catalytic residues on the extracytoplasmic surface. Limited but important interactions with the fused SGNH domain in OafB are identified, and modelling suggests this domain is mobile and can both accept acyl-groups from the AT3 and then extend beyond the membrane to reach acceptor substrates. Together this new general model of AT3 function provides a framework for the development of inhibitors that could abrogate critical functions of bacterial pathogens.
Affiliation(s)
- Kahlan E Newman
- School of Chemistry, University of Southampton, Southampton, United Kingdom
| | - Sarah N Tindall
- Department of Biology and the York Biomedical Research Institute, University of York, York, United Kingdom
| | - Sophie L Mader
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Syma Khalid
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Gavin H Thomas
- Department of Biology and the York Biomedical Research Institute, University of York, York, United Kingdom
| | - Marjan W Van Der Woude
- Hull York Medical School and the York Biomedical Research Institute, University of York, York, United Kingdom
| |
|
20
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. 
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035 India
| | - Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu India
| |
|
21
|
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, obtained sequence identity was getting dangerously close to values that can be obtained by chance, after aligning any random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide modeling methods assessment have brought some surprising outcomes, proving that much more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
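The 20-30% sequence-identity threshold mentioned in this abstract refers to a quantity that can be computed directly from a pairwise alignment. The sketch below is an illustration only: it assumes identity is counted over aligned columns where neither sequence has a gap (other denominators are also in use), and `percent_identity` is a hypothetical helper, not a function from any tool named in the chapter.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Assumption: columns containing a gap in either sequence are ignored,
    so the denominator is the number of gap-free aligned positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy alignment: five gap-free columns, four of them identical.
identity = percent_identity("ACD-EF", "ACDGQF")
```

A target-template pair scoring below roughly 20-30% on such a measure would fall into the "twilight zone" the abstract describes.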
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland.
| | - Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
| | - Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
| |
|
22
|
Miah MM, Tabassum N, Afroj Zinnia M, Islam ABMMK. Drug and Anti-Viral Peptide Design to Inhibit the Monkeypox Virus by Restricting A36R Protein. Bioinform Biol Insights 2022; 16:11779322221141164. [PMID: 36570327 PMCID: PMC9772960 DOI: 10.1177/11779322221141164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Accepted: 11/06/2022] [Indexed: 12/24/2022] Open
Abstract
Most recently, monkeypox virus (MPXV) has emerged as a global public health threat. The unavailability of effective medication against MPXV escalates the demand for new therapeutic agents. In this study, in silico strategies were used to identify novel drugs against the A36R protein of MPXV. The A36R protein of MPXV is responsible for the viral migration, adhesion, and vesicle trafficking to the host cell. To block the A36R protein, 4893 potential antiviral peptides (AVPs) were retrieved from the DRAMP and SATPdb databases. Finally, 57 sequences were screened based on peptide filtering criteria, which were then modeled. Likewise, 31 monkeypox virus A36R protein sequences were collected from the NCBI protein database to find a consensus sequence and to predict a 3D protein model. The refined and validated models of the A36R protein and AVP peptides were used to predict receptor-ligand interactions using the DINC 2 server. Three peptides that showed the best interactions were SATPdb10193, SATPdb21850, and SATPdb26811, with binding energies -6.10, -6.10, and -6.30 kcal/mol, respectively. Small molecules from drug databases were also used to perform virtual screening against the A36R protein. Among databases, Enamine-HTSC showed strong affinity with docking scores ranging from -8.8 to 9.8 kcal/mol. Interaction of target protein A36R with the top 3 peptides and the most probable drug (Z55287118) was examined by molecular dynamics (MD) simulation. Trajectory analyses (RMSD, RMSF, SASA, and Rg) confirmed the stable nature of protein-ligand and protein-peptide complexes. This work suggests that the identified top AVPs and small molecules might interfere with the function of the A36R protein of MPXV.
Affiliation(s)
| | - Nuzhat Tabassum
- Department of Pharmacy, East West University, Dhaka, Bangladesh
| | | | - Abul Bashar Mir Md. Khademul Islam
- Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh
- Abul Bashar Mir Md. Khademul Islam, Department of Genetic Engineering and Biotechnology, University of Dhaka, Nilkhet Rd, Dhaka 1000, Bangladesh
| |
|
23
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
24
|
Wang F, Zhan Y, Li M, Wang L, Zheng A, Liu C, Wang H, Wang T. Cell-Permeable PROTAC Degraders against KEAP1 Efficiently Suppress Hepatic Stellate Cell Activation through the Antioxidant and Anti-Inflammatory Pathway. ACS Pharmacol Transl Sci 2022; 6:76-87. [PMID: 36654751 PMCID: PMC9841780 DOI: 10.1021/acsptsci.2c00165] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/19/2022] [Indexed: 12/12/2022]
Abstract
Accumulating evidence indicates that oxidative stress and inflammation are involved in the physiopathology of liver fibrogenesis. Nuclear factor erythroid 2-related factor 2 (Nrf2) is a key transcription factor, which regulates the expression of redox regulators to establish cellular redox homeostasis. The Nrf2 modulator can serve as a primary cellular defense against the cytotoxic effects of oxidative stress. We designed a chimeric Keap1-Keap1 peptide (KKP1) based on the proteolysis-targeting chimera technology. The KKP1 peptide not only can efficiently penetrate into the rat hepatic stellate cell line (HSC-T6) cells but also can induce Keap1 protein degradation by the ubiquitination-proteasome degradation pathway, which releases Nrf2 and promotes the transcriptional activity of the Nrf2/antioxidant response element pathway. It then activates the protein expression of the downstream antioxidant factors, the glutamate-cysteine ligase catalytic subunit and heme oxygenase-1 (HO-1). Finally, Keap1 protein degradation inhibits the nuclear factor-kappaB inflammatory signal pathway, the downstream inflammatory factor tumor necrosis factor alpha, and the interleukin-1beta protein expression and further inhibits the expression of the fibrosis biomarker gene. The current research suggests that our designed KKP1 may provide a new avenue for the future treatment of liver fibrosis.
Affiliation(s)
- Fengqin Wang
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443002, China
| | - Ying Zhan
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443002, China
| | - Manman Li
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443002, China
| | - Lidan Wang
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443002, China
- Department of Microbiology and Immunology, Medical School, China Three Gorges University, Yichang 443002, China
| | - Austin Zheng
- Institute of Cell Engineering, School of Medicine, Johns Hopkins University, Baltimore, Maryland 21215, United States
| | - Changbai Liu
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443002, China
| | - Hu Wang
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang 443002, China
- Institute of Cell Engineering, School of Medicine, Johns Hopkins University, Baltimore, Maryland 21215, United States
| | - Tao Wang
- The First College of Clinical Medical Sciences, China Three Gorges University, Yichang, Hubei 443003, China
| |
|
25
|
Katase N, Nishimatsu SI, Yamauchi A, Okano S, Fujita S. Establishment of anti-DKK3 peptide for the cancer control in head and neck squamous cell carcinoma (HNSCC). Cancer Cell Int 2022; 22:352. [PMID: 36376957 PMCID: PMC9664703 DOI: 10.1186/s12935-022-02783-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/21/2022] [Accepted: 11/04/2022] [Indexed: 11/16/2022] Open
Abstract
Background Head and neck squamous cell carcinoma (HNSCC) is the most common malignant tumor of the head and neck. We identified cancer-specific genes in HNSCC and focused on DKK3 expression. DKK3 gene codes two isoforms of proteins (secreted and non-secreted) with two distinct cysteine rich domains (CRDs). It is reported that DKK3 functions as a negative regulator of oncogenic Wnt signaling and, is therefore, considered to be a tumor suppressor gene. However, our series of studies have demonstrated that DKK3 expression is specifically high in HNSCC tissues and cells, and that DKK3 might determine the malignant potentials of HNSCC cells via the activation of Akt. Further analyses strongly suggested that both secreted DKK3 and non-secreted DKK3 could activate Akt signaling in discrete ways, and consequently exert tumor promoting effects. We hypothesized that DKK3 might be a specific druggable target, and it is necessary to establish a DKK3 inhibitor that can inhibit both secreted and non-secreted isoforms of DKK3. Methods Using inverse polymerase chain reaction, we generated mutant expression plasmids that express DKK3 without CRD1, CRD2, or both CRD1 and CRD2 (DKK3ΔC1, DKK3ΔC2, and DKK3ΔC1ΔC2, respectively). These plasmids were then transfected into HNSCC-derived cells to determine the domain responsible for DKK3-mediated Akt activation. We designed antisense peptides using the MIMETEC program, targeting DKK3-specific amino acid sequences within CRD1 and CRD2. The structural models for peptides and DKK3 were generated using Raptor X, and then a docking simulation was performed using CluPro2. Afterward, the best set of the peptides was applied into HNSCC-derived cells, and the effects on Akt phosphorylation, cellular proliferation, invasion, and migration were assessed. We also investigated the therapeutic effects of the peptides in the xenograft models. 
Results Transfection of mutant expression plasmids and subsequent functional analyses revealed that it is necessary to delete both CRD1 and CRD2 to inhibit Akt activation and inhibition of proliferation, migration, and invasion. The inhibitory peptides for CRD1 and CRD2 of DKK3 significantly reduced the phosphorylation of Akt, and consequently suppressed cellular proliferation, migration, invasion and in vivo tumor growth at very low doses. Conclusions This inhibitory peptide represents a promising new therapeutic strategy for HNSCC treatment. Supplementary Information The online version contains supplementary material available at 10.1186/s12935-022-02783-9.
|
26
|
Guo Z, Liu J, Skolnick J, Cheng J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat Commun 2022; 13:6963. [PMID: 36379943 PMCID: PMC9666547 DOI: 10.1038/s41467-022-34600-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 06/08/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022] Open
Abstract
Residue-residue distance information is useful for predicting tertiary structures of protein monomers or quaternary structures of protein complexes. Many deep learning methods have been developed to predict intra-chain residue-residue distances of monomers accurately, but few methods can accurately predict inter-chain residue-residue distances of complexes. We develop a deep learning method CDPred (i.e., Complex Distance Prediction) based on the 2D attention-powered residual network to address the gap. Tested on two homodimer datasets, CDPred achieves the precision of 60.94% and 42.93% for top L/5 inter-chain contact predictions (L: length of the monomer in homodimer), respectively, substantially higher than DeepHomo's 37.40% and 23.08% and GLINTER's 48.09% and 36.74%. Tested on the two heterodimer datasets, the top Ls/5 inter-chain contact prediction precision (Ls: length of the shorter monomer in heterodimer) of CDPred is 47.59% and 22.87% respectively, surpassing GLINTER's 23.24% and 13.49%. Moreover, the prediction of CDPred is complementary with that of AlphaFold2-multimer.
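The "top L/5" precision reported in this abstract is a ranking metric: take the L/5 highest-scoring predicted inter-chain residue pairs and measure what fraction are true contacts. The following is a minimal sketch of that calculation, not CDPred's evaluation code; the function name, the dictionary-based input format, and the use of integer division for L/5 are assumptions, and the distance cutoff that defines a "true contact" is not stated in the abstract and is left to the caller.

```python
def top_k_contact_precision(pred_probs, true_contacts, length, divisor=5):
    """Precision of the top (length // divisor) highest-scoring residue pairs.

    pred_probs: dict mapping (i, j) -> predicted contact probability, with i
        indexing one chain and j the other (hypothetical input format).
    true_contacts: set of (i, j) pairs counted as true inter-chain contacts.
    length: L for a homodimer, or Ls (the shorter chain) for a heterodimer.
    """
    k = max(1, length // divisor)
    # Rank pairs by predicted probability, highest first, and keep the top k.
    ranked = sorted(pred_probs, key=pred_probs.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)

# Toy case: L = 10, so the top 10 // 5 = 2 pairs are evaluated.
toy_pred = {(0, 0): 0.9, (1, 2): 0.8, (3, 4): 0.1}
toy_true = {(0, 0), (3, 4)}
toy_precision = top_k_contact_precision(toy_pred, toy_true, length=10)
```

In the toy case one of the two top-ranked pairs is a true contact, so the precision is 0.5; the low-scoring true contact (3, 4) does not count because it falls outside the top L/5.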
Affiliation(s)
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA (GRID grid.134936.a; ISNI 0000 0001 2162 3504)
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA (GRID grid.134936.a; ISNI 0000 0001 2162 3504)
| | - Jeffrey Skolnick
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332-200 USA (GRID grid.213917.f; ISNI 0000 0001 2097 4943)
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA (GRID grid.134936.a; ISNI 0000 0001 2162 3504)
| |
|
27
|
Peng CX, Zhou XG, Xia YH, Liu J, Hou MH, Zhang GJ. Structural analogue-based protein structure domain assembly assisted by deep learning. Bioinformatics 2022; 38:4513-4521. [PMID: 35962986 DOI: 10.1093/bioinformatics/btac553] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/06/2022] [Revised: 07/27/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION With the breakthrough of AlphaFold2, protein structure prediction has made remarkable progress through end-to-end deep learning techniques, which can build correct folds for nearly all single-domain proteins. However, full-chain modelling is on average less accurate than modelling of the constituent domains and places higher demands on computing hardware, indicating that the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the accuracy of the predicted full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, a domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by an energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that domains often interact in a similar way in their quaternary orientations if they have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION http://zhanglab-bioinf.com/SADA.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiao-Gen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
|
28
|
Potempa LA, Qiu WQ, Stefanski A, Rajab IM. Relevance of lipoproteins, membranes, and extracellular vesicles in understanding C-reactive protein biochemical structure and biological activities. Front Cardiovasc Med 2022; 9:979461. [PMID: 36158829 PMCID: PMC9493015 DOI: 10.3389/fcvm.2022.979461] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/27/2022] [Accepted: 07/29/2022] [Indexed: 11/13/2022] Open
Abstract
Early purification protocols for C-reactive protein (CRP) often involved co-isolation of lipoproteins, primarily very low-density lipoproteins (VLDLs). The interaction with lipid particles was initially attributed to CRP’s calcium-dependent binding affinity for its primary ligand—phosphocholine—the predominant hydrophilic head group expressed on phospholipids of most lipoprotein particles. Later, CRP was shown to additionally express binding affinity for apolipoprotein B (apo B), a predominant apolipoprotein of both VLDL and LDL particles. Apo B interaction with CRP was shown to be mediated by a cationic peptide sequence in apo B. Optimal apo B binding required CRP to be surface immobilized or aggregated, treatments now known to structurally change CRP from its serum soluble pentamer isoform (i.e., pCRP) into its poorly soluble, modified, monomeric isoform (i.e., mCRP). Other cationic ligands have been described for CRP which affect complement activation, histone bioactivities, and interactions with membranes. mCRP, but not pCRP, binds cholesterol and activates signaling pathways that activate pro-inflammatory bioactivities long associated with CRP as a biomarker. Hence, a key step to express CRP’s biofunctions is its conversion into its mCRP isoform. Conversion occurs when (1) pCRP binds to a membrane surface expressed ligand (often phosphocholine); (2) biochemical forces associated with binding cause relaxation/partial dissociation of secondary and tertiary structures into a swollen membrane bound intermediate (described as mCRPm or pCRP*); (3) further structural relaxation which leads to total, irreversible dissociation of the pentamer into mCRP and expression of a cholesterol/multi-ligand binding sequence that extends into the subunit core; (4) reduction of the CRP subunit intrachain disulfide bond which enhances CRP’s binding accessibility for various ligands and activates acute phase proinflammatory responses. 
Taken together, the biofunctions of CRP involve both lipid and protein interactions and a conformational rearrangement of higher order structure that affects its role as a mediator of inflammatory responses.
Affiliation(s)
- Lawrence A. Potempa
- College of Science, Health and Pharmacy, Roosevelt University Schaumburg, Schaumburg, IL, United States
- *Correspondence: Lawrence A. Potempa,
| | - Wei Qiao Qiu
- Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA, United States
- Alzheimer’s Disease Center, Boston University School of Medicine, Boston, MA, United States
- Department of Psychiatry, Boston University School of Medicine, Boston, MA, United States
| | - Ashley Stefanski
- College of Science, Health and Pharmacy, Roosevelt University Schaumburg, Schaumburg, IL, United States
| | - Ibraheem M. Rajab
- College of Science, Health and Pharmacy, Roosevelt University Schaumburg, Schaumburg, IL, United States
| |
|
29
|
Mondal A, Paul D, Dastidar SG, Saha T, Goswami AM. In silico analyses of Wnt1 nsSNPs reveal structurally destabilizing variants, altered interactions with Frizzled receptors and its deregulation in tumorigenesis. Sci Rep 2022; 12:14934. [PMID: 36056132 PMCID: PMC9440047 DOI: 10.1038/s41598-022-19299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 08/26/2022] [Indexed: 11/26/2022] Open
Abstract
Wnt1, the first mammalian Wnt gene, was discovered as a proto-oncogene; in humans, the gene is located on chromosome 12q13. Mutations in Wnt1 are reported to be associated with various cancers and other human diseases. The structural and functional consequences of most of the non-synonymous SNPs (nsSNPs) present in the human Wnt1 gene are not known. In the present work, extensive bioinformatics analyses are used to screen 292 nsSNPs of Wnt1 to predict pathogenic and harmless polymorphisms. We have identified 10 highly deleterious nsSNPs, among which 7 are located within highly conserved areas. These 10 nsSNPs are also predicted to affect the post-translational modifications of Wnt1. Further, structure-based stability analyses of these 10 highly deleterious nsSNPs revealed 8 variants as highly destabilizing. These 8 highly destabilizing variants were shown to have high BC scores and high RMSIP scores in normal mode analyses. Based on the deformation energies obtained from the normal mode analyses, variants such as G169A, G169S, G331R and G331S were found to be unstable. Molecular Dynamics (MD) simulations revealed the structural stability and fluctuation of WT Wnt1 and its prioritized variants. RMSD fluctuated mostly between 4 and 5 Å and occasionally between 3.5 and 5.5 Å. RMSF in the CTD region (residues 330-360) of the binding pocket was lower than that of WT. Analysis of the impact of nsSNPs on the binding interface between Wnt1 and seven Frizzled receptors predicted substitutions that can stabilize or destabilize the binding interface. We have found that Wnt1 and FZD8-CRD form the best docked complex in our study. MD simulation-based analyses of the wild-type Wnt1-FZD8-CRD complex and the 8 prioritized variants revealed that RMSF was higher in the unstructured regions and that RMSD fluctuated around 5 Å ± 1 Å.
We have also observed differential Wnt1 gene expression pattern in normal, tumor and metastatic conditions across different tissues. Wnt1 gene expression was significantly higher in metastatic tissues of lungs, colon and skin; and was significantly lower in metastatic tissues of breast, esophagus and kidney. We have also found that Wnt1 deregulation is associated with survival outcome in patients with gastric and breast cancer. Furthermore, these computationally screened highly deleterious nsSNPs of Wnt1 can be analyzed in population based genetic studies and may help understand the Wnt1 associated diseases.
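For readers unfamiliar with the RMSD values reported in this abstract, the quantity can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets are already superimposed and list the same atoms in the same order; MD analysis tools normally fit each frame to a reference first, which is not shown. The function name and the coordinates are made up for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the structures are already aligned (no superposition is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two atoms displaced by 3 Å and 4 Å: sqrt((9 + 16) / 2) ≈ 3.54 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
value = rmsd(reference, frame)
```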
Affiliation(s)
- Amalesh Mondal
- Department of Physiology, Katwa College, Purba Bardhaman, Katwa, West Bengal, 713130, India
- Department of Molecular Biology and Biotechnology, University of Kalyani, Nadia, Kalyani, India
| | - Debarati Paul
- Division of Bioinformatics, Bose Institute, P-1/12 CIT Scheme VII M, Kolkata, 700054, India
| | - Shubhra Ghosh Dastidar
- Division of Bioinformatics, Bose Institute, P-1/12 CIT Scheme VII M, Kolkata, 700054, India
| | - Tanima Saha
- Department of Molecular Biology and Biotechnology, University of Kalyani, Nadia, Kalyani, India.
| | - Achintya Mohan Goswami
- Department of Physiology, Krishnagar Govt. College, Nadia, Krishnagar, West Bengal, 741101, India.
| |
|
30
|
Mahmud S, Guo Z, Quadir F, Liu J, Cheng J. Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps. BMC Bioinformatics 2022; 23:283. [PMID: 35854211 PMCID: PMC9295499 DOI: 10.1186/s12859-022-04829-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 07/08/2022] [Indexed: 01/25/2023] Open
Abstract
Information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets (called DistDom) to predict protein domain boundaries using 1D sequence features and a predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map includes structural information that was rarely used in domain boundary prediction before. The 1D and 2D features are processed by the 1D and 2D U-Nets respectively to generate hidden features. The hidden features are then used by the multi-head attention to predict the probability of each residue of a protein being in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can be used to classify proteins as single-domain or multi-domain proteins. DistDom classifies the CASP14 single-domain and multi-domain targets with an accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. Tested on the CASP14 multi-domain protein targets with expert-annotated domain boundaries, the average per-target F1 score of the domain boundary prediction by DistDom is 0.263, 29.56% higher than that of the state-of-the-art method.
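The two evaluations mentioned in this abstract, a per-target F1 score on residue-level boundary predictions and a single-domain versus multi-domain classification, can be sketched as below. The 0.5 threshold, the function names, and the rule "any predicted boundary residue means multi-domain" are assumptions made for illustration; the paper's exact protocol (for example, any tolerance window around annotated boundaries) is not stated in the abstract.

```python
def boundary_f1(pred_probs, true_boundary, threshold=0.5):
    """Per-residue F1 score for domain boundary prediction on one target."""
    predicted = [p >= threshold for p in pred_probs]
    tp = sum(1 for p, t in zip(predicted, true_boundary) if p and t)
    fp = sum(1 for p, t in zip(predicted, true_boundary) if p and not t)
    fn = sum(1 for p, t in zip(predicted, true_boundary) if not p and t)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is recovered
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_multi_domain(pred_probs, threshold=0.5):
    """Assumed rule: a protein with any predicted boundary residue is multi-domain."""
    return any(p >= threshold for p in pred_probs)


# Toy target with five residues: two true boundary residues, one false positive.
probs = [0.1, 0.9, 0.8, 0.2, 0.7]
labels = [False, True, False, False, True]
f1 = boundary_f1(probs, labels)   # precision 2/3, recall 1 -> F1 = 0.8
multi = is_multi_domain(probs)    # True
```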
Affiliation(s)
- Sajid Mahmud
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
| |
|
31
|
Mateen RM, Tariq A, Afzal MS, Ali M, Tipu I, Hussain M, Saleem M, Naveed M. TULP3 NLS inhibition: an in silico study to hamper cargo transport to nucleus. J Biomol Struct Dyn 2022:1-9. [PMID: 35510584 DOI: 10.1080/07391102.2022.2070283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
TULP3 is involved in cell regulation pathways including transcription and signal transduction. In some pathological states, such as cancers, increased levels of TULP3 have been observed, so it can serve as a potential target to hamper the activation of those pathways. We propose a novel idea of inhibiting the nuclear localization signal (NLS) to interrupt nuclear translocation of TULP3 so that downstream activation of pathways is blocked. In the current in silico study, the 3D structure of TULP3 was modeled using 8 different tools: I-TASSER, CABS-FOLD, Phyre2, PSIPRED, RaptorX, Robetta, Rosetta and Prime by Schrödinger. The best structure was selected after quality evaluation by SAVES and used to investigate the NLS sequence. The mapped NLS sequence was then docked with its natural ligand importin-α as a control docking to validate the NLS sequence as the binding site. After docking and molecular dynamics (MD) simulation validation, these residues were used as the binding site for subsequent docking studies. 70 alkaloids were selected after an intensive literature survey and were virtually docked with the NLS sequence where the natural ligand importin-α is supposed to bind. This study demonstrates the virtual inhibition of the NLS sequence, paving the way for future in vivo studies to use the NLS as a new drug target for cancer therapeutics. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rana Muhammad Mateen
- Department of Life sciences, School of Science, University of Management and Technology, Lahore, Pakistan
| | - Asma Tariq
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
| | - Muhammad Sohail Afzal
- Department of Life sciences, School of Science, University of Management and Technology, Lahore, Pakistan
| | - Muhammad Ali
- Department of Life sciences, School of Science, University of Management and Technology, Lahore, Pakistan
| | - Imran Tipu
- Department of Life sciences, School of Science, University of Management and Technology, Lahore, Pakistan
| | - Mureed Hussain
- Department of Life sciences, School of Science, University of Management and Technology, Lahore, Pakistan
| | - Mahjabeen Saleem
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
| | - Muhammad Naveed
- Department of Life Sciences, University of Central Punjab, Lahore, Pakistan
| |
|
32
|
Li B, Jin B, Capra JA, Bush WS. Integration of Protein Structure and Population-Scale DNA Sequence Data for Disease Gene Discovery and Variant Interpretation. Annu Rev Biomed Data Sci 2022; 5:141-161. [PMID: 35508071 DOI: 10.1146/annurev-biodatasci-122220-112147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Bian Li
- Department of Biological Sciences and Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
| | - Bowen Jin
- Graduate Program in Systems Biology and Bioinformatics, Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
| | - John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
| | - William S Bush
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
| |
|
33
|
Guo SS, Liu J, Zhou XG, Zhang GJ. DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning. Bioinformatics 2022; 38:1895-1903. [PMID: 35134108 DOI: 10.1093/bioinformatics/btac056] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/02/2021] [Revised: 12/26/2021] [Accepted: 01/27/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Protein model quality assessment is a key component of protein structure prediction. In recent research, the voxelization feature was used to characterize the local structural information of residues, but it may be insufficient for describing residue-level topological information. Designing features that can further reflect residue-level topology when combined with deep learning methods is therefore crucial to improving the performance of model quality assessment. RESULTS We developed a deep-learning method, DeepUMQA, based on Ultrafast Shape Recognition (USR) for residue-level single-model quality assessment. In the framework of the deep residual neural network, the residue-level USR feature was introduced to describe the topological relationship between the residue and the overall structure by calculating the first moment of a set of residue distance sets, and was then combined with 1D, 2D and voxelization features to assess the quality of the model. Experimental results on the CASP13 and CASP14 test datasets and the CAMEO blind test show that USR could supplement the voxelization features to comprehensively characterize residue structure information and significantly improve model assessment accuracy. The performance of DeepUMQA ranks among the top of the state-of-the-art single-model quality assessment methods, including ProQ2, ProQ3, ProQ3D, Ornate, VoroMQA, ProteinGCN, ResNetQA, QDeep, GraphQA, ModFOLD6, ModFOLD7, ModFOLD8, QMEAN3, QMEANDisCo3 and DeepAccNet. AVAILABILITY AND IMPLEMENTATION The DeepUMQA server is freely available at http://zhanglab-bioinf.com/DeepUMQA/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sai-Sai Guo
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiao-Gen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
|
34
|
Hong Y, Lee J, Ko J. A-Prot: protein structure modeling using MSA transformer. BMC Bioinformatics 2022; 23:93. [PMID: 35296230 PMCID: PMC8925138 DOI: 10.1186/s12859-022-04628-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/11/2021] [Accepted: 03/03/2022] [Indexed: 11/18/2022] Open
Abstract
Background The accuracy of protein 3D structure prediction has been dramatically improved with the help of advances in deep learning. In the recent CASP14, DeepMind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. An MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions for a given MSA. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than those of most top server groups of CASP14. Conclusion These results imply that A-Prot accurately captures the evolutionary and structural information of proteins at relatively low computational cost. Thus, A-Prot can provide a clue for the development of other protein property prediction methods. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04628-8.
Affiliation(s)
- Yiyu Hong
- Arontier Co, Seoul, Republic of Korea
| | - Juyong Lee
- Arontier Co, Seoul, Republic of Korea; Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, Republic of Korea
| | - Junsu Ko
- Arontier Co, Seoul, Republic of Korea
| |
|
35
|
Nallasamy V, Seshiah M. Protein Structure Prediction Using Quantile Dragonfly and Structural Class-Based Deep Learning. INT J PATTERN RECOGN 2022. [DOI: 10.1142/s021800142250015x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Predicting the three-dimensional structure of a protein has received growing attention in the field of computational molecular biology. Most recent research has aimed at exploring the search space; however, with the increasing variety and size of data, protein structure identification and prediction are still at a preliminary stage. This work aims to explore the search space to tackle protein structure prediction with minimum execution time and maximum accuracy by means of quantile regressive dragonfly and structural class homolog-based deep learning (QRD-SCHDL). The proposed QRD-SCHDL method consists of two distinct steps: protein structure identification and prediction. In the first step, protein structure identification is performed by means of a QRD optimization model to identify protein structure with minimum error. Protein structure identification is performed first because the raw database contains sequence information but not structural information, so an optimization model is designed to obtain the structural information from the database. However, protein structure gives much more insight than its sequence. Therefore, to perform computational prediction of protein structure from its sequence, actual protein structure prediction is made. The second step involves the actual protein structure prediction via structural class and homolog-based deep learning. For each protein structure prediction, a scoring matrix is obtained by utilizing the structural class maximum correlation coefficient. Finally, the proposed method is tested on a set of different unique numbers of protein data and compared to the state-of-the-art methods. The results show the potential of the proposed method in terms of error rate, protein structure prediction time, protein structure prediction accuracy, precision, specificity, recall, ROC, Kappa coefficient and [Formula: see text]-measure.
It also shows that the proposed QRD-SCHDL method attains comparable results and outperforms the compared methods in certain cases, demonstrating the efficiency of the proposed work.
Affiliation(s)
- Varanavasi Nallasamy
- Department of Computer Science, Periyar University, Salem-636011, Tamil Nadu, India
| | - Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram-637401, Namakkal, Tamil Nadu, India
| |
|
36
|
Zhou X, Song H, Li J. Residue-Frustration-Based Prediction of Protein-Protein Interactions Using Machine Learning. J Phys Chem B 2022; 126:1719-1727. [PMID: 35170967 DOI: 10.1021/acs.jpcb.1c10525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
The study of protein-protein interactions (PPIs) is important for understanding the function of proteins. However, investigating transient protein-protein interactions experimentally remains a challenge. Hence, the computational prediction of protein-protein interactions is drawing growing attention. Statistics-based features have been widely used in studies of protein structure prediction and protein folding. Due to the scarcity of experimental PPI data, it is difficult to construct a conventional statistical feature for PPI prediction, and the application of statistics-based features is very limited in this field. In this paper, we explored the application of frustration, a statistical potential, in PPI prediction. By comparing the extra stabilization energy contributed by a given residue pair in the native protein with the statistics of the energies, we obtained the residue pair's frustration index. By counting the residue pairs with a high frustration index, we then obtained the highly frustrated density, a residue-frustration-based feature that describes the tendency of residues to be involved in PPI. Highly frustrated density, as well as structure-based features, was then used to describe protein residues and combined with a long short-term memory (LSTM) neural network to predict PPI residue pairs. Our model correctly predicted 75% of dimers when only the top 2‰ of residue pairs were selected in each dimer. Our model, which considers statistics-based features, is significantly different from models based on the chemical features of residues. We found that frustration can effectively describe the tendency of residues to be involved in PPI. Frustration-based features can replace chemical features in combination with machine learning and achieve better PPI prediction performance. This reveals the great potential of statistical potentials such as frustration in PPI prediction.
Affiliation(s)
- Xiaozhou Zhou
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Haoyu Song
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Jingyuan Li
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| |
|
37
|
Pražnikar J, Attygalle NT. Quantitative analysis of visual codewords of a protein distance matrix. PLoS One 2022; 17:e0263566. [PMID: 35120181 PMCID: PMC8815937 DOI: 10.1371/journal.pone.0263566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Accepted: 01/24/2022] [Indexed: 12/02/2022] Open
Abstract
3D protein structures can be analyzed using a distance matrix calculated as the pairwise distance between all Cα atoms in the protein model. Although researchers have efficiently used distance matrices to classify proteins and find homologous proteins, much less work has been done on quantitative analysis of distance matrix features. Therefore, the distance matrix was analyzed as a grayscale image using the KAZE feature extractor algorithm with a Bag of Visual Words model. In this study, each protein was represented as a histogram of visual codewords. The analysis showed that a very small number of codewords (~1%) have a high relative frequency (> 0.25) and that the majority of codewords have a relative frequency around 0.05. We have also shown that there is a relationship between the frequency of codewords and the position of the features in a distance matrix. The codewords that are more frequent are located closer to the main diagonal. Less frequent codewords, on the other hand, are located in the corners of the distance matrix, far from the main diagonal. Moreover, the analysis showed a correlation between the number of unique codewords and the 3D repeats in the protein structure. Solenoid and tandem-repeat proteins have a significantly lower number of unique codewords than globular proteins. Finally, the codeword histograms and a Support Vector Machine (SVM) classifier were used to classify solenoid and globular proteins. The result showed that the SVM classifier fed with codeword histograms correctly classified 352 out of 354 proteins.
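The first step described in this abstract, building a Cα distance matrix and viewing it as a grayscale image, can be sketched as follows. The linear scaling to 0-255 by the largest distance is an assumption chosen for illustration, as are the function names and coordinates; the KAZE feature extraction and Bag of Visual Words steps are not shown.

```python
import math


def ca_distance_matrix(ca_coords):
    """Pairwise Euclidean distances between all C-alpha atoms."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) for j in range(n)]
            for i in range(n)]


def to_grayscale(matrix):
    """Scale distances linearly to 0..255 (assumed normalization)."""
    max_d = max(max(row) for row in matrix)
    if max_d == 0:
        return [[0] * len(row) for row in matrix]
    return [[round(255 * d / max_d) for d in row] for row in matrix]


# Three hypothetical C-alpha positions (in Å).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dm = ca_distance_matrix(coords)   # dm[0][1] = 5.0, dm[0][2] = 13.0, dm[1][2] = 12.0
image = to_grayscale(dm)          # main diagonal maps to 0, largest distance to 255
```

The matrix is symmetric with zeros on the main diagonal, so pairs close in sequence sit near the diagonal of the image.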
Affiliation(s)
- Jure Pražnikar
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
  - Department of Biochemistry, Molecular and Structural Biology, Institute Jožef Stefan, Ljubljana, Slovenia
- Nuwan Tharanga Attygalle
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
|
38
|
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
  - Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
|
39
|
Du Z, Peng Z, Yang J. Toward the assessment of predicted inter-residue distance. Bioinformatics 2022; 38:962-969. [PMID: 34791040 DOI: 10.1093/bioinformatics/btab781] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/05/2021] [Revised: 10/07/2021] [Accepted: 11/10/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Significant progress has been achieved in distance-based protein folding, due to improved prediction of inter-residue distance by deep learning. Many efforts have thus been made in recent years to improve distance prediction. However, the best way of objectively assessing the accuracy of predicted distance remains unknown. RESULTS A total of 19 metrics were proposed to measure the accuracy of predicted distance. These metrics were discussed and compared quantitatively on three benchmark datasets, with distance and structure models predicted by the trRosetta pipeline. The experiments show that a few metrics, such as distance precision, have a high correlation with the model accuracy measure TM-score (Pearson's correlation coefficient >0.7). In addition, the metrics were applied to rank the distance prediction groups in CASP14. The ranking by our metrics coincides largely with the official version. These data suggest that the proposed metrics are effective for measuring distance prediction. We anticipate that this study paves the way for objectively monitoring the progress of inter-residue distance prediction. A web server and a standalone package are provided to implement the proposed metrics. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/APD. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongyang Du
  - School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin 300071, China
|
40
|
Liu J, Zhao KL, He GX, Wang LJ, Zhou XG, Zhang GJ. A de novo protein structure prediction by iterative partition sampling, topology adjustment and residue-level distance deviation optimization. Bioinformatics 2021; 38:99-107. [PMID: 34459867 DOI: 10.1093/bioinformatics/btab620] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/25/2021] [Revised: 07/23/2021] [Accepted: 08/25/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION With the great progress of deep learning-based inter-residue contact/distance prediction, the discrete space formed by fragment assembly cannot satisfy the distance constraint well. Thus, the optimal solution of the continuous space may not be achieved. Designing an effective closed-loop continuous dihedral angle optimization strategy that complements the discrete fragment assembly is crucial to improve the performance of the distance-assisted fragment assembly method. RESULTS In this article, we proposed a de novo protein structure prediction method called IPTDFold based on closed-loop iterative partition sampling, topology adjustment and residue-level distance deviation optimization. First, local dihedral angle crossover and mutation operators are designed to explore the conformational space extensively and achieve information exchange between the conformations in the population. Then, the dihedral angle rotation model of loop region with partial inter-residue distance constraints is constructed, and the rotation angle satisfying the constraints is obtained by differential evolution algorithm, so as to adjust the spatial position relationship between the secondary structures. Finally, the residue distance deviation is evaluated according to the difference between the conformation and the predicted distance, and the dihedral angle of the residue is optimized with biased probability. The final model is generated by iterating the above three steps. IPTDFold is tested on 462 benchmark proteins, 24 FM targets of CASP13 and 20 FM targets of CASP14. Results show that IPTDFold is significantly superior to the distance-assisted fragment assembly method Rosetta_D (Rosetta with distance). In particular, the prediction accuracy of IPTDFold does not decrease as the length of the protein increases. 
When using the same FastRelax protocol, the prediction accuracy of IPTDFold is significantly superior to that of trRosetta without orientation constraints, and is equivalent to that of the full version of trRosetta. AVAILABILITY AND IMPLEMENTATION The source code and executable are freely available at https://github.com/iobio-zjut/IPTDFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guang-Xing He
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Liu-Jing Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
41
|
Hou M, Peng C, Zhou X, Zhang B, Zhang G. Multi contact-based folding method for de novo protein structure prediction. Brief Bioinform 2021; 23:6445108. [PMID: 34849573 DOI: 10.1093/bib/bbab463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 11/12/2022]
Abstract
Meta contact, which combines different contact maps into one to improve contact prediction accuracy and effectively reduce the noise from a single contact map, is a widely used method. However, protein structure prediction using meta contact cannot fully exploit the information carried by the original contact maps. In this work, a multi contact-based folding method under the evolutionary algorithm framework, MultiCFold, is proposed. In MultiCFold, the full information of the different contact maps is used directly by populations to guide protein structure folding. In addition, noncontact is considered as an effective supplement to contact information and can further assist protein folding. MultiCFold is tested on a set of 120 nonredundant proteins, and the average TM-score and average RMSD reach 0.617 and 5.815 Å, respectively. Compared with the meta contact-based method, MetaCFold, the average TM-score and average RMSD improve by 6.62% and 8.82%, respectively. In particular, the introduction of noncontact information increases the average TM-score by 6.30%. Furthermore, MultiCFold is compared with four state-of-the-art methods of CASP13 on the 24 FM targets, and results show that MultiCFold is significantly better than other methods after the full-atom relax procedure.
Affiliation(s)
- Minghua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Chunxiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Biao Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
42
|
Evaluation of Deep Neural Network ProSPr for Accurate Protein Distance Predictions on CASP14 Targets. Int J Mol Sci 2021; 22:12835. [PMID: 34884640 PMCID: PMC8657919 DOI: 10.3390/ijms222312835] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/18/2021] [Revised: 11/22/2021] [Accepted: 11/25/2021] [Indexed: 12/02/2022]
Abstract
The field of protein structure prediction has recently been revolutionized through the introduction of deep learning. The current state-of-the-art tool AlphaFold2 can predict highly accurate structures; however, it has a prohibitively long inference time for applications that require the folding of hundreds of sequences. The prediction of protein structure annotations, such as amino acid distances, can be achieved at a higher speed with existing tools, such as the ProSPr network. Here, we report on important updates to the ProSPr network, its performance in the recent Critical Assessment of Techniques for Protein Structure Prediction (CASP14) competition, and an evaluation of its accuracy dependency on sequence length and multiple sequence alignment depth. We also provide a detailed description of the architecture and the training process, accompanied by reusable code. This work is anticipated to provide a solid foundation for the further development of protein distance prediction tools.
|
43
|
Adolfi MC, Du K, Kneitz S, Cabau C, Zahm M, Klopp C, Feron R, Paixão RV, Varela ES, de Almeida FL, de Oliveira MA, Nóbrega RH, Lopez-Roques C, Iampietro C, Lluch J, Kloas W, Wuertz S, Schaefer F, Stöck M, Guiguen Y, Schartl M. A duplicated copy of id2b is an unusual sex-determining candidate gene on the Y chromosome of arapaima (Arapaima gigas). Sci Rep 2021; 11:21544. [PMID: 34732792 PMCID: PMC8566520 DOI: 10.1038/s41598-021-01066-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/06/2021] [Accepted: 10/21/2021] [Indexed: 12/19/2022]
Abstract
Arapaima gigas is one of the largest freshwater fish species and is of high ecological and economic importance. Overfishing and habitat destruction are severe threats to the remaining wild populations. By incorporating a chromosomal Hi-C contact map, we improved the arapaima genome assembly to chromosome-level, revealing an unexpectedly high degree of chromosome rearrangements during evolution of the bonytongues (Osteoglossiformes). Combining this new assembly with pool-sequencing of male and female genomes, we identified id2bbY, a duplicated copy of the inhibitor of DNA binding 2b (id2b) gene on the Y chromosome, as a candidate male sex-determining gene. A PCR-test for id2bbY was developed, demonstrating that this gene is a reliable male-specific marker for genotyping. Expression analyses showed that this gene is expressed in juvenile male gonads. Its paralog, id2ba, exhibits a male-biased expression in immature gonads. Transcriptome analyses and protein structure predictions confirm id2bbY as a prime candidate for the master sex-determiner. Acting through the TGFβ signaling pathway, id2bbY from arapaima would provide the first evidence for a link of this family of transcriptional regulators to sex determination. Our study broadens our current understanding of the evolution of sex determination genetic networks and provides a tool for improving arapaima aquaculture for commercial and conservation purposes.
Affiliation(s)
- Mateus C Adolfi
  - Developmental Biochemistry, Biocenter, University of Wuerzburg, Am Hubland, 97074, Wuerzburg, Germany.
- Kang Du
  - Developmental Biochemistry, Biocenter, University of Wuerzburg, Am Hubland, 97074, Wuerzburg, Germany
  - The Xiphophorus Genetic Stock Center, Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, TX, 78666, USA
- Susanne Kneitz
  - Biochemistry and Cell Biology, Biocenter, University of Wuerzburg, Am Hubland, 97074, Wuerzburg, Germany
- Cédric Cabau
  - Sigenae, GenPhySE, INRAE, ENVT, Université de Toulouse, Castanet Tolosan, France
- Margot Zahm
  - Sigenae, GenPhySE, INRAE, ENVT, Université de Toulouse, Castanet Tolosan, France
- Christophe Klopp
  - MIAT, INRA, Université de Toulouse, Chemin de Borde Rouge, 31326, Castanet-Tolosan Cedex, France
- Romain Feron
  - INRAE, LPGP, Rennes, France
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Marcos A de Oliveira
  - Reproductive and Molecular Biology Group, Department of Morphology, Institute of Biosciences, UNESP, Botucatu, Brazil
- Rafael H Nóbrega
  - Reproductive and Molecular Biology Group, Department of Morphology, Institute of Biosciences, UNESP, Botucatu, Brazil
- Jérôme Lluch
  - GeT-PlaGe, INRAE, Genotoul, Castanet-Tolosan, France
- Werner Kloas
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, IGB, Müggelseedamm 301 & 310, 12587, Berlin, Germany
- Sven Wuertz
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, IGB, Müggelseedamm 301 & 310, 12587, Berlin, Germany
- Fabian Schaefer
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, IGB, Müggelseedamm 301 & 310, 12587, Berlin, Germany
- Matthias Stöck
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, IGB, Müggelseedamm 301 & 310, 12587, Berlin, Germany
  - Amphibian Research Center, Hiroshima University, Higashi-Hiroshima, 739-8526, Japan
- Manfred Schartl
  - Developmental Biochemistry, Biocenter, University of Wuerzburg, Am Hubland, 97074, Wuerzburg, Germany
  - The Xiphophorus Genetic Stock Center, Department of Chemistry and Biochemistry, Texas State University, San Marcos, Texas, TX, 78666, USA
  - Comprehensive Cancer Center Mainfranken, University Hospital, 97080, Würzburg, Germany
|
44
|
Defresne M, Barbe S, Schiex T. Protein Design with Deep Learning. Int J Mol Sci 2021; 22:11741. [PMID: 34769173 PMCID: PMC8584038 DOI: 10.3390/ijms222111741] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/30/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 12/21/2022]
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.
Affiliation(s)
- Marianne Defresne
  - Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France; (M.D.); (S.B.)
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France; (M.D.); (S.B.)
- Thomas Schiex
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
|
45
|
Ruiz-Serra V, Pontes C, Milanetti E, Kryshtafovych A, Lepore R, Valencia A. Assessing the accuracy of contact and distance predictions in CASP14. Proteins 2021; 89:1888-1900. [PMID: 34595772 DOI: 10.1002/prot.26248] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/11/2021] [Revised: 09/06/2021] [Accepted: 09/21/2021] [Indexed: 12/26/2022]
Abstract
We present the results of the assessment of the intramolecular residue-residue contact and distance predictions from groups participating in the 14th round of the CASP experiment. The performance of contact prediction methods was evaluated with the measures used in previous CASPs, while distance predictions were assessed based on a new protocol, which considers individual distance pairs as well as the whole predicted distance matrix, using a graph-based framework. The results of the evaluation indicate that predictions by the tFold framework, TripletRes and DeepPotential were the most accurate in both categories. With regard to progress in method performance, the assessment of contact prediction did not reveal any discernible difference when compared to CASP13. Arguably, this could be due to CASP14 FM targets being more challenging than ever before.
Affiliation(s)
- Camila Pontes
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Edoardo Milanetti
  - Department of Physics, Sapienza Università di Roma, Rome, Italy
  - Center for Life Nano- & Neuro-Science, Fondazione Istituto Italiano di Tecnologia (IIT), Rome, Italy
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - ICREA, Pg. Lluís Companys, Barcelona, Spain
|
46
|
Kong L, Ju F, Zhang H, Sun S, Bu D. FALCON2: a web server for high-quality prediction of protein tertiary structures. BMC Bioinformatics 2021; 22:439. [PMID: 34525939 PMCID: PMC8444573 DOI: 10.1186/s12859-021-04353-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 09/01/2021] [Indexed: 11/21/2022]
Abstract
BACKGROUND Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising. RESULTS In the study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches. 
CONCLUSIONS By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.
Affiliation(s)
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Haicang Zhang
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Shiwei Sun
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
|
47
|
Shen T, Wu J, Lan H, Zheng L, Pei J, Wang S, Liu W, Huang J. When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction). Proteins 2021; 89:1901-1910. [PMID: 34473376 DOI: 10.1002/prot.26232] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/30/2021] [Revised: 08/16/2021] [Accepted: 08/20/2021] [Indexed: 12/29/2022]
Abstract
In this paper, we report our tFold framework's performance on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). Our tFold framework seamlessly combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to effectively capture inter-residue interactions. In CASP14, our best predictor achieves 41.78% in the averaged top-L precision for long-range contacts for all the 22 free-modeling (FM) targets, and ranked 1st among all the 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.
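The "top-L precision for long-range contacts" quoted in this abstract can be sketched as follows. The defaults are assumptions based on the usual CASP conventions (a contact is a residue pair whose true distance is below 8 Å, and long-range means a sequence separation of at least 24); the abstract itself does not restate them, and the function name and `top_k` parameter are hypothetical.

```python
import numpy as np


def top_l_long_range_precision(pred_prob, true_dist, min_sep=24,
                               cutoff=8.0, top_k=None):
    """Fraction of the top-ranked long-range predicted pairs that are
    true contacts.

    pred_prob: (L, L) predicted contact probabilities.
    true_dist: (L, L) true inter-residue distances in angstroms.
    min_sep:   minimum sequence separation |i - j| (assumed 24).
    cutoff:    contact distance threshold in angstroms (assumed 8).
    top_k:     number of pairs to score; defaults to L.
    """
    pred_prob = np.asarray(pred_prob, dtype=float)
    true_dist = np.asarray(true_dist, dtype=float)
    length = pred_prob.shape[0]
    k = length if top_k is None else top_k
    # Upper-triangle pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        return float("nan")
    # Rank candidate pairs by predicted probability, highest first.
    order = np.argsort(-pred_prob[i, j], kind="stable")[:k]
    hits = true_dist[i[order], j[order]] < cutoff
    return float(hits.mean())
```

If fewer than `top_k` long-range pairs exist (short chains), this sketch scores all of them rather than raising an error.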
Affiliation(s)
- Wei Liu
  - Tencent AI Lab, Shenzhen, China
|
48
|
Christoffer C, Bharadwaj V, Luu R, Kihara D. LZerD Protein-Protein Docking Webserver Enhanced With de novo Structure Prediction. Front Mol Biosci 2021; 8:724947. [PMID: 34466411 PMCID: PMC8403062 DOI: 10.3389/fmolb.2021.724947] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/14/2021] [Accepted: 07/21/2021] [Indexed: 01/25/2023]
Abstract
Protein-protein docking is a useful tool for modeling the structures of protein complexes that have yet to be experimentally determined. Understanding the structures of protein complexes is a key component for formulating hypotheses in biophysics regarding the functional mechanisms of complexes. Protein-protein docking is an established technique for cases where the structures of the subunits have been determined. While the number of known structures deposited in the Protein Data Bank is increasing, there are still many cases where the structures of individual proteins that users want to dock are not determined yet. Here, we have integrated the AttentiveDist method for protein structure prediction into our LZerD webserver for protein-protein docking, which enables users to simply submit protein sequences and obtain full-complex atomic models, without having to supply any structure themselves. We have further extended the LZerD docking interface with a symmetrical homodimer mode. The LZerD server is available at https://lzerd.kiharalab.org/.
Affiliation(s)
- Charles Christoffer
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Vijay Bharadwaj
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Ryan Luu
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, United States
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
|
49
|
Mortuza SM, Zheng W, Zhang C, Li Y, Pearce R, Zhang Y. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nat Commun 2021; 12:5011. [PMID: 34408149 PMCID: PMC8373938 DOI: 10.1038/s41467-021-25316-w] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 01/07/2021] [Accepted: 08/04/2021] [Indexed: 11/28/2022]
Abstract
Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.
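For readers unfamiliar with the TM-score threshold of 0.5 used in this abstract, the sketch below evaluates the standard TM-score sum for one fixed superposition. It assumes the commonly used length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 with a floor of 0.5 Å for short chains; the real TM-score additionally searches for the superposition that maximizes this sum, which is not done here. The function names are hypothetical.

```python
import numpy as np


def tm_d0(l_target):
    """Length-dependent distance scale d0 in angstroms.

    Assumption: short targets (L <= 21) and any value below 0.5 are
    clamped to 0.5, following common TM-score implementations.
    """
    if l_target <= 21:
        return 0.5
    return max(1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(distances, l_target):
    """TM-score sum for a given superposition.

    distances: distances (angstroms) between aligned residue pairs of
               model and native after superposition.
    l_target:  number of residues in the target (native) structure,
               used for normalization so unaligned residues count as 0.
    """
    d0 = tm_d0(l_target)
    d = np.asarray(distances, dtype=float)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / l_target)
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, a model that matches only half of a 100-residue target perfectly scores 0.5 in this sketch, which is the threshold the abstract uses for a correct fold.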
Affiliation(s)
- S M Mortuza
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
|
50
|
Wang L, Liu J, Xia Y, Xu J, Zhou X, Zhang G. Distance-guided protein folding based on generalized descent direction. Brief Bioinform 2021; 22:6341661. [PMID: 34355233 DOI: 10.1093/bib/bbab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/17/2021] [Revised: 06/30/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022]
Abstract
Advances in the prediction of the inter-residue distance for a protein sequence have increased the accuracy of predicting the correct folds of proteins with distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, a random-based direction is designed using evolutionary knowledge, which guides the conformation population to cross potential barriers and explore conformational space rapidly over a large range. In the local stage, the locally rugged potential landscape can be explored with the aid of a conjugate-based direction integrated into a specific search strategy, which can improve the exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ 0.5] 316 out of 347 proteins, where 65 proteins have TM scores that are greater than 0.8, and significantly outperforms Rosetta-dist (distance-assisted fragment assembly method) and L-BFGSfold (distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely, Quark, RaptorX, Rosetta, MULTICOM and trRosetta in the CASP 13 and 14 server groups.
Affiliation(s)
- Liujing Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jiakang Xu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, USA
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|