1
Zheng Y, Young ND, Wang T, Chang BCH, Song J, Gasser RB. Systems biology of Haemonchus contortus - Advancing biotechnology for parasitic nematode control. Biotechnol Adv 2025; 81:108567. [PMID: 40127743 DOI: 10.1016/j.biotechadv.2025.108567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Revised: 03/19/2025] [Accepted: 03/21/2025] [Indexed: 03/26/2025]
Abstract
Parasitic nematodes represent a substantial global burden, impacting animal health, agriculture and economies worldwide. Of these worms, Haemonchus contortus - a blood-feeding nematode of ruminants - is a major pathogen and a model for molecular and applied parasitology research. This review synthesises some key advances in understanding the molecular biology, genetic diversity and host-parasite interactions of H. contortus, highlighting its value for comparative studies with the free-living nematode Caenorhabditis elegans. Key themes include recent developments in genomic, transcriptomic and proteomic technologies and resources, which are illuminating critical molecular pathways, including the ubiquitination pathway, protease/protease inhibitor systems and the secretome of H. contortus. Some of these insights are providing a foundation for identifying essential genes and exploring their potential as targets for novel anthelmintics or vaccines, particularly in the face of widespread anthelmintic resistance. Advanced bioinformatic tools, such as machine learning (ML) algorithms and artificial intelligence (AI)-driven protein structure prediction, are enhancing annotation capabilities, facilitating and accelerating analyses of gene functions, and biological pathways and processes. This review also discusses the integration of these tools with cutting-edge single-cell sequencing and spatial transcriptomics to dissect host-parasite interactions at the cellular level. The discussion emphasises the importance of curated databases, improved culture systems and functional genomics platforms to translate molecular discoveries into practical outcomes, such as novel interventions. New research findings and resources not only advance research on H. contortus and related nematodes but may also pave the way for innovative solutions to the global challenges with anthelmintic resistance.
Affiliation(s)
- Yuanting Zheng
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Tao Wang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Bill C H Chang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Jiangning Song
  - Faculty of IT, Department of Data Science and AI, Monash University, Victoria, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia; Monash Data Futures Institute, Monash University, Victoria, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
2
Pan T, Bi Y, Wang X, Zhang Y, Webb GI, Gasser RB, Kurgan L, Song J. SCREEN: A Graph-based Contrastive Learning Tool to Infer Catalytic Residues and Assess Enzyme Mutations. Genomics, Proteomics & Bioinformatics 2025; 22:qzae094. [PMID: 39724324 PMCID: PMC11961199 DOI: 10.1093/gpbjnl/qzae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Revised: 12/05/2024] [Accepted: 12/06/2024] [Indexed: 12/28/2024]
Abstract
The accurate identification of catalytic residues contributes to our understanding of enzyme functions in biological processes and pathways. The increasing number of protein sequences necessitates computational tools for the automated prediction of catalytic residues in enzymes. Here, we introduce SCREEN, a graph neural network for the high-throughput prediction of catalytic residues via the integration of enzyme functional and structural information. SCREEN constructs residue representations based on spatial arrangements and incorporates enzyme function priors into such representations through contrastive learning. We demonstrate that SCREEN (1) consistently outperforms currently-available predictors; (2) provides accurate results when applied to inferred enzyme structures; and (3) generalizes well to enzymes dissimilar from those in the training set. We also show that the putative catalytic residues predicted by SCREEN mimic key structural and biophysical characteristics of native catalytic residues. Moreover, using experimental datasets, we show that SCREEN's predictions can be used to distinguish residues with a high mutation tolerance from those likely to cause functional loss when mutated, indicating that this tool might be used to infer disease-associated mutations. SCREEN is publicly available at https://github.com/BioColLab/SCREEN and https://ngdc.cncb.ac.cn/biocode/tool/7580.
Affiliation(s)
- Tong Pan
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute-Wenzhou Medical University Alliance in Clinical and Experimental Biomedicine, Monash University, Clayton, VIC 3800, Australia
- Yue Bi
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute-Wenzhou Medical University Alliance in Clinical and Experimental Biomedicine, Monash University, Clayton, VIC 3800, Australia
- Xiaoyu Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute-Wenzhou Medical University Alliance in Clinical and Experimental Biomedicine, Monash University, Clayton, VIC 3800, Australia
- Ying Zhang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Geoffrey I Webb
  - Department of Data Science and Artificial Intelligence, Monash University, Clayton, VIC 3800, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, VIC 3010, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute-Wenzhou Medical University Alliance in Clinical and Experimental Biomedicine, Monash University, Clayton, VIC 3800, Australia
  - Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325015, China
3
Mirarchi A, Giorgino T, De Fabritiis G. mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics. arXiv 2024:arXiv:2407.14794v2. [PMID: 39679266 PMCID: PMC11643217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Recent advancements in protein structure determination are revolutionizing our understanding of proteins. Still, a significant gap remains in the availability of comprehensive datasets that focus on the dynamics of proteins, which are crucial for understanding protein function, folding, and interactions. To address this critical gap, we introduce mdCATH, a dataset generated through an extensive set of all-atom molecular dynamics simulations of a diverse and representative collection of protein domains. This dataset comprises all-atom systems for 5,398 domains, modeled with a state-of-the-art classical force field, and simulated in five replicates each at five temperatures from 320 K to 450 K. The mdCATH dataset records coordinates and forces every 1 ns, for over 62 ms of accumulated simulation time, effectively capturing the dynamics of the various classes of domains and providing a unique resource for proteome-wide statistical analyses of protein unfolding thermodynamics and kinetics. We outline the dataset structure and showcase its potential through four easily reproducible case studies, highlighting its capabilities in advancing protein science.
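The dataset size quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (5,398 domains, five replicates, five temperatures, over 62 ms in total, one frame per ns). The variable names are illustrative, and the 62 ms total is treated as a lower bound, so the per-trajectory average is only approximate.

```python
# Figures quoted in the mdCATH abstract
n_domains = 5398
n_replicates = 5
n_temperatures = 5
total_time_ms = 62  # "over 62 ms", used here as a lower bound

# One trajectory per (domain, temperature, replicate) combination
n_trajectories = n_domains * n_replicates * n_temperatures

# 1 ms = 1e6 ns; with one frame recorded per ns this is also
# a rough lower bound on the mean number of frames per trajectory
mean_length_ns = total_time_ms * 1e6 / n_trajectories

print(f"trajectories: {n_trajectories}")
print(f"mean trajectory length: at least ~{mean_length_ns:.0f} ns")
```

Under these assumptions the dataset holds on the order of 10^5 trajectories, each several hundred nanoseconds long on average.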
Affiliation(s)
- Antonio Mirarchi
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, Barcelona, 08003, Spain
- Toni Giorgino
  - Biophysics Institute, National Research Council (CNR-IBF), Via Celoria 26, Milan, 20133, Italy
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, Barcelona, 08003, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, Barcelona, 08010, Spain
  - Acellera Labs, Doctor Trueta 183, Barcelona, 08005, Spain
4
Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024]
Abstract
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
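The equivariance property described in the abstract can be illustrated numerically. The sketch below is not taken from the review: the update rule is a hypothetical one, loosely in the spirit of coordinate updates used in equivariant graph neural networks, and the function names and the weight 1 / (1 + d^2) are assumptions chosen for illustration. It checks that transforming the input (rotation, reflection, translation) and then updating gives the same result as updating and then transforming.

```python
import math

def egnn_like_update(points):
    """Shift each point along its displacement from every other point,
    weighted by a function of the squared distance (hypothetical weight)."""
    updated = []
    for i, xi in enumerate(points):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(points):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            # The weight depends only on distance, so it is unchanged
            # by rotations, reflections, and translations
            weight = 1.0 / (1.0 + sum(d * d for d in diff))
            shift = [s + weight * d for s, d in zip(shift, diff)]
        updated.append(tuple(a + s for a, s in zip(xi, shift)))
    return updated

def transform(points, angle, translation, reflect=False):
    """Optionally reflect through the xy-plane, rotate about the z axis,
    then translate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        if reflect:
            z = -z
        out.append((c * x - s * y + translation[0],
                    s * x + c * y + translation[1],
                    z + translation[2]))
    return out

points = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.2), (-0.3, 2.0, 1.1)]
angle, translation = 0.7, (1.0, -2.0, 3.0)

# Equivariance: update(transform(x)) should equal transform(update(x))
a = egnn_like_update(transform(points, angle, translation, reflect=True))
b = transform(egnn_like_update(points), angle, translation, reflect=True)
max_gap = max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))
print("equivariant up to rounding:", max_gap < 1e-9)
```

The check passes because the update is built only from displacement vectors and distances, which is the design idea the abstract attributes to equivariant graph neural networks.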
Affiliation(s)
- Farzan Soleymani
  - Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
- Herna Lydia Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
5
Mirarchi A, Giorgino T, De Fabritiis G. mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics. Sci Data 2024; 11:1299. [PMID: 39609442 PMCID: PMC11604666 DOI: 10.1038/s41597-024-04140-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Accepted: 11/15/2024] [Indexed: 11/30/2024]
Abstract
Recent advancements in protein structure determination are revolutionizing our understanding of proteins. Still, a significant gap remains in the availability of comprehensive datasets that focus on the dynamics of proteins, which are crucial for understanding protein function, folding, and interactions. To address this critical gap, we introduce mdCATH, a dataset generated through an extensive set of all-atom molecular dynamics simulations of a diverse and representative collection of protein domains. This dataset comprises all-atom systems for 5,398 domains, modeled with a state-of-the-art classical force field, and simulated in five replicates each at five temperatures from 320 K to 450 K. The mdCATH dataset records coordinates and forces every 1 ns, for over 62 ms of accumulated simulation time, effectively capturing the dynamics of the various classes of domains and providing a unique resource for proteome-wide statistical analyses of protein unfolding thermodynamics and kinetics. We outline the dataset structure and showcase its potential through four easily reproducible case studies, highlighting its capabilities in advancing protein science.
Affiliation(s)
- Antonio Mirarchi
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, Barcelona, 08003, Spain
- Toni Giorgino
  - Biophysics Institute, National Research Council (CNR-IBF), Via Celoria 26, Milan, 20133, Italy
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, Barcelona, 08003, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, Barcelona, 08010, Spain
  - Acellera Labs, Doctor Trueta 183, Barcelona, 08005, Spain
6
Rodrigues AV, Moriarty NW, Kakumanu R, DeGiovanni A, Pereira JH, Gin JW, Chen Y, Baidoo EEK, Petzold CJ, Adams PD. Characterization of lignin-degrading enzyme PmdC, which catalyzes a key step in the synthesis of polymer precursor 2-pyrone-4,6-dicarboxylic acid. J Biol Chem 2024; 300:107736. [PMID: 39222681 PMCID: PMC11489326 DOI: 10.1016/j.jbc.2024.107736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 07/22/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]
Abstract
Pyrone-2,4-dicarboxylic acid (PDC) is a valuable polymer precursor that can be derived from the microbial degradation of lignin. The key enzyme in the microbial production of PDC is 4-carboxy-2-hydroxymuconate-6-semialdehyde (CHMS) dehydrogenase, which acts on the substrate CHMS. We present the crystal structure of CHMS dehydrogenase (PmdC from Comamonas testosteroni) bound to the cofactor NADP, shedding light on its three-dimensional architecture, and revealing residues responsible for binding NADP. Using a combination of structural homology, molecular docking, and quantum chemistry calculations, we have predicted the binding site of CHMS. Key histidine residues in a conserved sequence are identified as crucial for binding the hydroxyl group of CHMS and facilitating dehydrogenation with NADP. Mutating these histidine residues results in a loss of enzyme activity, leading to a proposed model for the enzyme's mechanism. These findings are expected to help guide efforts in protein and metabolic engineering to enhance PDC yields in biological routes to polymer feedstock synthesis.
Affiliation(s)
- Andria V Rodrigues
  - Joint BioEnergy Institute, Emeryville, California, United States; Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California, United States
- Nigel W Moriarty
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California, United States
- Ramu Kakumanu
  - Joint BioEnergy Institute, Emeryville, California, United States; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States
- Andy DeGiovanni
  - Joint BioEnergy Institute, Emeryville, California, United States; Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California, United States
- Jose Henrique Pereira
  - Joint BioEnergy Institute, Emeryville, California, United States; Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California, United States
- Jennifer W Gin
  - Joint BioEnergy Institute, Emeryville, California, United States; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States; Department of Energy Agile BioFoundry, Emeryville, California, United States
- Yan Chen
  - Joint BioEnergy Institute, Emeryville, California, United States; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States; Department of Energy Agile BioFoundry, Emeryville, California, United States
- Edward E K Baidoo
  - Joint BioEnergy Institute, Emeryville, California, United States; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States
- Christopher J Petzold
  - Joint BioEnergy Institute, Emeryville, California, United States; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States; Department of Energy Agile BioFoundry, Emeryville, California, United States
- Paul D Adams
  - Joint BioEnergy Institute, Emeryville, California, United States; Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California, United States; Department of Bioengineering, University of California Berkeley, Berkeley, California, United States
7
Vodiasova E, Sinchenko A, Khvatkov P, Dolgov S. Genome-Wide Identification, Characterisation, and Evolution of the Transcription Factor WRKY in Grapevine (Vitis vinifera): New View and Update. Int J Mol Sci 2024; 25:6241. [PMID: 38892428 PMCID: PMC11172563 DOI: 10.3390/ijms25116241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 05/29/2024] [Accepted: 06/03/2024] [Indexed: 06/21/2024]
Abstract
WRKYs are a multigenic family of transcription factors that are plant-specific and involved in the regulation of plant development and various stress response processes. However, the evolution of WRKY genes is not fully understood. This family has also been incompletely studied in grapevine, and WRKY genes have been named with different numbers in different studies, leading to great confusion. In this work, 62 Vitis vinifera WRKY genes were identified based on six genomes of different cultivars. All WRKY genes were numbered according to their chromosomal location, and a complete revision of the numbering was performed. Amino acid variability between different cultivars was assessed for the first time and was greater than 5% for some WRKYs. According to the gene structure, all WRKYs could be divided into two groups: more exons/long length and fewer exons/short length. For the first time, some chimeric WRKY genes were found in grapevine, which may play a specific role in the regulation of different processes: VvWRKY17 (an N-terminal signal peptide region followed by a non-cytoplasmic domain) and VvWRKY61 (Frigida-like domain). Five phylogenetic clades A-E were revealed and correlated with the WRKY groups (I, II, III). The evolution of WRKY was studied, and we proposed a WRKY evolution model where there were two dynamic phases of complexity and simplification in the evolution of WRKY.
Affiliation(s)
- Ekaterina Vodiasova
  - Federal State Funded Institution of Science “The Labor Red Banner Order Nikita Botanical Gardens—National Scientific Center of the RAS”, Nikita, 298648 Yalta, Russia
  - A.O. Kovalevsky Institute of Biology of the Southern Seas of RAS, 299011 Sevastopol, Russia
- Anastasiya Sinchenko
  - Federal State Funded Institution of Science “The Labor Red Banner Order Nikita Botanical Gardens—National Scientific Center of the RAS”, Nikita, 298648 Yalta, Russia
- Pavel Khvatkov
  - Federal State Funded Institution of Science “The Labor Red Banner Order Nikita Botanical Gardens—National Scientific Center of the RAS”, Nikita, 298648 Yalta, Russia
- Sergey Dolgov
  - Federal State Funded Institution of Science “The Labor Red Banner Order Nikita Botanical Gardens—National Scientific Center of the RAS”, Nikita, 298648 Yalta, Russia
  - Branch of Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, 142290 Puschino, Russia
8
Barco RA, Merino N, Lam B, Budnik B, Kaplan M, Wu F, Amend JP, Nealson KH, Emerson D. Comparative proteomics of a versatile, marine, iron-oxidizing chemolithoautotroph. Environ Microbiol 2024; 26:e16632. [PMID: 38861374 DOI: 10.1111/1462-2920.16632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/20/2024] [Indexed: 06/13/2024]
Abstract
This study conducted a comparative proteomic analysis to identify potential genetic markers for the biological function of chemolithoautotrophic iron oxidation in the marine bacterium Ghiorsea bivora. To date, this is the only characterized species in the class Zetaproteobacteria that is not an obligate iron-oxidizer, providing a unique opportunity to investigate differential protein expression to identify key genes involved in iron-oxidation at circumneutral pH. Over 1000 proteins were identified under both iron- and hydrogen-oxidizing conditions, with differentially expressed proteins found in both treatments. Notably, a gene cluster upregulated during iron oxidation was identified. This cluster contains genes encoding for cytochromes that share sequence similarity with the known iron-oxidase, Cyc2. Interestingly, these cytochromes, conserved in both Bacteria and Archaea, do not exhibit the typical β-barrel structure of Cyc2. This cluster potentially encodes a biological nanowire-like transmembrane complex containing multiple redox proteins spanning the inner membrane, periplasm, outer membrane, and extracellular space. The upregulation of key genes associated with this complex during iron-oxidizing conditions was confirmed by quantitative reverse transcription-PCR. These findings were further supported by electromicrobiological methods, which demonstrated negative current production by G. bivora in a three-electrode system poised at a cathodic potential. This research provides significant insights into the biological function of chemolithoautotrophic iron oxidation.
Affiliation(s)
- Roman A Barco
  - Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, Maine, USA
- N Merino
  - Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
  - Lawrence Livermore National Lab, Biosciences and Biotechnology Division, Livermore, California, USA
- B Lam
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- B Budnik
  - Mass Spectrometry and Proteomics Resource Laboratory, Harvard University, Cambridge, Massachusetts, USA
- M Kaplan
  - Department of Microbiology, University of Chicago, Chicago, Illinois, USA
- F Wu
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, Zhejiang, China
- J P Amend
  - Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- K H Nealson
  - Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- D Emerson
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, Maine, USA
9
Hu J, Zeng WW, Jia NX, Arif M, Yu DJ, Zhang GJ. Improving DNA-Binding Protein Prediction Using Three-Part Sequence-Order Feature Extraction and a Deep Neural Network Algorithm. J Chem Inf Model 2023; 63:1044-1057. [PMID: 36719781 DOI: 10.1021/acs.jcim.2c00943] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 02/01/2023]
Abstract
Identification of the DNA-binding protein (DBP) helps dig out information embedded in the DNA-protein interaction, which is significant to understanding the mechanisms of DNA replication, transcription, and repair. Although existing computational methods for predicting the DBPs based on protein sequences have obtained great success, there is still room for improvement since the sequence-order information is not fully mined in these methods. In this study, a new three-part sequence-order feature extraction (called TPSO) strategy is developed to extract more discriminative information from protein sequences for predicting the DBPs. For each query protein, TPSO first divides its primary sequence features into N- and C-terminal fragments and then extracts the numerical pseudo features of three parts including the full sequence and these two fragments, respectively. Based on TPSO, a novel deep learning-based method, called TPSO-DBP, is proposed, which employs the sequence-based single-view features, the bidirectional long short-term memory (BiLSTM) and fully connected (FC) neural networks to learn the DBP prediction model. Empirical outcomes reveal that TPSO-DBP can achieve an accuracy of 87.01%, covering 85.30% of all DBPs, while achieving a Matthew's correlation coefficient value (0.741) that is significantly higher than most existing state-of-the-art DBP prediction methods. Detailed data analyses have indicated that the advantages of TPSO-DBP lie in the utilization of TPSO, which helps extract more concealed prominent patterns, and the deep neural network framework composed of BiLSTM and FC that learns the nonlinear relationships between input features and DBPs. The standalone package and web server of TPSO-DBP are freely available at https://jun-csbio.github.io/TPSO-DBP/.
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Wen-Wu Zeng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ning-Xin Jia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Muhammad Arif
  - School of Systems and Technology, Department of Informatics and Systems, University of Management and Technology, Lahore 54770, Pakistan
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
10
Tubiana T, Sillitoe I, Orengo C, Reuter N. Dissecting peripheral protein-membrane interfaces. PLoS Comput Biol 2022; 18:e1010346. [PMID: 36516231 PMCID: PMC9797079 DOI: 10.1371/journal.pcbi.1010346] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/01/2022] [Revised: 12/28/2022] [Accepted: 11/24/2022] [Indexed: 12/15/2022]
Abstract
Peripheral membrane proteins (PMPs) include a wide variety of proteins that have in common to bind transiently to the chemically complex interfacial region of membranes through their interfacial binding site (IBS). In contrast to protein-protein or protein-DNA/RNA interfaces, peripheral protein-membrane interfaces are poorly characterized. We collected a dataset of PMP domains representative of the variety of PMP functions: membrane-targeting domains (Annexin, C1, C2, discoidin C2, PH, PX), enzymes (PLA, PLC/D) and lipid-transfer proteins (START). The dataset contains 1328 experimental structures and 1194 AlphaFold models. We mapped the amino acid composition and structural patterns of the IBS of each protein in this dataset, and evaluated which were more likely to be found at the IBS compared to the rest of the domains' accessible surface. In agreement with earlier work we find that about two thirds of the PMPs in the dataset have protruding hydrophobes (Leu, Ile, Phe, Tyr, Trp and Met) at their IBS. The three aromatic amino acids Trp, Tyr and Phe are a hallmark of PMPs IBS regardless of whether they protrude on loops or not. This is also the case for lysines but not arginines suggesting that, unlike for Arg-rich membrane-active peptides, the less membrane-disruptive lysine is preferred in PMPs. Another striking observation was the over-representation of glycines at the IBS of PMPs compared to the rest of their surface, possibly procuring IBS loops a much-needed flexibility to insert in-between membrane lipids. The analysis of the 9 superfamilies revealed amino acid distribution patterns in agreement with their known functions and membrane-binding mechanisms. Besides revealing novel amino acids patterns at protein-membrane interfaces, our work contributes a new PMP dataset and an analysis pipeline that can be further built upon for future studies of PMPs properties, or for developing PMPs prediction tools using for example, machine learning approaches.
Affiliation(s)
- Thibault Tubiana
  - Department of Chemistry, University of Bergen, Bergen, Norway
  - Computational Biology Unit, University of Bergen, Bergen, Norway
- Ian Sillitoe
  - Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Nathalie Reuter
  - Department of Chemistry, University of Bergen, Bergen, Norway
  - Computational Biology Unit, University of Bergen, Bergen, Norway
11
Das A, Giri K, Behera RN, Maity S, Ambatipudi K. BoMiProt 2.0: An update of the bovine milk protein database. J Proteomics 2022; 267:104696. [PMID: 35995382 DOI: 10.1016/j.jprot.2022.104696] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/22/2022] [Revised: 07/28/2022] [Accepted: 08/01/2022] [Indexed: 10/15/2022]
Abstract
Milk is a biofluid with various functions, containing carbohydrates, lipids, proteins, vitamins, and minerals. Owing to its importance and the availability of vast proteomics information, our research group designed a database for bovine milk proteins (N = 3159), called BoMiProt, containing primary and secondary information. Owing to growing interest and the volume of literature published in the last three years, BoMiProt has been upgraded with newly identified proteins (N = 7459) from peer-reviewed journals, significantly expanding the database with proteins from different milk fractions (e.g., whey, fat globule membranes, and exosomes). Additionally, class, architecture, topology, and homology, structural classification of proteins, known and predicted disorder, predicted transmembrane helices, and structures have been included. Each protein entry in the database is thoroughly cross-referenced, including 1392 BoMiProt defined proteins provided with secondary information, such as protein function, biochemical properties, post-translational modifications, significance in milk, domains, fold, AlphaFold predicted models and crystal structures. The proteome data in the database can be retrieved using several search parameters, including protein name, accession IDs, and FASTA sequence. Overall, BoMiProt represents an extensive compilation of newer proteins, including structural, functional, and hierarchical information, to help researchers better understand mammary gland pathophysiology, including their potential application in improving the nutritional quality of dairy products.
Affiliation(s)
- Arpita Das
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - Kuldeep Giri
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - Rama N Behera
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - Sudipa Maity
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - Kiran Ambatipudi
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India.
| |
|
12
|
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022]
Abstract
The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include ‘Local’ approaches considering the hierarchy, building models per level or node, and ‘Global’ hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
Affiliation(s)
- Pâmela M Rezende
- Universidade Federal de Minas Gerais.,Instituto René Rachou, Fundação Oswaldo Cruz.,Stilingue Inteligência Artificial
| | - Joicymara S Xavier
- Universidade Federal de Minas Gerais.,Instituto René Rachou, Fundação Oswaldo Cruz.,Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
| | - David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland.,Systems and Computational Biology, Bio 21 Institute, University of Melbourne.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
| | | | - Douglas E V Pires
- Systems and Computational Biology, Bio 21 Institute, University of Melbourne.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute.,School of Computing and Information Systems, University of Melbourne
| |
|
13
|
Shrestha B, Adhikari B. Scoring protein sequence alignments using deep learning. Bioinformatics 2022; 38:2988-2995. [PMID: 35385080 DOI: 10.1093/bioinformatics/btac210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022]
Abstract
BACKGROUND A high-quality sequence alignment (SA) is the most important input feature for accurate protein structure prediction. For a protein sequence, there are many methods to generate a SA. However, when given a choice of more than one SA for a protein sequence, there are no methods to predict which SA may lead to more accurate models without actually building the models. In this work, we describe a method to predict the quality of a protein's SA. METHODS We created our own dataset by generating a variety of SAs for a set of 1,351 representative proteins and investigated various deep learning architectures to predict the local distance difference test (lDDT) scores of distance maps predicted with SAs as the input. These lDDT scores serve as indicators of the quality of the SAs. RESULTS Using two independent test datasets consisting of CASP13 and CASP14 targets, we show that our method is effective for scoring and ranking SAs when a pool of SAs is available for a protein sequence. With an example, we further discuss that SA selection using our method can lead to improved structure prediction. AVAILABILITY Code and datasets are available at https://github.com/ba-lab/Alignment-Score/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bikash Shrestha
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63132, USA
| | - Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63132, USA
| |
|
14
|
Tang QY, Kaneko K. Dynamics-Evolution Correspondence in Protein Structures. Phys Rev Lett 2021; 127:098103. [PMID: 34506164 DOI: 10.1103/physrevlett.127.098103] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/30/2021] [Accepted: 07/28/2021] [Indexed: 06/13/2023]
Abstract
The genotype-phenotype mapping of proteins is a fundamental question in structural biology. In this Letter, with the analysis of a large dataset of proteins from hundreds of protein families, we quantitatively demonstrate the correlations between the noise-induced protein dynamics and mutation-induced variations of native structures, indicating the dynamics-evolution correspondence of proteins. Based on the investigations of the linear responses of native proteins, the origin of such a correspondence is elucidated. It is essential that the noise- and mutation-induced deformations of the proteins are restricted on a common low-dimensional subspace, as confirmed from the data. These results suggest an evolutionary mechanism of the proteins gaining both dynamical flexibility and evolutionary structural variability.
Affiliation(s)
- Qian-Yuan Tang
- Center for Complex Systems Biology, Universal Biology Institute, University of Tokyo, Komaba 3-8-1, Meguro-ku, Tokyo 153-8902, Japan
- Lab for Neural Computation and Adaptation, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Kunihiko Kaneko
- Center for Complex Systems Biology, Universal Biology Institute, University of Tokyo, Komaba 3-8-1, Meguro-ku, Tokyo 153-8902, Japan
| |
|
15
|
Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021; 17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in terms of boundary definition, methodology, and training databases resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor, that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
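The tolerance-based F1 scores quoted above (boundaries counted as correct within ±20 or ±10 residues) can be illustrated with a minimal sketch. The abstract does not state how predicted and true boundaries are paired, so the greedy one-to-one matching below is an assumption, and the function name `boundary_f1` and the example positions are hypothetical.

```python
def boundary_f1(predicted, true, tolerance):
    """F1 score for domain-boundary prediction with a residue tolerance.

    A predicted boundary is a true positive if it lies within `tolerance`
    residues of a true boundary that has not been matched yet.
    Assumption: greedy one-to-one matching to the nearest unmatched true
    boundary; the scheme used by the cited work may differ.
    """
    unmatched = sorted(true)
    true_positives = 0
    for p in sorted(predicted):
        candidates = [t for t in unmatched if abs(t - p) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda t: abs(t - p))
            unmatched.remove(nearest)  # each true boundary is used once
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(true)
    return 2 * precision * recall / (precision + recall)


# Hypothetical boundary positions (residue indices), for illustration only.
predicted_boundaries = [100, 250]
true_boundaries = [110, 400]
print(boundary_f1(predicted_boundaries, true_boundaries, tolerance=20))
print(boundary_f1(predicted_boundaries, true_boundaries, tolerance=5))
```

Widening the tolerance can only keep or raise the score, which is consistent with the ±20 figure in the abstract being higher than the ±10 figure.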
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
| | - Pegah Golchin
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
| | - Filip Koenig
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
| | - Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany.,John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| |
|
16
|
Durairaj J, Akdel M, de Ridder D, van Dijk ADJ. Geometricus represents protein structures as shape-mers derived from moment invariants. Bioinformatics 2021; 36:i718-i725. [PMID: 33381814 DOI: 10.1093/bioinformatics/btaa839] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Accepted: 09/15/2020] [Indexed: 01/28/2023]
Abstract
MOTIVATION As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well. RESULTS We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering and structure classification across proteins from different superfamilies as well as within the same family. AVAILABILITY AND IMPLEMENTATION Python code available at https://git.wur.nl/durai001/geometricus.
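The idea of discretizing fragment-level moment invariants into shape-mers and counting them can be sketched as follows. This is not the Geometricus API: the function names, the log-scaled binning rule and the `resolution` parameter are assumptions made for illustration, and the invariant values are made up (they are assumed non-negative and already computed per fragment).

```python
import math
from collections import Counter


def to_shape_mer(invariants, resolution=2.0):
    """Discretize one fragment's moment invariants into an integer tuple.

    Assumption: log-scaled binning, floor(resolution * log1p(value)).
    Values are assumed to be non-negative.
    """
    return tuple(int(math.log1p(v) * resolution) for v in invariants)


def embed(fragment_invariants, vocabulary, resolution=2.0):
    """Count shape-mers over all fragments of one structure and return a
    fixed-length count vector ordered by `vocabulary`."""
    counts = Counter(to_shape_mer(f, resolution) for f in fragment_invariants)
    return [counts[shape_mer] for shape_mer in vocabulary]


# Hypothetical invariants for three fragments of one structure.
fragments = [(1.0, 3.0), (1.1, 3.2), (10.0, 0.5)]
vocabulary = sorted({to_shape_mer(f) for f in fragments})
print(vocabulary)                    # shape-mers observed in this structure
print(embed(fragments, vocabulary))  # fixed-length count vector
```

Because every structure is mapped onto the same vocabulary, structures of different sizes yield vectors of equal length, which is what makes them usable as machine learning inputs.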
Affiliation(s)
| | - Mehmet Akdel
- Bioinformatics Group, Department of Plant Sciences
| | | | - Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences.,Mathematical and Statistical Methods - Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6700AP, The Netherlands
| |
|
17
|
Chen C, Liu H, Zabad S, Rivera N, Rowin E, Hassan M, Gomez De Jesus SM, Llinás Santos PS, Kravchenko K, Mikhova M, Ketterer S, Shen A, Shen S, Navas E, Horan B, Raudsepp J, Jeffery C. MoonProt 3.0: an update of the moonlighting proteins database. Nucleic Acids Res 2021; 49:D368-D372. [PMID: 33245761 PMCID: PMC7778978 DOI: 10.1093/nar/gkaa1101] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 09/15/2020] [Revised: 10/21/2020] [Accepted: 10/31/2020] [Indexed: 01/09/2023]
Abstract
MoonProt 3.0 (http://moonlightingproteins.org) is an updated open-access database storing expert-curated annotations for moonlighting proteins. Moonlighting proteins have two or more physiologically relevant distinct biochemical or biophysical functions performed by a single polypeptide chain. Here, we describe an expansion in the database since our previous report in the Database Issue of Nucleic Acids Research in 2018. For this release, the number of proteins annotated has been expanded to over 500 proteins and dozens of protein annotations have been updated with additional information, including more structures in the Protein Data Bank, compared with version 2.0. The new entries include more examples from humans, plants and archaea, more proteins involved in disease and proteins with different combinations of functions. More kinds of information about the proteins and the species in which they have multiple functions have been added, including CATH and SCOP classification of structure, known and predicted disorder, predicted transmembrane helices, type of organism, relationship of the protein to disease, and relationship of organism to cause of disease.
Affiliation(s)
- Chang Chen
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA.,Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Haipeng Liu
- Center for Biomolecular Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
| | - Shadi Zabad
- Department of Computer Science, McGill University, Montreal, QC, Canada
| | - Nina Rivera
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Emily Rowin
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Maheen Hassan
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
| | | | | | - Karyna Kravchenko
- Department of Biotechnology and Bioengineering, V. N. Karazin Kharkiv National University, IL 61002, Ukraine
| | | | - Sophia Ketterer
- Cold Spring Harbor High School, Cold Spring Harbor, NY 11724, USA
| | - Annabel Shen
- Cold Spring Harbor High School, Cold Spring Harbor, NY 11724, USA
| | - Sophia Shen
- Cold Spring Harbor High School, Cold Spring Harbor, NY 11724, USA
| | - Erin Navas
- Northport High School, Northport, NY 11768, USA
| | - Bryan Horan
- Northport High School, Northport, NY 11768, USA
| | - Jaak Raudsepp
- Cold Spring Harbor High School, Cold Spring Harbor, NY 11724, USA
| | - Constance Jeffery
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA.,Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
| |
|
18
|
Sillitoe I, Bordin N, Dawson N, Waman VP, Ashford P, Scholes HM, Pang CSM, Woodridge L, Rauer C, Sen N, Abbasian M, Le Cornu S, Lam SD, Berka K, Varekova I, Svobodova R, Lees J, Orengo CA. CATH: increased structural coverage of functional space. Nucleic Acids Res 2021; 49:D266-D273. [PMID: 33237325 PMCID: PMC7778904 DOI: 10.1093/nar/gkaa1079] [Citation(s) in RCA: 265] [Impact Index Per Article: 66.3] [Received: 09/22/2020] [Revised: 10/20/2020] [Accepted: 11/02/2020] [Indexed: 12/11/2022]
Abstract
CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with the addition of 65 351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Natalie Dawson
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Paul Ashford
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Camilla S M Pang
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Laurel Woodridge
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Mahnaz Abbasian
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Sean Le Cornu
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | - Su Datt Lam
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
| | - Karel Berka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University Olomouc, Olomouc 771 46, Czech Republic
| | - Ivana Hutařová Varekova
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno 602 00, Czech Republic
| | - Radka Svobodova
- Central European Institute of Technology, Masaryk University, Brno 625 00, Czech Republic.,National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno 602 00, Czech Republic
| | - Jon Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford OX3 0BP, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| |
|
19
|
Angles R, Arenas-Salinas M, García R, Reyes-Suarez JA, Pohl E. GSP4PDB: a web tool to visualize, search and explore protein-ligand structural patterns. BMC Bioinformatics 2020; 21:85. [PMID: 32164553 PMCID: PMC7068854 DOI: 10.1186/s12859-020-3352-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/05/2022]
Abstract
BACKGROUND In the field of protein engineering and biotechnology, the discovery and characterization of structural patterns is highly relevant as these patterns can give fundamental insights into protein-ligand interaction and protein function. This paper presents GSP4PDB, a bioinformatics web tool that enables the user to visualize, search and explore protein-ligand structural patterns within the entire Protein Data Bank. RESULTS We introduce the notion of graph-based structural pattern (GSP) as an abstract model for representing protein-ligand interactions. A GSP is a graph where the nodes represent entities of the protein-ligand complex (amino acids and ligands) and the edges represent structural relationships (e.g. distances ligand - amino acid). The novel feature of GSP4PDB is a simple and intuitive graphical interface where the user can "draw" a GSP and execute its search in a relational database containing the structural data of each PDB entry. The results of the search are displayed using the same graph-based representation of the pattern. The user can further explore and analyse the results using a wide range of filters, or download their related information for external post-processing and analysis. CONCLUSIONS GSP4PDB is a user-friendly and efficient application to search and discover new patterns of protein-ligand interaction.
Affiliation(s)
- Renzo Angles
- Department of Computer Science, Universidad de Talca, Camino Los Niches Km 1, Curicó, Chile
- Millennium Institute for Foundational Research on Data, Santiago, Chile
| | - Mauricio Arenas-Salinas
- Centro de Bioinformática y Simulación Molecular, Universidad de Talca, Talca, Chile
- Faculty of Engineering, Universidad de Talca, Camino Los Niches Km 1, Curicó, Chile
| | - Roberto García
- Millennium Institute for Foundational Research on Data, Santiago, Chile
- Faculty of Engineering, Universidad de Talca, Camino Los Niches Km 1, Curicó, Chile
| | - Jose Antonio Reyes-Suarez
- Centro de Bioinformática y Simulación Molecular, Universidad de Talca, Talca, Chile
- Faculty of Engineering, Universidad de Talca, Camino Los Niches Km 1, Curicó, Chile
| | - Ehmke Pohl
- Department of Chemistry, Durham University, Durham, DH1 3LE United Kingdom
- Department of Biosciences, Durham University, Durham, DH1 3LE United Kingdom
| |
|
20
|
Waman VP, Blundell TL, Buchan DWA, Gough J, Jones D, Kelley L, Murzin A, Pandurangan AP, Sillitoe I, Sternberg M, Torres P, Orengo C. The Genome3D Consortium for Structural Annotations of Selected Model Organisms. Methods Mol Biol 2020; 2165:27-67. [PMID: 32621218 DOI: 10.1007/978-1-0716-0708-4_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/11/2023]
Abstract
The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal providing both predicted models and annotations of proteins in model organisms, using several resources developed by these labs such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare structural classifications of protein domains in CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses led to the identification of a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms, including Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Thus, Genome3D serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions. It thus helps researchers broaden their perspective on structure/function predictions of their query protein of interest in selected model organisms.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Daniel W A Buchan
- Department of Computer Science, University College London, London, UK
| | - Julian Gough
- MRC Laboratory of Molecular Biology, Cambridge, UK
| | - David Jones
- Department of Computer Science, University College London, London, UK
| | - Lawrence Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
| | | | | | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Michael Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
| | - Pedro Torres
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK.
| |
|
21
|
Aharoni R, Tobi D. Dynamical comparison between Drosha and Dicer reveals functional motion similarities and dissimilarities. PLoS One 2019; 14:e0226147. [PMID: 31821368 PMCID: PMC6903759 DOI: 10.1371/journal.pone.0226147] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/26/2019] [Accepted: 11/20/2019] [Indexed: 12/01/2022]
Abstract
Drosha and Dicer are RNase III family members of classes II and III, respectively, which play a major role in the maturation of micro-RNAs. The two proteins share similar domain arrangement and overall fold despite no apparent sequence homology. The overall structural and catalytic reaction similarity of both proteins, on the one hand, and differences in the substrate and its binding mechanisms, on the other, suggest that both proteins also share dynamic similarities and dissimilarities. Since dynamics is essential for protein function, a comparison at the level of their dynamics is fundamental for a complete understanding of the overall relations between these proteins. In this study, we present a dynamical comparison between human Drosha and Giardia Dicer. Gaussian Network Model and Anisotropic Network Model modes of motion of the proteins are calculated. Dynamical comparison is performed using global and local dynamic programming algorithms for aligning modes of motion. These algorithms were recently developed based on the commonly used Needleman-Wunsch and Smith-Waterman algorithms for global and local sequence alignment. The slowest mode of Drosha differs from that of Dicer owing to its more bent posture and allows motion of the double-stranded RNA-binding domain toward and away from its substrate. Among the five slowest modes, dynamic similarity exists only for the second-slowest mode of motion of Drosha and Dicer. In addition, high local dynamics similarity is observed at the catalytic domains, in the vicinity of the catalytic residues. The results suggest that the proteins exert a similar catalytic mechanism using similar motions, especially at the catalytic sites.
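The local alignment step mentioned above builds on the Smith-Waterman recurrence. A minimal sketch of that recurrence over a precomputed similarity matrix is given below. How the authors score the similarity between positions of two modes of motion is not given in the abstract, so the `score` matrix, its values and the linear gap penalty are hypothetical placeholders.

```python
def smith_waterman(score, gap=1.0):
    """Local alignment over a precomputed similarity matrix.

    score[i][j] is the (hypothetical) similarity between position i of the
    first profile and position j of the second. Uses a linear gap penalty.
    Returns the best local score and the list of aligned index pairs.
    """
    n, m = len(score), len(score[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                    # start a new local alignment
                H[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                H[i - 1][j] - gap,                      # gap in second profile
                H[i][j - 1] - gap,                      # gap in first profile
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Made-up similarity values for two short profiles.
similarity = [
    [1, -1, -1],
    [-1, 2, -1],
    [-1, -1, -1],
]
best_score, aligned = smith_waterman(similarity)
print(best_score, aligned)
```

Replacing the zero floor with gap-initialized borders and tracing back from the bottom-right cell turns the same recurrence into the global (Needleman-Wunsch) variant.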
Affiliation(s)
- Rotem Aharoni
- Department of Molecular Biology, Ariel University, Ariel, Israel
| | - Dror Tobi
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Department of Computer Sciences, Ariel University, Ariel, Israel
| |
|
22
|
Schiano-di-Cola C, Kołaczkowski B, Sørensen TH, Christensen SJ, Cavaleiro AM, Windahl MS, Borch K, Morth JP, Westh P. Structural and biochemical characterization of a family 7 highly thermostable endoglucanase from the fungus Rasamsonia emersonii. FEBS J 2019; 287:2577-2596. [DOI: 10.1111/febs.15151] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/13/2019] [Revised: 11/01/2019] [Accepted: 11/20/2019] [Indexed: 01/21/2023]
Affiliation(s)
| | | | - Trine Holst Sørensen
- Department of Science and Environment Roskilde University Denmark
- Novozymes A/S Lyngby Denmark
| | | | | | - Michael Skovbo Windahl
- Department of Science and Environment Roskilde University Denmark
- Novozymes A/S Lyngby Denmark
| | | | - Jens Preben Morth
- Department of Biotechnology and Biomedicine Technical University of Denmark Lyngby Denmark
| | - Peter Westh
- Department of Science and Environment Roskilde University Denmark
- Department of Biotechnology and Biomedicine Technical University of Denmark Lyngby Denmark
| |
|
23
|
Sequential, Structural and Functional Properties of Protein Complexes Are Defined by How Folding and Binding Intertwine. J Mol Biol 2019; 431:4408-4428. [DOI: 10.1016/j.jmb.2019.07.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/28/2019] [Revised: 07/10/2019] [Accepted: 07/29/2019] [Indexed: 12/15/2022]
|
24
|
Kots ED, Khrenova MG, Nemukhin AV, Varfolomeev SD. Aspartoacylase: a central nervous system enzyme. Structure, catalytic activity and regulation mechanisms. RUSSIAN CHEMICAL REVIEWS 2019. [DOI: 10.1070/rcr4842] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
|
25
|
Abstract
A methodology to cluster proteins based on their dynamical similarity is presented. For each pair of proteins from a dataset, the structures are superimposed, and the Anisotropic Network Model modes of motion are calculated. The twelve slowest modes from each protein are matched using a local mode alignment algorithm based on the local sequence alignment algorithm of Smith–Waterman. The dynamical similarity distance matrix is calculated based on the top-scoring matches of each pair, and the proteins are clustered using a hierarchical clustering algorithm. The utility of this method is exemplified on a dataset of protein chains from the globin family and a dataset of tetrameric hemoglobins. The results demonstrate the effect of the quaternary structure of globin members on their intrinsic dynamics and show good ability to distinguish between different states of hemoglobin, revealing the dynamical relations between them.
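The final step described here, hierarchical clustering on a pairwise dynamical distance matrix, can be sketched as follows. The abstract does not state which linkage criterion was used, so average linkage is an assumption, and the distance values in the toy matrix are hypothetical.

```python
def average_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering on a symmetric distance matrix (list of lists).

    Repeatedly merges the two clusters with the smallest average pairwise
    distance until `n_clusters` remain. Average linkage is an assumed choice.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]

    def mean_distance(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical distances between four chains: 0 and 1 are similar, as are 2 and 3.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(average_linkage_clusters(toy_dist, 2))  # [[0, 1], [2, 3]]
```

For real datasets a library routine such as SciPy's hierarchical clustering would normally be preferred; the plain-Python loop above is only meant to show the merging logic.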
Affiliation(s)
- Dror Tobi
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Department of Computer Sciences, Ariel University, Ariel, Israel
|
26
|
Aharoni R, Tobi D. Dynamical comparison between myoglobin and hemoglobin. Proteins 2018; 86:1176-1183. [PMID: 30183107 DOI: 10.1002/prot.25598] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/25/2018] [Revised: 08/22/2018] [Accepted: 08/31/2018] [Indexed: 01/29/2023]
Abstract
Myoglobin and hemoglobin are globular hemeproteins; the former is a monomer and the latter a heterotetramer. Despite the structural similarity of myoglobin to the α and β subunits of hemoglobin, there is a functional difference between the two proteins, owing to the quaternary structure of hemoglobin. The effect of the quaternary structure of hemoglobin on the intrinsic dynamics of its subunits is explored by dynamical comparison of the two proteins. Anisotropic Network Model modes of motion were calculated for hemoglobin and myoglobin. Dynamical comparison between the proteins was performed using global and local Anisotropic Network Model mode alignment algorithms based on the algorithms of Smith-Waterman and Needleman-Wunsch for sequence comparison. The results indicate that the quaternary structure of hemoglobin substantially alters the intrinsic dynamics of its subunits, an effect that may contribute to the functional difference between the two proteins. Local dynamic similarity between the proteins is still observed at the major exit route of the ligand.
Affiliation(s)
- Rotem Aharoni
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Dror Tobi
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Department of Computer Sciences, Ariel University, Ariel, Israel
|
27
|
Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018; 34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate a clustering refinement procedure. Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. Availability and implementation: MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Brittney N Keel
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
- Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Bo Deng
- Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
- School of Biological Sciences and Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
|
28
|
Carter CW, Wills PR. Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding. Mol Biol Evol 2018; 35:269-286. [PMID: 29077934 PMCID: PMC5850816 DOI: 10.1093/molbev/msx265] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 02/07/2023] Open
Abstract
Genetic coding is generally thought to have required ribozymes whose functions were taken over by polypeptide aminoacyl-tRNA synthetases (aaRS). Two discoveries about aaRS and their interactions with tRNA substrates now furnish a unifying rationale for the opposite conclusion: that the key processes of the Central Dogma of molecular biology emerged simultaneously and naturally from simple origins in a peptide•RNA partnership, eliminating the epistemological utility of a prior RNA world. First, the two aaRS classes likely arose from opposite strands of the same ancestral gene, implying a simple genetic alphabet. The resulting inversion symmetries in aaRS structural biology would have stabilized the initial and subsequent differentiation of coding specificities, rapidly promoting diversity in the proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing reflexive, nanoenvironmental sensing in protein aaRS. Bootstrapping of increasingly detailed coding is thus intrinsic to polypeptide aaRS, but impossible in an RNA world. These notions underline the following concepts that contradict gradual replacement of ribozymal aaRS by polypeptide aaRS: 1) aaRS enzymes must be interdependent; 2) reflexivity intrinsic to polypeptide aaRS production dynamics promotes bootstrapping; 3) takeover of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; and 4) the Central Dogma's emergence is most probable when replication and translation error rates remain comparable. These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled gene-replicase-translatase system of genetic coding that would have continuously preserved the functional meaning of genetically encoded protein genes whose phylogenetic relationships match those observed today.
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Peter R Wills
- Department of Physics, University of Auckland, Auckland, New Zealand
|
29
|
Alam N, Goldstein O, Xia B, Porter KA, Kozakov D, Schueler-Furman O. High-resolution global peptide-protein docking using fragments-based PIPER-FlexPepDock. PLoS Comput Biol 2017; 13:e1005905. [PMID: 29281622 PMCID: PMC5760072 DOI: 10.1371/journal.pcbi.1005905] [Citation(s) in RCA: 96] [Impact Index Per Article: 12.0] [Received: 08/21/2017] [Revised: 01/09/2018] [Accepted: 11/29/2017] [Indexed: 11/24/2022] Open
Abstract
Peptide-protein interactions contribute a significant fraction of the protein-protein interactome. Accurate modeling of these interactions is challenging due to the vast conformational space associated with interactions of highly flexible peptides with large receptor surfaces. To address this challenge we developed a fragment-based high-resolution peptide-protein docking protocol. By streamlining the Rosetta fragment picker for accurate peptide fragment ensemble generation, the PIPER docking algorithm for exhaustive fragment-receptor rigid-body docking and Rosetta FlexPepDock for flexible full-atom refinement of PIPER docked models, we successfully addressed the challenge of accurate and efficient global peptide-protein docking at high resolution with remarkable accuracy, as validated on a small but representative set of peptide-protein complex structures well resolved by X-ray crystallography. Our approach opens up the way to high-resolution modeling of many more peptide-protein interactions and to the detailed study of peptide-protein association in general. PIPER-FlexPepDock is freely available to the academic community as a server at http://piperfpd.furmanlab.cs.huji.ac.il.
Peptide-protein interactions are crucial components of various important biological processes in living cells. High-resolution structural information of such interactions provides insight about the underlying biophysical principles governing the interactions, and a starting point for their targeted manipulations. Accurate docking algorithms can help fill the gap between the vast number of these interactions and the small number of experimentally solved structures. However, the accuracies of the existing protocols have been limited, in particular for ab initio docking when no information about the peptide beyond its sequence is available. Here we introduce PIPER-FlexPepDock, a fragment-based global docking protocol for high-resolution modeling of peptide-protein interactions. Integration of accurate and efficient representation of the peptide using fragment ensembles, their fast and exhaustive rigid-body docking, and their subsequent accurate flexible refinement, enables peptide-protein docking of remarkable accuracy. The validation on a representative benchmark set of crystallographically solved high-resolution peptide-protein complexes demonstrates significantly improved performance over all existing docking protocols. This opens up the way to the modeling of many more peptide-protein interactions, and to a more detailed study of peptide-protein association in general.
Affiliation(s)
- Nawsad Alam
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- Oriel Goldstein
- School of Computer Sciences and Engineering, The Hebrew University, Jerusalem, Israel
- Bing Xia
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Kathryn A. Porter
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
- Institute for Advanced Computational Sciences, Stony Brook University, Stony Brook, New York, United States of America
- Ora Schueler-Furman
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
|
30
|
Comprehensive Analysis of the Human SH3 Domain Family Reveals a Wide Variety of Non-canonical Specificities. Structure 2017; 25:1598-1610.e3. [DOI: 10.1016/j.str.2017.07.017] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 04/24/2017] [Revised: 06/20/2017] [Accepted: 07/28/2017] [Indexed: 01/31/2023]
|
31
|
Bohnuud T, Luo L, Wodak SJ, Vajda S, Bonvin AM, Weng Z, Schueler-Furman O, Kozakov D. A benchmark testing ground for integrating homology modeling and protein docking. Proteins 2017; 85:10-16. [PMID: 27172383 PMCID: PMC5817996 DOI: 10.1002/prot.25063] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 07/10/2015] [Accepted: 05/08/2016] [Indexed: 12/20/2022]
Abstract
Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Tanggis Bohnuud
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Lingqi Luo
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Shoshana J. Wodak
- VIB Structural Biology Research Center, VUB Pleinlaan 2, 1050 Brussels
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Department of Chemistry, Boston University, Boston, MA 02215, USA
- Alexandre M.J.J. Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, the Netherlands
- Zhiping Weng
- Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, United States
- Ora Schueler-Furman
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Hadassah Medical School, Hebrew University, Jerusalem, Israel
- Dima Kozakov
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Department of Applied Mathematics and Statistics, Stony Brook University, NY, USA
|
32
|
Chen J, Guo M, Wang X, Liu B. A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief Bioinform 2016; 19:231-244. [DOI: 10.1093/bib/bbw108] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 08/03/2016] [Indexed: 01/02/2023] Open
Affiliation(s)
- Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Mingyue Guo
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
|
33
|
Comparison of protein repeat classifications based on structure and sequence families. Biochem Soc Trans 2016; 43:832-7. [PMID: 26517890 DOI: 10.1042/bst20150079] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 02/08/2023]
Abstract
Tandem repeats (TR) in proteins are common in nature and have several unique functions. They come in various forms that are frequently difficult to recognize from a sequence. A previously proposed structural classification has been recently implemented in the RepeatsDB database. This defines five main classes, mainly based on repeat unit length, with subclasses representing specific folds. Sequence-based classifications, such as Pfam, provide an alternative classification based on evolutionarily conserved repeat families. Here, we discuss a detailed comparison between the structural classes in RepeatsDB and the corresponding Pfam repeat families and clans. Most instances are found to map one-to-one between structure and sequence. Some notable exceptions such as leucine-rich repeats (LRRs) and α-solenoids are discussed.
|
34
|
Fischer AW, Heinze S, Putnam DK, Li B, Pino JC, Xia Y, Lopez CF, Meiler J. CASP11--An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline. PLoS One 2016; 11:e0152517. [PMID: 27046050 PMCID: PMC4821492 DOI: 10.1371/journal.pone.0152517] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/13/2015] [Accepted: 03/15/2016] [Indexed: 11/18/2022] Open
Abstract
In silico prediction of a protein's tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear Overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three 'assisted' protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, it was frequently not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data.
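For readers unfamiliar with the GDT_TS metric quoted in this abstract, a simplified sketch follows. GDT_TS averages the percentage of residues whose Cα atoms lie within 1, 2, 4 and 8 Å of the reference structure. The sketch assumes the per-residue distances come from one fixed superposition, whereas the full GDT procedure searches for the best superposition at each cutoff, so this version can underestimate the official score. The distance values are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (in percent) from per-residue C-alpha distances in angstroms.

    Assumes a single, fixed superposition of model and reference; the real
    GDT algorithm optimises the superposition separately for each cutoff.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    # Count residues within each cutoff, then average the four fractions.
    within = sum(sum(1 for d in distances if d <= c) for c in cutoffs)
    return 100.0 * within / (len(distances) * len(cutoffs))

# Five residues with hypothetical deviations from the reference structure.
print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))  # 50.0
```

Under this definition, the 30% threshold mentioned in the abstract corresponds to a model in which, averaged over the four cutoffs, fewer than one in three residues is placed close to its reference position.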
Affiliation(s)
- Axel W. Fischer
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Daniel K. Putnam
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37232, United States of America
- Bian Li
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- James C. Pino
- Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, 37232, United States of America
- Yan Xia
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Carlos F. Lopez
- Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Department of Cancer Biology and Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, 37232, United States of America
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
|
35
|
Shehu A, Barbará D, Molloy K. A Survey of Computational Methods for Protein Function Prediction. BIG DATA ANALYTICS IN GENOMICS 2016:225-298. [DOI: 10.1007/978-3-319-41279-5_7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 01/03/2025]
|
36
|
Malhotra S, Sowdhamini R. Collation and analyses of DNA-binding protein domain families from sequence and structural databanks. MOLECULAR BIOSYSTEMS 2015; 11:1110-8. [PMID: 25656606 DOI: 10.1039/c4mb00629a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
DNA-protein interactions govern several high fidelity cellular processes like DNA-replication, transcription, DNA repair, etc. Proteins that have the ability to recognise and bind DNA sequences can be classified either according to their DNA-binding motif or based on the sequence of the target nucleotides. We have collated the DNA-binding families by integrating information from both protein sequence family and structural databases. This resulted in a dataset of 1057 DNA-binding protein domain families. Their family properties (the number of members, percent identity distribution and length of members) and domain architectures were examined. Further, sequence domain families were mapped to structures in the protein databank (PDB) and the protein domain structure classification database (SCOP). The DNA-binding families, with no structural information, were clustered together into potential superfamilies based on sequence associations. On the basis of functions attributed to DNA-binding protein folds, we observe that a majority of the DNA-binding proteins follow divergent evolution. This study can serve as a basis for annotation and distribution of DNA-binding proteins in genome(s) of interest. The entire collated set of DNA-binding protein domains is available for download as Hidden Markov Models.
Affiliation(s)
- Sony Malhotra
- National Centre for Biological Sciences, Bellary Road, GKVK Campus, Bangalore, India.
|
37
|
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
|
38
|
Nagarajan D, Deka G, Rao M. Design of symmetric TIM barrel proteins from first principles. BMC BIOCHEMISTRY 2015; 16:18. [PMID: 26264284 PMCID: PMC4531894 DOI: 10.1186/s12858-015-0047-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/17/2015] [Accepted: 07/21/2015] [Indexed: 12/03/2022]
Abstract
Background: Computational protein design is a rapidly maturing field within structural biology, with the goal of designing proteins with custom structures and functions. Such proteins could find widespread medical and industrial applications. Here, we have adapted algorithms from the Rosetta software suite to design much larger proteins, based on ideal geometric and topological criteria. Furthermore, we have developed techniques to incorporate symmetry into designed structures. For our first design attempt, we targeted the (α/β)8 TIM barrel scaffold. We gained novel insights into TIM barrel folding mechanisms from studying natural TIM barrel structures, and from analyzing previous TIM barrel design attempts. Methods: Computational protein design and analysis was performed using the Rosetta software suite and custom scripts. Genes encoding all designed proteins were synthesized and cloned on the pET20-b vector. Standard circular dichroism and gel chromatographic experiments were performed to determine protein biophysical characteristics. 1D NMR and 2D HSQC experiments were performed to determine protein structural characteristics. Results: Extensive protein design simulations coupled with ab initio modeling yielded several all-atom models of ideal, 4-fold symmetric TIM barrels. Four such models were experimentally characterized. The best designed structure (Symmetrin-1) contained a polar, histidine-rich pore, forming an extensive hydrogen bonding network. Symmetrin-1 was easily expressed and readily soluble. It showed circular dichroism spectra characteristic of well-folded alpha/beta proteins. Temperature melting experiments revealed cooperative and reversible unfolding, with a Tm of 44 °C and a Gibbs free energy of unfolding (ΔG°) of 8.0 kJ/mol. Urea denaturing experiments confirmed these observations, revealing a Cm of 1.6 M and a ΔG° of 8.3 kJ/mol. Symmetrin-1 adopted a monomeric conformation, with an apparent molecular weight of 32.12 kDa, and displayed well resolved 1D-NMR spectra. However, the HSQC spectrum revealed somewhat molten characteristics. Conclusions: Despite the detection of molten characteristics, the creation of a soluble, cooperatively folding protein represents an advancement over previous attempts at TIM barrel design. Strategies to further improve Symmetrin-1 are elaborated. Our techniques may be used to create other large, internally symmetric proteins.
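To put the reported unfolding free energies in context, the sketch below converts a ΔG° of unfolding into the fraction of folded molecules under a two-state model, and derives the urea m-value implied by the reported Cm. Both the reference temperature of 25 °C and the use of the linear extrapolation model (ΔG = ΔG° − m·[urea], so m = ΔG°/Cm) are assumptions; the abstract does not state them.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)

def fraction_folded(dg_unfold_kj_mol, temperature_k=298.15):
    """Two-state model: K_unfold = exp(-dG/RT), folded fraction = 1 / (1 + K_unfold).

    The default temperature of 298.15 K is an assumed reference, not a reported value.
    """
    k_unfold = math.exp(-dg_unfold_kj_mol / (GAS_CONSTANT * temperature_k))
    return 1.0 / (1.0 + k_unfold)

def urea_m_value(dg_water_kj_mol, cm_molar):
    """m-value in kJ/(mol*M), assuming the linear extrapolation model (dG = 0 at Cm)."""
    if cm_molar <= 0:
        raise ValueError("Cm must be positive")
    return dg_water_kj_mol / cm_molar

# Values reported in the abstract: 8.0 kJ/mol (thermal), 8.3 kJ/mol with Cm = 1.6 M (urea).
print(round(fraction_folded(8.0), 3))
print(round(urea_m_value(8.3, 1.6), 2))
```

Under these assumptions a ΔG° of about 8 kJ/mol corresponds to a population that is roughly 96% folded, which is marginal stability compared with many natural proteins.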
Affiliation(s)
- Deepesh Nagarajan
- Biochemistry Department, Indian Institute of Science, Bangalore, India.
- Geeta Deka
- Molecular Biology Unit, Indian Institute of Science, Bangalore, India.
- Megha Rao
- Biochemistry Department, Indian Institute of Science, Bangalore, India.
|
39
|
Deng L, Chen Z. An Integrated Framework for Functional Annotation of Protein Structural Domains. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:902-13. [PMID: 26357331 DOI: 10.1109/tcbb.2015.2389213] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 05/21/2023]
Abstract
Structural domains are evolutionary and functional units of proteins and play a critical role in comparative and functional genomics. Computational assignment of domain function with high reliability is essential for understanding whole-protein functions. However, functional annotations are conventionally assigned to full-length proteins rather than to the individual structural domains. In this article, we present Structural Domain Annotation (SDA), a novel computational approach to predict functions for SCOP structural domains. The SDA method integrates heterogeneous information sources, including structure-alignment-based protein-SCOP mapping features, InterPro2GO mapping information, PSSM profiles, and sequence neighborhood features, with a Bayesian network. By annotating SCOP domains with Gene Ontology terms at large scale using SDA, we obtained a database of SCOP domain to Gene Ontology mappings, which contains ~162,000 out of the approximately 166,900 domains in SCOPe 2.03 (>97 percent) and their predicted Gene Ontology functions. We have benchmarked SDA using a single-domain protein dataset and an independent dataset from different species. Comparative studies show that SDA significantly outperforms the existing function prediction methods for structural domains in terms of coverage and maximum F-measure.
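The "maximum F-measure" used for benchmarking can be illustrated with a short sketch. This is a simplified, pooled variant computed over (domain, GO term) pairs; the scores, labels, and the function name `max_f_measure` are hypothetical, and the exact averaging scheme used by the authors is not given in the abstract.

```python
def max_f_measure(scores, labels, thresholds=None):
    """Return (best_f, best_threshold) over a sweep of score cut-offs.

    scores: predicted confidence for each (domain, GO term) pair
    labels: True where the pair is a known annotation
    """
    if thresholds is None:
        thresholds = sorted(set(scores))
    best_f, best_t = 0.0, None
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        fn = sum((not p) and y for p, y in zip(predicted, labels))
        if tp == 0:
            continue  # precision and recall are both zero, F is undefined or zero
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t


# Hypothetical predictions for five (domain, GO term) pairs
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]
best_f, best_t = max_f_measure(scores, labels)
print(best_f, best_t)
```

Sweeping the threshold rather than fixing it makes the comparison between predictors independent of how each method calibrates its scores.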
|
40
|
Carter CW. Urzymology: experimental access to a key transition in the appearance of enzymes. J Biol Chem 2014; 289:30213-30220. [PMID: 25210034 DOI: 10.1074/jbc.r114.567495] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/26/2022]
Abstract
Urzymes are catalysts derived from invariant cores of protein superfamilies. Urzymes from both aminoacyl-tRNA synthetase classes possess sophisticated catalytic mechanisms: pre-steady state bursts and significant transition-state stabilization of both amino acid activation and tRNA acylation. However, they have insufficient specificity to ensure a fully developed genetic code, suggesting that they participated in synthesizing statistical proteins. They represent a robust experimental platform from which to articulate and test hypotheses both about their own ancestors and about how they, in turn, evolved into modern enzymes. They help reshape numerous paradigms from the RNA World hypothesis to protein structure databases and allostery.
Affiliation(s)
- Charles W Carter: Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599-7260.
|
41
|
Molloy K, Van MJ, Barbara D, Shehu A. Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space. BMC Bioinformatics 2014; 15 Suppl 8:S4. [PMID: 25080993 PMCID: PMC4120149 DOI: 10.1186/1471-2105-15-s8-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs, but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, and is additionally supported by analysis in this paper, that such maps preserve functional co-localization of the protein structure space. METHODS Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. CONCLUSIONS This work opens exciting avenues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.
Affiliation(s)
- Kevin Molloy: Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- M Jennifer Van: Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- Daniel Barbara: Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- Amarda Shehu: Department of Computer Science; Department of Bioengineering; School of Systems Biology, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
|
42
|
Wilburn DB, Bowen KE, Doty KA, Arumugam S, Lane AN, Feldhoff PW, Feldhoff RC. Structural insights into the evolution of a sexy protein: novel topology and restricted backbone flexibility in a hypervariable pheromone from the red-legged salamander, Plethodon shermani. PLoS One 2014; 9:e96975. [PMID: 24849290 PMCID: PMC4029566 DOI: 10.1371/journal.pone.0096975] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/24/2013] [Accepted: 04/15/2014] [Indexed: 11/18/2022]
Abstract
In response to pervasive sexual selection, protein sex pheromones often display rapid mutation and accelerated evolution of corresponding gene sequences. For proteins, the general dogma is that structure is maintained even as sequence or function may rapidly change. This phenomenon is well exemplified by the three-finger protein (TFP) superfamily: a diverse class of vertebrate proteins co-opted for many biological functions - such as components of snake venoms, regulators of the complement system, and coordinators of amphibian limb regeneration. All of the >200 structurally characterized TFPs adopt the namesake "three-finger" topology. In male red-legged salamanders, the TFP pheromone Plethodontid Modulating Factor (PMF) is a hypervariable protein such that, through extensive gene duplication and pervasive sexual selection, individual male salamanders express more than 30 unique isoforms. However, it remained unclear how this accelerated evolution affected the protein structure of PMF. Using LC/MS-MS and multidimensional NMR, we report the 3D structure of the most abundant PMF isoform, PMF-G. The high resolution structural ensemble revealed a highly modified TFP structure, including a unique disulfide bonding pattern and loss of secondary structure, that define a novel protein topology with greater backbone flexibility in the third peptide finger. Sequence comparison, models of molecular evolution, and homology modeling together support that this flexible third finger is the most rapidly evolving segment of PMF. Combined with PMF sequence hypervariability, this structural flexibility may enhance the plasticity of PMF as a chemical signal by permitting potentially thousands of structural conformers. We propose that the flexible third finger plays a critical role in PMF:receptor interactions. As female receptors co-evolve, this flexibility may allow PMF to still bind its receptor(s) without the immediate need for complementary mutations. Consequently, this unique adaptation may establish new paradigms for how receptor:ligand pairs co-evolve, in particular with respect to sexual conflict.
Affiliation(s)
- Damien B. Wilburn: Department of Biochemistry and Molecular Biology, University of Louisville, Louisville, Kentucky, United States of America
- Kathleen E. Bowen: Department of Biochemistry and Molecular Biology, University of Louisville, Louisville, Kentucky, United States of America
- Kari A. Doty: Department of Biochemistry and Molecular Biology, University of Louisville, Louisville, Kentucky, United States of America
- Sengodagounder Arumugam: J.G. Brown Cancer Center, University of Louisville, Louisville, Kentucky, United States of America
- Andrew N. Lane: J.G. Brown Cancer Center, University of Louisville, Louisville, Kentucky, United States of America
- Pamela W. Feldhoff: Department of Biochemistry and Molecular Biology, University of Louisville, Louisville, Kentucky, United States of America
- Richard C. Feldhoff: Department of Biochemistry and Molecular Biology, University of Louisville, Louisville, Kentucky, United States of America
- * E-mail:
|
43
|
|
44
|
Esquivel-Rodríguez J, Kihara D. Computational methods for constructing protein structure models from 3D electron microscopy maps. J Struct Biol 2013; 184:93-102. [PMID: 23796504 DOI: 10.1016/j.jsb.2013.06.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 12/18/2012] [Revised: 06/11/2013] [Accepted: 06/13/2013] [Indexed: 12/31/2022]
Abstract
Protein structure determination by cryo-electron microscopy (EM) has made significant progress in the past decades. Resolutions of EM maps have been improving, as evidenced by recently reported structures that are solved at high resolutions close to 3 Å. Computational methods play a key role in interpreting EM data. Among the many computational procedures applied to an EM map to obtain protein structure information, this article focuses on reviewing computational methods that model protein three-dimensional (3D) structures from a 3D EM density map that is constructed from two-dimensional (2D) maps. The computational methods we discuss range from de novo methods, which identify structural elements in an EM map, to structure fitting methods, where known high resolution structures are fit into a low-resolution EM map. A list of available computational tools is also provided.
Affiliation(s)
- Juan Esquivel-Rodríguez: Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA
|
45
|
Magis C, Di Tommaso P, Notredame C. T-RMSD: a web server for automated fine-grained protein structural classification. Nucleic Acids Res 2013; 41:W358-62. [PMID: 23716642 PMCID: PMC3692075 DOI: 10.1093/nar/gkt383] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or groups of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.
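As a rough illustration of the distance RMSD idea, the sketch below compares the intramolecular distances of two structures over a set of equivalent residues. It uses the generic dRMSD definition and is not the T-RMSD server's per-position procedure; the function name `drmsd` and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two structures over equivalent residues.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, one per
    equivalent residue (for example, C-alpha atoms from ungapped alignment columns).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length coordinate lists with >= 2 residues")
    squared_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])  # distance i-j in structure A
        d_b = math.dist(coords_b[i], coords_b[j])  # distance i-j in structure B
        squared_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(squared_sum / n_pairs)


# Hypothetical three-residue fragments
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
moved = [(1.0, 1.0, 1.0), (1.0, 4.8, 1.0), (1.0, 8.6, 1.0)]      # rotated and translated copy
stretched = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]  # slightly longer spacing
print(drmsd(ref, moved), drmsd(ref, stretched))
```

Because only internal distances are compared, no superposition step is needed: a rigidly moved copy scores zero, which is the property that makes distance-based measures convenient for clustering many structures.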
Affiliation(s)
- Cedrik Magis: Bioinformatics and Genomics Programme, Centre For Genomic Regulation, Carrer del Doctor Aiguader 88, 08003 Barcelona, Spain
|
46
|
Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE. RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics 2012; 28:3257-64. [PMID: 22962341 DOI: 10.1093/bioinformatics/bts550] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. RESULTS Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It reliably solves three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. The resulting method is very accurate, with 89.5% of solenoid proteins and 97.2% of non-solenoid proteins correctly classified. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q(2) value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.
Affiliation(s)
- Ian Walsh: Department of Biology, University of Padua, Viale G. Colombo 3, 35131 Padova, Italy
|
47
|
Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012; 110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]
Abstract
Modularity increases evolvability by reducing constraints on adaptation and by allowing preexisting parts to function in new contexts for novel uses. Protein evolution provides an excellent context to study the causes and consequences of biological modularity. In order to address such questions, however, an index for protein modularity is necessary. This paper proposes a simple index for protein modularity, "module density", which is the number of evolutionarily independent modules that compose a protein divided by the number of amino acids in the protein. The decomposition of proteins into constituent modules can be accomplished by either of two classes of methods. The first class of methods relies on "suppositional" criteria to assign amino acids to modules, whereas the second class of methods relies on "coevolutionary" criteria for this task. One simple and practical method from the first class consists of approximating the number of modules in a protein as the number of regular secondary structure elements (i.e., helices and sheets). Methods based on coevolutionary criteria require more elaborate data, but they have the advantage of being able to specify modules without prior assumptions about why they exist. Given the increasing availability of datasets sampling protein mutational spectra (e.g., from comparative genomics, experimental evolution, and computational prediction), methods based on coevolutionary criteria will likely become more promising in the near future. The ability to meaningfully quantify protein modularity via simple indices has the potential to aid future efforts to understand protein evolutionary rate determinants, improve molecular evolution models and engineer novel proteins.
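The module density index, together with the secondary-structure approximation of the module count, can be sketched as follows. The three-state string (H = helix, E = strand, C = coil), the choice to count each strand as its own element, and the function names are assumptions made for illustration, not details from the paper.

```python
from itertools import groupby


def count_secondary_structure_elements(ss: str) -> int:
    """Count contiguous runs of helix (H) or strand (E) in a three-state string."""
    return sum(1 for state, _run in groupby(ss) if state in ("H", "E"))


def module_density(n_modules: int, n_residues: int) -> float:
    """Module density = number of modules / number of amino acids."""
    if n_residues <= 0:
        raise ValueError("protein length must be positive")
    return n_modules / n_residues


# Hypothetical 23-residue assignment: helix, strand, helix separated by coil
ss = "CCHHHHHHCCEEEEECCHHHHCC"
n_modules = count_secondary_structure_elements(ss)
density = module_density(n_modules, len(ss))
print(n_modules, round(density, 3))
```

Dividing by length keeps the index comparable between short and long proteins, which a raw module count would not be.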
Affiliation(s)
- Mary Rorick: University of Michigan, Department of Ecology and Evolutionary Biology, Ann Arbor, MI 48109-1048, United States.
|
48
|
Hensen U, Meyer T, Haas J, Rex R, Vriend G, Grubmüller H. Exploring protein dynamics space: the dynasome as the missing link between protein structure and function. PLoS One 2012; 7:e33931. [PMID: 22606222 PMCID: PMC3350514 DOI: 10.1371/journal.pone.0033931] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 09/23/2011] [Accepted: 02/20/2012] [Indexed: 12/25/2022]
Abstract
Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins, but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: (1) the dynasome consists of a continuum of proteins, rather than well-separated classes; (2) for the majority of proteins, we observe strong correlations between structure and dynamics; and (3) proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics.
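The idea of comparing proteins as vectors in a descriptor space can be sketched as below. The descriptor values, the z-score normalisation, and the use of Euclidean distance are illustrative assumptions; the abstract does not specify which descriptors or distance measure the authors used.

```python
import math
from statistics import mean, pstdev


def zscore_columns(fingerprints):
    """Scale each descriptor to zero mean and unit variance across proteins.

    fingerprints: dict mapping protein name -> list of descriptor values.
    """
    names = list(fingerprints)
    columns = list(zip(*(fingerprints[name] for name in names)))
    # Fall back to 1.0 when a descriptor is constant, to avoid dividing by zero
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        name: [(value - mu) / sd for value, (mu, sd) in zip(fingerprints[name], stats)]
        for name in names
    }


def dynamics_distance(vec_a, vec_b):
    """Euclidean distance between two normalised fingerprints."""
    return math.dist(vec_a, vec_b)


# Hypothetical two-descriptor fingerprints for three proteins
raw = {
    "protA": [1.0, 10.0],
    "protB": [1.1, 11.0],
    "protC": [3.0, 30.0],
}
scaled = zscore_columns(raw)
d_ab = dynamics_distance(scaled["protA"], scaled["protB"])
d_ac = dynamics_distance(scaled["protA"], scaled["protC"])
print(round(d_ab, 3), round(d_ac, 3))
```

Normalising each descriptor first keeps a quantity with large numeric values from dominating the distance, which matters when descriptors are measured in different units.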
Affiliation(s)
- Ulf Hensen: Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- Tim Meyer: Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- Jürgen Haas: Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- René Rex: Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- Gert Vriend: CMBI, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Helmut Grubmüller: Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
|
49
|
Wu CY, Hwa YH, Chen YC, Lim C. Hidden relationship between conserved residues and locally conserved phosphate-binding structures in NAD(P)-binding proteins. J Phys Chem B 2012; 116:5644-52. [PMID: 22530587 DOI: 10.1021/jp3014332] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023]
Abstract
A one-dimensional (1D) motif usually comprises conserved essential residues involved in catalysis, ligand binding, or maintaining a specific structure. However, it cannot be easily detected in proteins with low sequence identity because it is difficult to (1) identify protein sequences suspected to contain the motif, and (2) align sequences with little sequence identity to spot the conserved residues. Here, we present a strategy for discovering phosphate-binding 1D motifs in NAD(P)-binding proteins sharing low sequence identity that overcomes these two hurdles by determining all distinct locally conserved pyrophosphate-binding structures and aligning the same-length sequences comprising each of these structures to identify the conserved residues. We show that the sequence motifs derived from the distinct pyrophosphate-binding structures yield different numbers/spacing of conserved Gly residues. We also show that they depend on the side chain orientations and cofactor type (NAD or NADP). Thus, sequence motifs derived from local similarity of backbone structures without consideration of the cofactor type and/or side chain orientations would reduce their reliability in annotating protein function from sequence alone. The three-dimensional (3D) and 1D motifs comprising conserved residues in nonredundant proteins reveal hidden relationships between the protein structure/function and sequence as well as protein-cofactor interactions.
Affiliation(s)
- Chih Yuan Wu: Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
|
50
|
Astakhova TV, Lobanov MN, Poverennaya IV, Roytberg MA, Yacovlev VV. Verification of the PREFAB alignment database. Biophysics (Nagoya-shi) 2012. [DOI: 10.1134/s0006350912020030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|