1
Feidakis CP, Krivak R, Hoksza D, Novotny M. AHoJ-DB: A PDB-wide Assignment of apo & holo Relationships Based on Individual Protein-Ligand Interactions. J Mol Biol 2024; 436:168545. [PMID: 38508305 DOI: 10.1016/j.jmb.2024.168545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/12/2024] [Accepted: 03/14/2024] [Indexed: 03/22/2024]
Abstract
A single protein structure is rarely sufficient to capture the conformational variability of a protein. Both bound and unbound (holo and apo) forms of a protein are essential for understanding its geometry and making meaningful comparisons. Nevertheless, docking or drug design studies often still consider only single protein structures in their holo form, which are for the most part rigid. With the recent explosion in the field of structural biology, large, curated datasets are urgently needed. Here, we use a previously developed application (AHoJ) to perform a comprehensive search for apo-holo pairs for 468,293 biologically relevant protein-ligand interactions across 27,983 proteins. In each search, the binding pocket is captured and mapped across existing structures within the same UniProt, and the mapped pockets are annotated as apo or holo, based on the presence or absence of ligands. We assemble the results into a database, AHoJ-DB (www.apoholo.cz/db), that captures the variability of proteins with identical sequences, thereby exposing the agents responsible for the observed differences in geometry. We report several metrics for each annotated pocket, and we also include binding pockets that form at the interface of multiple chains. Analysis of the database shows that about 24% of the binding sites occur at the interface of two or more chains and that less than 50% of the total binding sites processed have an apo form in the PDB. These results can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal protein- and ligand-specific relationships that were previously obscured by intermittent or partial data. Availability: www.apoholo.cz/db.
Affiliation(s)
- Christos P Feidakis
- Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic.
- Radoslav Krivak
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague 12116, Czech Republic; Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague 16000, Czech Republic
- David Hoksza
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague 12116, Czech Republic
- Marian Novotny
- Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic.
2
Crauwels C, Heidig SL, Díaz A, Vranken WF. Large-scale structure-informed multiple sequence alignment of proteins with SIMSApiper. Bioinformatics 2024; 40:btae276. [PMID: 38648741 PMCID: PMC11099654 DOI: 10.1093/bioinformatics/btae276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/20/2024] [Accepted: 04/18/2024] [Indexed: 04/25/2024] [Open Access]
Abstract
SUMMARY: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements. AVAILABILITY AND IMPLEMENTATION: The pipeline is implemented using Nextflow, Python3, and Bash. It is publicly available at github.com/Bio2Byte/simsapiper.
Affiliation(s)
- Charlotte Crauwels
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Sophie-Luise Heidig
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
- AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Evolutionary Biology & Ecology, Université libre de Bruxelles, Brussels, 1050, Belgium
- Adrián Díaz
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
3
Bastolla U, Abia D, Piette O. PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score. Bioinformatics 2023; 39:btad630. [PMID: 37847775 PMCID: PMC10628387 DOI: 10.1093/bioinformatics/btad630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/01/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] [Open Access]
Abstract
MOTIVATION: Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships. RESULTS: Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of the pairwise alignments (PAs) and it refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, highest score PC_sim, and highest similarity with the MSAs produced by other tools and with the reference MSA Balibase. AVAILABILITY AND IMPLEMENTATION: https://github.com/ugobas/PC_ali.
Affiliation(s)
- Ugo Bastolla
- Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- David Abia
- Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Oscar Piette
- Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
4
Pan A, Zeng Y, Liu J, Zhou M, Lai EC, Yu Y. Unanticipated broad phylogeny of BEN DNA-binding domains revealed by structural homology searches. Curr Biol 2023; 33:2270-2282.e2. [PMID: 37236184 PMCID: PMC10348805 DOI: 10.1016/j.cub.2023.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/17/2023] [Revised: 04/07/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023]
Abstract
Organization of protein sequences into domain families is a foundation for cataloging and investigating protein functions. However, long-standing strategies based on primary amino acid sequences are blind to the possibility that proteins with dissimilar sequences could have comparable tertiary structures. Building on our recent findings that in silico structural predictions of BEN family DNA-binding domains closely resemble their experimentally determined crystal structures, we exploited the AlphaFold2 database for comprehensive identification of BEN domains. Indeed, we identified numerous novel BEN domains, including members of new subfamilies. For example, while no BEN domain factors had previously been annotated in C. elegans, this species actually encodes multiple BEN proteins. These include key developmental timing genes of orphan domain status, sel-7 and lin-14, the latter being the central target of the founding miRNA lin-4. We also reveal that the domain of unknown function 4806 (DUF4806), which is widely distributed across metazoans, is structurally similar to BEN and comprises a new subtype. Surprisingly, we find that BEN domains resemble both metazoan and non-metazoan homeodomains in 3D conformation and preserve characteristic residues, indicating that despite their inability to be aligned by conventional methods, these DNA-binding modules are probably evolutionarily related. Finally, we broaden the application of structural homology searches by revealing novel human members of DUF3504, which exists on diverse proteins with presumed or known nuclear functions. Overall, our work strongly expands this recently identified family of transcription factors and illustrates the value of 3D structural predictions to annotate protein domains and interpret their functions.
Affiliation(s)
- Anyu Pan
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Yangfan Zeng
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Jingjing Liu
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Mengjie Zhou
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Eric C Lai
- Developmental Biology Program, Sloan Kettering Institute, New York, NY 10065, USA
- Yang Yu
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.
5
LeDesma R, Heller B, Biswas A, Maya S, Gili S, Higgins J, Ploss A. Structural features stabilized by divalent cation coordination within hepatitis E virus ORF1 are critical for viral replication. eLife 2023; 12:e80529. [PMID: 36852909 PMCID: PMC9977285 DOI: 10.7554/elife.80529] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/24/2022] [Accepted: 02/12/2023] [Indexed: 03/01/2023] [Open Access]
Abstract
Hepatitis E virus (HEV) is an RNA virus responsible for over 20 million infections annually. HEV's open reading frame (ORF)1 polyprotein is essential for genome replication, though it is unknown how the different subdomains function within a structural context. Our data show that ORF1 operates as a multifunctional protein, which is not subject to proteolytic processing. Supporting this model, scanning mutagenesis performed on the putative papain-like cysteine protease (pPCP) domain revealed six cysteines essential for viral replication. Our data are consistent with their role in divalent metal ion coordination, which governs local and interdomain interactions that are critical for the overall structure of ORF1; furthermore, the 'pPCP' domain can only rescue viral genome replication in trans when expressed in the context of the full-length ORF1 protein but not as an individual subdomain. Taken together, our work provides a comprehensive model of the structure and function of HEV ORF1.
Affiliation(s)
- Robert LeDesma
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, United States
- Brigitte Heller
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, United States
- Abhishek Biswas
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, United States
- Stephanie Maya
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, United States
- Stefania Gili
- Department of Geosciences, Princeton University, Princeton, United States
- John Higgins
- Department of Geosciences, Princeton University, Princeton, United States
- Alexander Ploss
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, United States
6
Baltzis A, Mansouri L, Jin S, Langer BE, Erb I, Notredame C. Highly significant improvement of protein sequence alignments with AlphaFold2. Bioinformatics 2022; 38:5007-5011. [PMID: 36130276 PMCID: PMC9665868 DOI: 10.1093/bioinformatics/btac625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 08/29/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
MOTIVATION: Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted at experimental grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis. RESULTS: Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis. AVAILABILITY AND IMPLEMENTATION: All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf) and the various containers in (https://cloud.sylabs.io/library/athbaltzis/af2/alphafold, https://hub.docker.com/r/athbaltzis/pred). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suzanne Jin
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Björn E Langer
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Ionas Erb
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
7
Calpains as mechanistic drivers and therapeutic targets for ocular disease. Trends Mol Med 2022; 28:644-661. [PMID: 35641420 PMCID: PMC9345745 DOI: 10.1016/j.molmed.2022.05.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/16/2022] [Revised: 05/03/2022] [Accepted: 05/09/2022] [Indexed: 11/18/2022]
Abstract
Ophthalmic neurodegenerative diseases encompass a wide array of molecular pathologies unified by calpain dysregulation. Calpains are calcium-dependent proteases that perpetuate cellular death and inflammation when hyperactivated. Calpain inhibition trials in other organs have faced pharmacological challenges, but the eye offers many advantages for the development and testing of targeted molecular therapeutics, including small molecules, peptides, engineered proteins, drug implants, and gene-based therapies. This review highlights structural mechanisms underlying calpain activation, distinct cellular expression patterns, and in vivo models that link calpain hyperactivity to human retinal and developmental disease. Optimizing therapeutic approaches for calpain-mediated eye diseases can help accelerate clinically feasible strategies for treating calpain dysregulation in other diseased tissues.
8
Shegay MV, Švedas VK, Voevodin VV, Suplatov DA, Popova NN. Guide tree optimization with genetic algorithm to improve multiple protein 3D-structure alignment. Bioinformatics 2022; 38:985-989. [PMID: 34849594 DOI: 10.1093/bioinformatics/btab798] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 10/23/2021] [Accepted: 11/19/2021] [Indexed: 02/03/2023] [Open Access]
Abstract
MOTIVATION: With the increasing availability of 3D-data, the focus of comparative bioinformatic analysis is shifting from protein sequence alignments toward more content-rich 3D-alignments. This raises the need for new ways to improve the accuracy of 3D-superimposition. RESULTS: We proposed guide tree optimization with genetic algorithm (GA) as a universal tool to improve the alignment quality of multiple protein 3D-structures systematically. As a proof of concept, we implemented the suggested GA-based approach in popular Matt and Caretta multiple protein 3D-structure alignment (M3DSA) algorithms, leading to a statistically significant improvement of the TM-score quality indicator by up to 220-1523% on 'SABmark Superfamilies' (in 49-77% of cases) and 'SABmark Twilight' (in 59-80% of cases) datasets. The observed improvement in collections of distant homologies highlights the potentials of GA to optimize 3D-alignments of diverse protein superfamilies as one plausible tool to study the structure-function relationship. AVAILABILITY AND IMPLEMENTATION: The source codes of patched gaCaretta and gaMatt programs are available open-access at https://github.com/n-canter/gamaps. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maksim V Shegay
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Vytas K Švedas
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia; Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Vladimir V Voevodin
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia; Research Computing Center, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Dmitry A Suplatov
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Nina N Popova
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
9
Pramanik SK, Mahmud S, Paul GK, Jabin T, Naher K, Uddin MS, Zaman S, Saleh MA. Fermentation optimization of cellulase production from sugarcane bagasse by Bacillus pseudomycoides and molecular modeling study of cellulase. Current Research in Microbial Sciences 2021; 2:100013. [PMID: 34841306 PMCID: PMC8610336 DOI: 10.1016/j.crmicr.2020.100013] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/19/2020] [Revised: 10/01/2020] [Accepted: 10/22/2020] [Indexed: 12/26/2022] [Open Access]
Abstract
Isolation of cellulase producing Bacillus pseudomycoides from sugarcane bagasse. Fermentation and optimization of different parameters for cellulase production. Modeling and validation of cellulase enzyme. Interaction dynamics between cellulase and cellulose.
Degradation of cellulosic carbon, the most important natural carbon reservoir on this planet, by cellulase is essential for producing valuable soluble sugars. Cellulase has potential biotechnological applications in many industrial sectors, so demand for it is increasing faster than ever. Agro-industrial byproducts and suitable microbes are important sources for the production of cellulase. Bacillus pseudomycoides and sugarcane bagasse were used for the production of cellulase, and different process parameters influencing production were optimized. The bacterium showed maximum cellulase production in the presence of sugarcane bagasse, peptone and magnesium sulfate at pH 7 and 40 °C after 72 h of incubation. The primary structure of the cellulase consists of 400 amino acid residues, with a molecular weight of 44,790 Da and a theoretical pI of 9.11. Physicochemical properties of the cellulase indicated that the protein has an instability index of 25.77. Seven hydrogen bonds were observed at multiple sites of the cellulase enzyme: His269, Asp237, Asn235, Tyr271, Ser272, Gln309 and Asn233. This protein structure may serve as a starting point for further exploration of cellulase and cellulose interaction dynamics in Bacillus sp. Thus this bacterium may be useful in various industrial applications owing to its cellulase-producing capability.
10
Robinson SL, Piel J, Sunagawa S. A roadmap for metagenomic enzyme discovery. Nat Prod Rep 2021; 38:1994-2023. [PMID: 34821235 PMCID: PMC8597712 DOI: 10.1039/d1np00006c] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 01/31/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to 2021. Metagenomics has yielded massive amounts of sequencing data offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on earth is now available, the ability to accurately predict biocatalytic functions directly from sequencing data remains challenging. Compared to primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is similar to searching for a 'needle in a haystack' without direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery including phylogenetics, sequence similarity networks, genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss various experimental strategies to test computational predictions including heterologous expression and screening. Finally, we provide an outlook for future directions in the field with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.
Affiliation(s)
- Jörn Piel
- Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland.
11
Aevarsson A, Kaczorowska AK, Adalsteinsson BT, Ahlqvist J, Al-Karadaghi S, Altenbuchner J, Arsin H, Átlasson ÚÁ, Brandt D, Cichowicz-Cieślak M, Cornish KAS, Courtin J, Dabrowski S, Dahle H, Djeffane S, Dorawa S, Dusaucy J, Enault F, Fedøy AE, Freitag-Pohl S, Fridjonsson OH, Galiez C, Glomsaker E, Guérin M, Gundesø SE, Gudmundsdóttir EE, Gudmundsson H, Håkansson M, Henke C, Helleux A, Henriksen JR, Hjörleifdóttir S, Hreggvidsson GO, Jasilionis A, Jochheim A, Jónsdóttir I, Jónsdóttir LB, Jurczak-Kurek A, Kaczorowski T, Kalinowski J, Kozlowski LP, Krupovic M, Kwiatkowska-Semrau K, Lanes O, Lange J, Lebrat J, Linares-Pastén J, Liu Y, Lorentsen SA, Lutterman T, Mas T, Merré W, Mirdita M, Morzywołek A, Ndela EO, Karlsson EN, Olgudóttir E, Pedersen C, Perler F, Pétursdóttir SK, Plotka M, Pohl E, Prangishvili D, Ray JL, Reynisson B, Róbertsdóttir T, Sandaa RA, Sczyrba A, Skírnisdóttir S, Söding J, Solstad T, Steen IH, Stefánsson SK, Steinegger M, Overå KS, Striberny B, Svensson A, Szadkowska M, Tarrant EJ, Terzian P, Tourigny M, Bergh TVD, Vanhalst J, Vincent J, Vroling B, Walse B, Wang L, Watzlawick H, Welin M, Werbowy O, Wons E, Zhang R. Going to extremes - a metagenomic journey into the dark matter of life. FEMS Microbiol Lett 2021; 368:6296640. [PMID: 34114607 DOI: 10.1093/femsle/fnab067] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/11/2021] [Accepted: 06/08/2021] [Indexed: 02/06/2023] [Open Access]
Abstract
The Virus-X-Viral Metagenomics for Innovation Value-project was a scientific expedition to explore and exploit uncharted territory of genetic diversity in extreme natural environments such as geothermal hot springs and deep-sea ocean ecosystems. Specifically, the project was set to analyse and exploit viral metagenomes with the ultimate goal of developing new gene products with high innovation value for applications in biotechnology, pharmaceutical, medical, and the life science sectors. Viral gene pool analysis is also essential to obtain fundamental insight into ecosystem dynamics and to investigate how viruses influence the evolution of microbes and multicellular organisms. The Virus-X Consortium, established in 2016, included experts from eight European countries. The unique approach based on high throughput bioinformatics technologies combined with structural and functional studies resulted in the development of a biodiscovery pipeline of significant capacity and scale. The activities within the Virus-X consortium cover the entire range from bioprospecting and methods development in bioinformatics to protein production and characterisation, with the final goal of translating our results into new products for the bioeconomy. The significant impact the consortium made in all of these areas was possible due to the successful cooperation between expert teams that worked together to solve a complex scientific problem using state-of-the-art technologies as well as developing novel tools to explore the virosphere, widely considered as the last great frontier of life.
Affiliation(s)
- Anna-Karina Kaczorowska
- Collection of Plasmids and Microorganisms, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Josefin Ahlqvist
- Biotechnology, Department of Chemistry, Lund University, PO Box 124, Naturvetarvägen 14/Sölvegatan 39 A, SE-221 00 Lund, Sweden
- Joseph Altenbuchner
- Institute for Industrial Genetics, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
- Hasan Arsin
- Department of Biological Sciences, University of Bergen, PO Box 7803, Thormøhlens gate 55, N-5020 Bergen, Norway
- David Brandt
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
- Magdalena Cichowicz-Cieślak
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Katy A S Cornish
- Department of Chemistry, Durham University, South Road, Durham DH1 3LE, United Kingdom
- Håkon Dahle
- Department of Biological Sciences, University of Bergen, PO Box 7803, Thormøhlens gate 55, N-5020 Bergen, Norway; Department of Informatics, University of Bergen, PO Box 7803, Thormøhlens gate 53 A/B, N-5020 Bergen, Norway
- Sebastian Dorawa
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
- Francois Enault
- Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Génome et Environnement, 49 Boulevard François-Mitterrand - CS 60032, UMR 6023, Clermont-Ferrand, France
- Anita-Elin Fedøy
- Department of Biological Sciences, University of Bergen, PO Box 7803, Thormøhlens gate 55, N-5020 Bergen, Norway
- Stefanie Freitag-Pohl
- Department of Chemistry, Durham University, South Road, Durham DH1 3LE, United Kingdom
- Clovis Galiez
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Eirin Glomsaker
- ArcticZymes Technologies PO Box 6463, Sykehusveien 23, 9294 Tromsø, Norway
- Sigurd E Gundesø
- ArcticZymes Technologies PO Box 6463, Sykehusveien 23, 9294 Tromsø, Norway
- Maria Håkansson
- SARomics Biostructures, Scheelevägen 2, SE-223 81 Lund, Sweden
- Christian Henke
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany; Computational Metagenomics, Bielefeld University, Universitätsstraße 27, 30501 Bielefeld, Germany
- Gudmundur O Hreggvidsson
- Matis ohf, Vinlandsleid 12, Reykjavik 113, Iceland; Faculty of Life and Environmental Sciences, University of Iceland, Askja-Sturlugata 7, Reykjavik, Iceland
- Andrius Jasilionis
- Biotechnology, Department of Chemistry, Lund University, PO Box 124, Naturvetarvägen 14/Sölvegatan 39 A, SE-221 00 Lund, Sweden
- Annika Jochheim
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Agata Jurczak-Kurek
- Department of Molecular Evolution, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Tadeusz Kaczorowski
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Jörn Kalinowski
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
| | - Lukasz P Kozlowski
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany.,Institute of Informatics, Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, Warsaw 02-097, Poland
| | - Mart Krupovic
- Institute Pasteur, Department of Microbiology, 25-28 Rue du Dr Roux, 75015 Paris, France
| | - Karolina Kwiatkowska-Semrau
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Olav Lanes
- ArcticZymes Technologies PO Box 6463, Sykehusveien 23, 9294 Tromsø, Norway
| | - Joanna Lange
- Bio-Prodict, Nieuwe Marktstraat 54E 6511AA Nijmegen, Netherlands
| | | | - Javier Linares-Pastén
- Biotechnology, Department of Chemistry, Lund University, PO Box 124, Naturvetarvägen 14/Sölvegatan 39 A, SE-221 00 Lund, Sweden
| | - Ying Liu
- Institute Pasteur, Department of Microbiology, 25-28 Rue du Dr Roux, 75015 Paris, France
| | | | - Tobias Lutterman
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
| | - Thibaud Mas
- Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Génome et Environnement, 49 Boulevard François-Mitterrand - CS 60032, UMR 6023, Clermont-Ferrand, France
| | | | - Milot Mirdita
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Agnieszka Morzywołek
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Eric Olo Ndela
- Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Génome et Environnement, 49 Boulevard François-Mitterrand - CS 60032, UMR 6023, Clermont-Ferrand, France
| | - Eva Nordberg Karlsson
- Biotechnology, Department of Chemistry, Lund University, PO Box 124, Naturvetarvägen 14/Sölvegatan 39 A, SE-221 00 Lund, Sweden
| | | | - Cathrine Pedersen
- ArcticZymes Technologies PO Box 6463, Sykehusveien 23, 9294 Tromsø, Norway
| | - Francine Perler
- Perls of Wisdom Biotech Consulting, 74 Fuller Street, Brookline, MA 02446, USA
| | | | - Magdalena Plotka
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Ehmke Pohl
- Department of Chemistry, Durham University, South Road, Durham DH1 3LE, United Kingdom.,Department of Biosciences, Durham University, South Road, Durham DH1 3LE, UK
| | - David Prangishvili
- Institute Pasteur, Department of Microbiology, 25-28 Rue du Dr Roux, 75015 Paris, France
| | - Jessica L Ray
- Department of Biological Sciences, University of Bergen, PO Box 7803, Thormøhlens gate 55, N-5020 Bergen, Norway.,NORCE Environment, NORCE Norwegian Research Centre AS, Nygårdsgaten 112, 5008 Bergen, Norway
| | | | | | - Ruth-Anne Sandaa
- Department of Biological Sciences, University of Bergen, PO Box 7803, Thormøhlens gate 55, N-5020 Bergen, Norway
| | - Alexander Sczyrba
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany.,Computational Metagenomics, Bielefeld University, Universitätsstraße 27, 30501 Bielefeld, Germany
| | | | - Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Terese Solstad
- ArcticZymes Technologies PO Box 6463, Sykehusveien 23, 9294 Tromsø, Norway
| | - Ida H Steen
- Department of Biological Sciences, University of Bergen, PO Box 7803, Thormøhlens gate 55, N-5020 Bergen, Norway
| | | | - Martin Steinegger
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | | | - Bernd Striberny
- ArcticZymes Technologies PO Box 6463, Sykehusveien 23, 9294 Tromsø, Norway
| | - Anders Svensson
- SARomics Biostructures, Scheelevägen 2, SE-223 81 Lund, Sweden
| | - Monika Szadkowska
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Emma J Tarrant
- Department of Chemistry, Durham University, South Road, Durham DH1 3LE, United Kingdom
| | - Paul Terzian
- Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Génome et Environnement, 49 Boulevard François-Mitterrand - CS 60032, UMR 6023, Clermont-Ferrand, France
| | | | | | | | - Jonathan Vincent
- Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Génome et Environnement, 49 Boulevard François-Mitterrand - CS 60032, UMR 6023, Clermont-Ferrand, France
| | - Bas Vroling
- Bio-Prodict, Nieuwe Marktstraat 54E 6511AA Nijmegen, Netherlands
| | - Björn Walse
- SARomics Biostructures, Scheelevägen 2, SE-223 81 Lund, Sweden
| | - Lei Wang
- Institute for Industrial Genetics, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
| | - Hildegard Watzlawick
- Institute for Industrial Genetics, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
| | - Martin Welin
- SARomics Biostructures, Scheelevägen 2, SE-223 81 Lund, Sweden
| | - Olesia Werbowy
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Ewa Wons
- Laboratory of Extremophiles Biology, Department of Microbiology, Faculty of Biology, University of Gdansk, Wita Stwosza 59, Gdansk 80-308, Poland
| | - Ruoshi Zhang
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| |
Collapse
|
12
|
Lima I, Cino EA. Sequence similarity in 3D for comparison of protein families. J Mol Graph Model 2021; 106:107906. [PMID: 33848948 DOI: 10.1016/j.jmgm.2021.107906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/16/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 11/26/2022]
Abstract
Homologous proteins are often compared by pairwise sequence alignment, and structure superposition if the atomic coordinates are available. Unification of sequence and structure data is an important task in structural biology. Here, we present the Sequence Similarity 3D (SS3D) method of integrating sequence and structure information. SS3D is a distance and substitution matrix-based method for straightforward visualization of regions of similarity and difference between homologous proteins. This work details the SS3D approach, and demonstrates its utility through case studies comparing members of several protein families. The examples show that SS3D can effectively highlight biologically important regions of similarity and dissimilarity. We anticipate that the method will be useful for numerous structural biology applications, including, but not limited to, studies of binding specificity, structure-function relationships, and evolutionary pathways. SS3D is available with a manual and tutorial at https://github.com/0x462e41/SS3D/.
Affiliation(s)
- Igor Lima
  - Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Elio A Cino
  - Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
|
13
|
Structural Insights into Carboxylic Polyester-Degrading Enzymes and Their Functional Depolymerizing Neighbors. Int J Mol Sci 2021; 22:ijms22052332. [PMID: 33652738 PMCID: PMC7956259 DOI: 10.3390/ijms22052332] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2021] [Revised: 02/22/2021] [Accepted: 02/23/2021] [Indexed: 11/28/2022] [Open Access]
Abstract
Esters are organic compounds widely represented in cellular structures and metabolism, originated by the condensation of organic acids and alcohols. Esterification reactions are also used by chemical industries for the production of synthetic plastic polymers. Polyester plastics are an increasing source of environmental pollution due to their intrinsic stability and limited recycling efforts. Bioremediation of polyesters based on the use of specific microbial enzymes is an interesting alternative to the current methods for the valorization of used plastics. Microbial esterases are promising catalysts for the biodegradation of polyesters that can be engineered to improve their biochemical properties. In this work, we analyzed the structure-activity relationships in microbial esterases, with special focus on the recently described plastic-degrading enzymes isolated from marine microorganisms and their structural homologs. Our analysis, based on structure-alignment, molecular docking, coevolution of amino acids and surface electrostatics determined the specific characteristics of some polyester hydrolases that could be related with their efficiency in the degradation of aromatic polyesters, such as phthalates.
|
14
|
Timonina D, Sharapova Y, Švedas V, Suplatov D. Bioinformatic analysis of subfamily-specific regions in 3D-structures of homologs to study functional diversity and conformational plasticity in protein superfamilies. Comput Struct Biotechnol J 2021; 19:1302-1311. [PMID: 33738079 PMCID: PMC7933735 DOI: 10.1016/j.csbj.2021.02.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/08/2020] [Revised: 02/08/2021] [Accepted: 02/09/2021] [Indexed: 02/07/2023] [Open Access]
Abstract
Local 3D-structural differences in homologous proteins contribute to functional diversity observed in a superfamily, but so far received little attention as bioinformatic analysis was usually carried out at the level of amino acid sequences. We have developed Zebra3D - the first-of-its-kind bioinformatic software for systematic analysis of 3D-alignments of protein families using machine learning. The new tool identifies subfamily-specific regions (SSRs) - patterns of local 3D-structure (i.e. single residues, loops, or secondary structure fragments) that are spatially equivalent within families/subfamilies, but are different among them, and thus can be associated with functional diversity and function-related conformational plasticity. Bioinformatic analysis of protein superfamilies by Zebra3D can be used to study 3D-determinants of catalytic activity and specific accommodation of ligands, help to prepare focused libraries for directed evolution or assist development of chimeric enzymes with novel properties by exchange of equivalent regions between homologs, and to characterize plasticity in binding sites. A companion Mustguseal web-server is available to automatically construct a 3D-alignment of functionally diverse proteins, thus reducing the minimal input required to operate Zebra3D to a single PDB code. The Zebra3D + Mustguseal combined approach provides the opportunity to systematically explore the value of SSRs in superfamilies and to use this information for protein design and drug discovery. The software is available open-access at https://biokinet.belozersky.msu.ru/Zebra3D.
Affiliation(s)
- Daria Timonina
  - Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Yana Sharapova
  - Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
  - Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
- Vytas Švedas
  - Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
  - Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
- Dmitry Suplatov
  - Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
  - Corresponding author.
|
15
|
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at protein or gene levels can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By the efficient way of integrating experimental findings in discovering PPIs and computational techniques for prediction, the researchers have been able to gain many valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable the researchers to establish, annotate, and analyze biological networks. We here review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
|
16
|
Huffer KE, Aleksandrova AA, Jara-Oseguera A, Forrest LR, Swartz KJ. Global alignment and assessment of TRP channel transmembrane domain structures to explore functional mechanisms. eLife 2020; 9:e58660. [PMID: 32804077 PMCID: PMC7431192 DOI: 10.7554/elife.58660] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/07/2020] [Accepted: 07/31/2020] [Indexed: 12/20/2022] [Open Access]
Abstract
The recent proliferation of published TRP channel structures provides a foundation for understanding the diverse functional properties of this important family of ion channel proteins. To facilitate mechanistic investigations, we constructed a structure-based alignment of the transmembrane domains of 120 TRP channel structures. Comparison of structures determined in the absence or presence of activating stimuli reveals similar constrictions in the central ion permeation pathway near the intracellular end of the S6 helices, pointing to a conserved cytoplasmic gate and suggesting that most available structures represent non-conducting states. Comparison of the ion selectivity filters toward the extracellular end of the pore supports existing hypotheses for mechanisms of ion selectivity. Also conserved to varying extents are hot spots for interactions with hydrophobic ligands, lipids and ions, as well as discrete alterations in helix conformations. This analysis therefore provides a framework for investigating the structural basis of TRP channel gating mechanisms and pharmacology, and, despite the large number of structures included, reveals the need for additional structural data and for more functional studies to establish the mechanistic basis of TRP channel function.
Affiliation(s)
- Katherine E Huffer
  - Molecular Physiology and Biophysics Section, Porter Neuroscience Research Center, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, United States
- Antoniya A Aleksandrova
  - Computational Structural Biology Section, Porter Neuroscience Research Center, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, United States
- Andrés Jara-Oseguera
  - Molecular Physiology and Biophysics Section, Porter Neuroscience Research Center, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, United States
- Lucy R Forrest
  - Computational Structural Biology Section, Porter Neuroscience Research Center, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, United States
- Kenton J Swartz
  - Molecular Physiology and Biophysics Section, Porter Neuroscience Research Center, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, United States
|
17
|
Akdel M, Durairaj J, de Ridder D, van Dijk ADJ. Caretta - A multiple protein structure alignment and feature extraction suite. Comput Struct Biotechnol J 2020; 18:981-992. [PMID: 32368333 PMCID: PMC7186369 DOI: 10.1016/j.csbj.2020.03.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/14/2019] [Revised: 02/01/2020] [Accepted: 03/13/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta’s performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.
Affiliation(s)
- Mehmet Akdel
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
- Janani Durairaj
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
- Aalt D J van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
  - Mathematical and Statistical Methods - Biometris, Department of Plant Sciences, Wageningen University and Research, The Netherlands
|
18
|
Velez G, Sun YJ, Khan S, Yang J, Herrmann J, Chemudupati T, MacLaren RE, Gakhar L, Wakatsuki S, Bassuk AG, Mahajan VB. Structural Insights into the Unique Activation Mechanisms of a Non-classical Calpain and Its Disease-Causing Variants. Cell Rep 2020; 30:881-892.e5. [PMID: 31968260 PMCID: PMC7001764 DOI: 10.1016/j.celrep.2019.12.077] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/23/2019] [Revised: 11/26/2019] [Accepted: 12/19/2019] [Indexed: 12/12/2022] [Open Access]
Abstract
Increased calpain activity is linked to neuroinflammation including a heritable retinal disease caused by hyper-activating mutations in the calcium-activated calpain-5 (CAPN5) protease. Although structures for classical calpains are known, the structure of CAPN5, a non-classical calpain, remains undetermined. Here we report the 2.8 Å crystal structure of the human CAPN5 protease core (CAPN5-PC). Compared to classical calpains, CAPN5-PC requires high calcium concentrations for maximal activity. Structure-based phylogenetic analysis and multiple sequence alignment reveal that CAPN5-PC contains three elongated flexible loops compared to its classical counterparts. The presence of a disease-causing mutation (c.799G>A, p.Gly267Ser) on the unique PC2L2 loop reveals a function in this region for regulating enzymatic activity. This mechanism could be transferred to distant calpains, using synthetic calpain hybrids, suggesting an evolutionary mechanism for fine-tuning calpain function by modifying flexible loops. Further, the open (inactive) conformation of CAPN5-PC provides structural insight into CAPN5-specific residues that can guide inhibitor design.
Affiliation(s)
- Gabriel Velez
  - Omics Laboratory, Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA 94304, USA; Medical Scientist Training Program, University of Iowa, Iowa City, IA 52242, USA
- Young Joo Sun
  - Omics Laboratory, Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA 94304, USA
- Saif Khan
  - Protein and Crystallography Facility, University of Iowa, Iowa City, IA 52242, USA; Department of Biochemistry, University of Iowa, Iowa City, IA 52242, USA; Department of Biology and Biochemistry, University of Bath, Bath BA2 7AX, UK
- Jing Yang
  - Omics Laboratory, Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA 94304, USA
- Jonathan Herrmann
  - Department of Structural Biology, Stanford University, Palo Alto, CA 94305, USA; Photon Science, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
- Teja Chemudupati
  - Omics Laboratory, Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA 94304, USA
- Robert E MacLaren
  - NIHR Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford EC1V 2PD, UK; Oxford Eye Hospital, University of Oxford NHS Trust, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Lokesh Gakhar
  - Protein and Crystallography Facility, University of Iowa, Iowa City, IA 52242, USA; Department of Biochemistry, University of Iowa, Iowa City, IA 52242, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Soichi Wakatsuki
  - Department of Structural Biology, Stanford University, Palo Alto, CA 94305, USA; Photon Science, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
- Vinit B Mahajan
  - Omics Laboratory, Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA 94304, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, USA
|