1
Buchholz PCF, van Loo B, Eenink BDG, Bornberg-Bauer E, Pleiss J. Ancestral sequences of a large promiscuous enzyme family correspond to bridges in sequence space in a network representation. J R Soc Interface 2021; 18:20210389. [PMID: 34727710 DOI: 10.1098/rsif.2021.0389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Evolutionary relationships of protein families can be characterized either by networks or by trees. Whereas trees allow for hierarchical grouping and reconstruction of the most likely ancestral sequences, networks lack a time axis but allow for thresholds of pairwise sequence identity to be chosen and, therefore, the clustering of family members with presumably more similar functions. Here, we use the large family of arylsulfatases and phosphonate monoester hydrolases to investigate similarities, strengths and weaknesses in tree and network representations. For varying thresholds of pairwise sequence identity, values of betweenness centrality and clustering coefficients were derived for nodes of the reconstructed ancestors to measure the propensity to act as a bridge in a network. Based on these properties, ancestral protein sequences emerge as bridges in protein sequence networks. Interestingly, many ancestral protein sequences appear close to extant sequences. Therefore, reconstructed ancestor sequences might also be interpreted as yet-to-be-identified homologues. The concept of ancestor reconstruction is also compared to consensus sequences. It was found that hub sequences in a network, e.g. reconstructed ancestral sequences that are connected to many neighbouring sequences, share closer similarity with derived consensus sequences. Therefore, some reconstructed ancestor sequences can also be interpreted as consensus sequences.
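As a rough illustration of the two network measures named in the abstract, the following sketch computes betweenness centrality (using Brandes' algorithm for unweighted graphs) and the local clustering coefficient in plain Python. The toy graph, the node name `anc`, and the function names are assumptions made for illustration; the abstract does not state how the authors implemented these measures.

```python
from collections import deque
from itertools import combinations


def betweenness(adj):
    """Unnormalized betweenness centrality of an undirected, unweighted graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def clustering(adj):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0
            continue
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        cc[v] = 2 * links / (k * (k - 1))
    return cc


# Hypothetical toy network: two tight clusters joined only through "anc".
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("anc", "a1"), ("anc", "b1")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
cc = clustering(adj)
bridge = max(bc, key=bc.get)
print(bridge, bc[bridge], cc[bridge])
```

In this toy case the node joining the two clusters has the highest betweenness and a clustering coefficient of zero, which is the kind of signature the abstract associates with a bridge.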
Affiliation(s)
- Patrick C F Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, Stuttgart 70569, Germany
- Bert van Loo
  - Department of Applied Sciences, Northumbria University, Newcastle-upon-Tyne NE1 8ST, UK
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstraße 1, Münster 48149, Germany
- Bernard D G Eenink
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstraße 1, Münster 48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstraße 1, Münster 48149, Germany
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen 72076, Germany
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, Stuttgart 70569, Germany
2
Orlando M, Buchholz PCF, Lotti M, Pleiss J. The GH19 Engineering Database: Sequence diversity, substrate scope, and evolution in glycoside hydrolase family 19. PLoS One 2021; 16:e0256817. [PMID: 34699529 PMCID: PMC8547705 DOI: 10.1371/journal.pone.0256817] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/23/2021] [Accepted: 08/16/2021] [Indexed: 01/21/2023] Open
Abstract
The glycoside hydrolase 19 (GH19) is a bifunctional family of chitinases and endolysins, which have been studied for the control of plant fungal pests, the recycling of chitin biomass, and the treatment of multi-drug resistant bacteria. The GH19 domain-containing sequences (22,461) were divided into a chitinase and an endolysin subfamily by analyzing sequence networks, guided by taxonomy and the substrate specificity of characterized enzymes. The chitinase subfamily was split into seventeen groups, thus extending the previous classification. The endolysin subfamily is more diverse and consists of thirty-four groups. Despite their sequence diversity, twenty-six residues are conserved in chitinases and endolysins, which can be distinguished by two specific sequence patterns at six and four positions, respectively. Their location outside the catalytic cleft suggests a possible mechanism for substrate specificity that goes beyond the direct interaction with the substrate. The evolution of the GH19 catalytic domain was investigated by large-scale phylogeny. The inferred evolutionary history and putative horizontal gene transfer events differ from previous works. While no clear patterns were detected in endolysins, chitinases varied in sequence length by up to four loop insertions, causing at least eight distinct presence/absence loop combinations. The annotated GH19 sequences and structures are accessible via the GH19 Engineering Database (GH19ED, https://gh19ed.biocatnet.de). The GH19ED has been developed to support the prediction of substrate specificity and the search for novel GH19 enzymes from neglected taxonomic groups or in regions of the sequence space where few sequences have been described so far.
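The idea of residues conserved across a family can be made concrete with a small sketch. The snippet below is not the GH19ED pipeline; it assumes a toy multiple sequence alignment with `-` as the gap character, and `conserved_columns` and `min_fraction` are hypothetical names introduced only for this example.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0, gap="-"):
    """Return (1-based column, residue) pairs whose most common residue
    occurs in at least min_fraction of all sequences (gaps count against it)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != gap)
        if not counts:
            continue  # column consists of gaps only
        residue, n = counts.most_common(1)[0]
        if n / len(alignment) >= min_fraction:
            hits.append((i + 1, residue))
    return hits


# Hypothetical toy alignment (not real GH19 sequences).
alignment = ["GACDE", "GSCDE", "GACNE", "G-CDE"]
strict = conserved_columns(alignment)
relaxed = conserved_columns(alignment, min_fraction=0.75)
print(strict)
print(relaxed)
```

Lowering `min_fraction` admits columns that are conserved in most, but not all, sequences; which cutoff is appropriate for a real family is a judgment the sketch does not settle.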
Affiliation(s)
- Marco Orlando
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Patrick C. F. Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Marina Lotti
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
3
Kamrava S, Tahmasebi P, Sahimi M, Arbabi S. Phase transitions, percolation, fracture of materials, and deep learning. Phys Rev E 2020; 102:011001. [PMID: 32794896 DOI: 10.1103/physreve.102.011001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/05/2020] [Accepted: 06/24/2020] [Indexed: 11/07/2022]
Abstract
Percolation and fracture propagation in disordered solids represent two important problems in science and engineering that are characterized by phase transitions: loss of macroscopic connectivity at the percolation threshold p_{c} and formation of a macroscopic fracture network at the incipient fracture point (IFP). Percolation also represents the fracture problem in the limit of very strong disorder. An important unsolved problem is accurate prediction of physical properties of systems undergoing such transitions, given limited data far from the transition point. There is currently no theoretical method that can use limited data for a region far from a transition point p_{c} or the IFP and predict the physical properties all the way to that point, including their location. We present a deep neural network (DNN) for predicting such properties of two- and three-dimensional systems and in particular their percolation probability, the threshold p_{c}, the elastic moduli, and the universal Poisson ratio at p_{c}. All the predictions are in excellent agreement with the data. In particular, the DNN correctly predicts p_{c}, even though the training data were for the state of the systems far from p_{c}. This opens up the possibility of using the DNN for predicting physical properties of many types of disordered materials that undergo phase transformation, for which limited data are available only far from the transition point.
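To make the percolation probability concrete, the sketch below estimates it by Monte Carlo sampling. It is not the paper's DNN and not its simulation setup: site percolation on a small square lattice, top-to-bottom spanning as the connectivity criterion, and the names `spans` and `percolation_probability` are all assumptions chosen for a minimal example.

```python
import random
from collections import deque


def spans(grid):
    """True if open sites connect the top row to the bottom row (4-neighbour)."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    queue = deque()
    for c in range(n):
        if grid[0][c]:
            seen[0][c] = True
            queue.append((0, c))
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] and not seen[rr][cc]:
                seen[rr][cc] = True
                queue.append((rr, cc))
    return False


def percolation_probability(p, n=20, trials=200, seed=0):
    """Fraction of random n x n lattices (site occupancy p) that span."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        hits += spans(grid)
    return hits / trials


# Spanning probability at a few occupancies on either side of the transition.
curve = {p: percolation_probability(p) for p in (0.3, 0.5, 0.6, 0.7, 0.8)}
for p, prob in curve.items():
    print(p, prob)
```

On a finite lattice the estimated probability rises smoothly rather than jumping at p_{c}; the sharp transition only emerges as the lattice size grows, which is part of why extrapolating to the threshold from limited data is hard.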
Affiliation(s)
- Serveh Kamrava
  - Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089-1211, USA
- Pejman Tahmasebi
  - Department of Petroleum Engineering, University of Wyoming, Laramie, Wyoming 82071, USA
- Muhammad Sahimi
  - Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089-1211, USA
- Sepehr Arbabi
  - Department of Chemical Engineering, University of Texas of the Permian Basin, Odessa, Texas 79762, USA
4
Bauer TL, Buchholz PCF, Pleiss J. The modular structure of α/β-hydrolases. FEBS J 2019; 287:1035-1053. [PMID: 31545554 DOI: 10.1111/febs.15071] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 06/25/2019] [Revised: 08/15/2019] [Accepted: 09/19/2019] [Indexed: 12/22/2022]
Abstract
The α/β-hydrolase fold family is highly diverse in sequence, structure and biochemical function. To investigate the sequence-structure-function relationships, the Lipase Engineering Database (https://led.biocatnet.de) was updated. Overall, 280 638 protein sequences and 1557 protein structures were analysed. All α/β-hydrolases consist of the catalytically active core domain, but they might also contain additional structural modules, resulting in 12 different architectures: core domain only, additional lids at three different positions, three different caps, additional N- or C-terminal domains and combinations of N- and C-terminal domains with caps and lids respectively. In addition, the α/β-hydrolases were distinguished by their oxyanion hole signature (GX-, GGGX- and Y-types). The N-terminal domains show two different folds, the Rossmann fold or the β-propeller fold. The C-terminal domains show a β-sandwich fold. The N-terminal β-propeller domain and the C-terminal β-sandwich domain are structurally similar to carbohydrate-binding proteins such as lectins. The classification was applied to the newly discovered polyethylene terephthalate (PET)-degrading PETases and MHETases, which are core domain α/β-hydrolases of the GX- and the GGGX-type respectively. To investigate evolutionary relationships, sequence networks were analysed. The degree distribution followed a power law with a scaling exponent γ = 1.4, indicating a highly inhomogeneous network which consists of a few hubs and a large number of less connected sequences. The hub sequences have many functional neighbours and therefore are expected to be robust toward possible deleterious effects of mutations. The cluster size distribution followed a power law with an extrapolated scaling exponent τ = 2.6, which strongly supports the connectedness of the sequence space of α/β-hydrolases. 
DATABASE: Supporting data about domains from other proteins with structural similarity to the N- or C-terminal domains of α/β-hydrolases are available in Data Repository of the University of Stuttgart (DaRUS) under doi: https://doi.org/10.18419/darus-458.
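The scaling exponents quoted in the abstract describe distributions of the form P(k) ∝ k^(-γ). As a hedged illustration, the sketch below estimates such an exponent by ordinary least squares on log-log transformed counts. The fitting method is an assumption (the abstract does not say how γ or τ were obtained, and maximum-likelihood estimators are generally considered more reliable for real data), and `fit_power_law_exponent` is a hypothetical helper name.

```python
import math


def fit_power_law_exponent(counts):
    """Estimate gamma in n(k) ~ k**(-gamma) from a {k: n(k)} histogram
    by a least-squares line through (log k, log n)."""
    pts = [(math.log(k), math.log(n)) for k, n in counts.items() if k > 0 and n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive (k, n) pairs")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx  # slope of the log-log line is -gamma


# Synthetic, noise-free histogram generated with gamma = 1.4 (illustrative only).
synthetic = {k: 1000.0 * k ** -1.4 for k in range(1, 51)}
gamma = fit_power_law_exponent(synthetic)
print(round(gamma, 3))
```

On noise-free synthetic data the fit recovers the exponent used to generate it; real degree histograms are noisy in the tail, so binning or a likelihood-based fit would usually be needed.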
Affiliation(s)
- Tabea L Bauer
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
- Patrick C F Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
5
Gräff M, Buchholz PC, Stockinger P, Bommarius B, Bommarius AS, Pleiss J. The Short‐chain Dehydrogenase/Reductase Engineering Database (SDRED): A classification and analysis system for a highly diverse enzyme family. Proteins 2019; 87:443-451. [DOI: 10.1002/prot.25666] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/22/2018] [Revised: 01/26/2019] [Accepted: 01/31/2019] [Indexed: 12/17/2022]
Affiliation(s)
- Maike Gräff
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Patrick C.F. Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Peter Stockinger
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Bettina Bommarius
  - Department of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia
- Andreas S. Bommarius
  - Department of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
6
Abstract
The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension Df was distance-dependent: a high dimension for single and double mutants (Df = 4.0), which dropped to Df = 0.7-1.0 at 90% sequence identity, and increased to Df = 3.5-4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.
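The network construction described in the abstract (nodes are sequences, edges join pairs whose identity exceeds a threshold) can be sketched in a few lines. The example assumes sequences that are already aligned to equal length and computes identity column by column, which is a simplification of the global sequence identity used in the paper; the toy sequences and the names `identity`, `build_network`, and `degree_distribution` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions in two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs, threshold):
    """Adjacency sets: two sequences are linked if identity exceeds threshold."""
    adj = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) > threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Number of nodes having each degree (number of neighbours)."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy sequences: one parent, three single mutants, one distant one.
seqs = {
    "wt":  "MKTAYIAKQR",
    "m1":  "MKTAYIAKQL",
    "m2":  "MKTAYIGKQR",
    "m3":  "MKSAYIAKQR",
    "far": "MRSGYLGKNL",
}
adj = build_network(seqs, threshold=0.85)
degrees = {name: len(nbrs) for name, nbrs in adj.items()}
dist = degree_distribution(adj)
print(degrees)
print(sorted(dist.items()))
```

Even in this tiny case the parent sequence acts as a hub for its single mutants, while the double-mutant pairs (80% identity) and the distant sequence stay unconnected at the chosen threshold.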
7
Buß O, Buchholz PCF, Gräff M, Klausmann P, Rudat J, Pleiss J. The ω-transaminase engineering database (oTAED): A navigation tool in protein sequence and structure space. Proteins 2018; 86:566-580. [PMID: 29423963 DOI: 10.1002/prot.25477] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/04/2017] [Revised: 02/03/2018] [Accepted: 02/06/2018] [Indexed: 01/02/2023]
Abstract
The ω-Transaminase Engineering Database (oTAED) was established as a publicly accessible resource on sequences and structures of the biotechnologically relevant ω-transaminases (ω-TAs) from Fold types I and IV. The oTAED integrates sequence and structure data, provides a classification based on fold type and sequence similarity, and applies a standard numbering scheme to identify equivalent positions in homologous proteins. The oTAED includes 67 210 proteins (114 655 sequences) which are divided into 169 homologous families based on global sequence similarity. The 44 and 39 highly conserved positions which were identified in Fold type I and IV, respectively, include the known catalytic residues and a large fraction of glycines and prolines in loop regions, which might have a role in protein folding and stability. However, for most of the conserved positions the function is still unknown. Literature information on positions that mediate substrate specificity and stereoselectivity was systematically examined. The standard numbering schemes revealed that many positions which have been described in different enzymes are structurally equivalent. For some positions, multiple functional roles have been suggested based on experimental data in different enzymes. The proposed standard numbering schemes for Fold type I and IV ω-TAs assist with analysis of literature data, facilitate annotation of ω-TAs, support prediction of promising mutation sites, and enable navigation in ω-TA sequence space. Thus, it is a useful tool for enzyme engineering and the selection of novel ω-TA candidates with desired biochemical properties.
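A standard numbering scheme assigns each residue of a homologous protein the position number of the equivalent residue in a reference sequence. The sketch below shows one simple way to derive such a mapping from a pairwise alignment. It is an assumption-laden illustration, not the oTAED procedure: the aligned strings are invented, `-` is taken as the gap character, and `standard_numbering` is a hypothetical function name.

```python
def standard_numbering(ref_aligned, target_aligned, gap="-"):
    """Map 1-based target residue indices to 1-based reference positions.

    Target residues aligned to a gap in the reference (insertions)
    are mapped to None because they have no equivalent position.
    """
    if len(ref_aligned) != len(target_aligned):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_pos = 0
    tgt_pos = 0
    for r, t in zip(ref_aligned, target_aligned):
        if r != gap:
            ref_pos += 1
        if t != gap:
            tgt_pos += 1
            mapping[tgt_pos] = ref_pos if r != gap else None
    return mapping


# Invented alignment: the target has one insertion and one deletion.
ref_aligned = "MK-TAYG"
target_aligned = "MKLTA-G"
numbering = standard_numbering(ref_aligned, target_aligned)
print(numbering)
```

With such a mapping, a position reported in the literature for one enzyme can be looked up under the same reference number in another, which is the kind of cross-enzyme comparison the abstract describes.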
Affiliation(s)
- Oliver Buß
  - Institute of Process Engineering in Life Sciences, Karlsruhe Institute of Technology, Engler-Bunte-Ring 3, Karlsruhe, 76131, Germany
- Patrick C F Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, Stuttgart, 70569, Germany
- Maike Gräff
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, Stuttgart, 70569, Germany
- Peter Klausmann
  - Institute of Process Engineering in Life Sciences, Karlsruhe Institute of Technology, Engler-Bunte-Ring 3, Karlsruhe, 76131, Germany
- Jens Rudat
  - Institute of Process Engineering in Life Sciences, Karlsruhe Institute of Technology, Engler-Bunte-Ring 3, Karlsruhe, 76131, Germany
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, Stuttgart, 70569, Germany