Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: de Melo-Minardi RC, Bastard K, Artiguenave F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics 2010;26:3075-82. [PMID: 20980272 DOI: 10.1093/bioinformatics/btq595] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]

Cited by Other Article(s)

Elisée E, Ducrot L, Méheust R, Bastard K, Fossey-Jouenne A, Grogan G, Pelletier E, Petit JL, Stam M, de Berardinis V, Zaparucha A, Vallenet D, Vergne-Vaxelaire C. A refined picture of the native amine dehydrogenase family revealed by extensive biodiversity screening. Nat Commun 2024;15:4933. [PMID: 38858403 PMCID: PMC11164908 DOI: 10.1038/s41467-024-49009-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 05/20/2024] [Indexed: 06/12/2024]

Affiliation(s)

Eddy Elisée Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Laurine Ducrot Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Raphaël Méheust Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Karine Bastard School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, 2006, Australia
Aurélie Fossey-Jouenne Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Gideon Grogan York Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York, YO10 5DD, UK
Eric Pelletier Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Jean-Louis Petit Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Mark Stam Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Véronique de Berardinis Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Anne Zaparucha Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
David Vallenet Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
Carine Vergne-Vaxelaire Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France

Paul M, Banerjee A, Maiti S, Mitra D, DasMohapatra PK, Thatoi H. Evaluation of substrate specificity and catalytic promiscuity of Bacillus albus cellulase: an insight into in silico proteomic study aiming at enhanced production of renewable energy. J Biomol Struct Dyn 2023:1-23. [PMID: 38126200 DOI: 10.1080/07391102.2023.2295971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 12/11/2023] [Indexed: 12/23/2023]

Abstract

Cellulases are enzymes that aid in the hydrolysis of cellulosic fibers and have a wide range of industrial uses. In the present in silico study, sequence alignment between cellulases from different Bacillus species revealed that most of the residues are conserved in those aligned enzymes. Three-dimensional structures of cellulase enzymes from 23 different Bacillus species have been predicted and, based on the alignment between the modeled structures, those enzymes have been categorized into 7 different groups according to the homology in their conformational folds. There are two structural contents in Gr-I cellulase, namely the β1-α2 and β3-α5 loops, which vary greatly according to their static position. Molecular docking study between the B. albus cellulase and its various cellulosic substrates including xylanoglucan oligosaccharides revealed that residues viz. Phe154, Tyr258, Tyr282, Tyr285, and Tyr376 of B. albus cellulase are significantly involved in forming stacking interactions during enzyme-substrate binding. Residue interaction network and binding energy analysis for the B. albus cellulase with different cellulosic substrates depicted the strong affinity of XylGlc3 substrate with the receptor enzyme. Molecular interaction and molecular dynamics simulation studies exhibited structural stability of enzyme-substrate complexes which are greatly influenced by the presence of catalytic promiscuity in their substrate binding sites. Screening of B. albus in carboxymethylcellulose (CMC) and xylan supplemented agar media revealed the capability of the bacterium in degrading both cellulose and xylan. Overall, the study demonstrated B. albus cellulase as an effective biocatalyst candidate with the potential role of catalytic promiscuity for possible applications in biofuel industries. Communicated by Ramaswamy H. Sarma.

Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023;120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023]

Sherill-Rofe D, Raban O, Findlay S, Rahat D, Unterman I, Samiei A, Yasmeen A, Kaiser Z, Kuasne H, Park M, Foulkes WD, Bloch I, Zick A, Gotlieb WH, Tabach Y, Orthwein A. Multi-omics data integration analysis identifies the spliceosome as a key regulator of DNA double-strand break repair. NAR Cancer 2022;4:zcac013. [PMID: 35399185 PMCID: PMC8991968 DOI: 10.1093/narcan/zcac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 02/25/2022] [Accepted: 03/23/2022] [Indexed: 11/14/2022]

Affiliation(s)

Dana Sherill-Rofe Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
Oded Raban Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
Steven Findlay Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
Dolev Rahat Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
Irene Unterman Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
Arash Samiei Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
Amber Yasmeen Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
Zafir Kaiser Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
Hellen Kuasne Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
Morag Park Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
William D Foulkes The Research Institute of the McGill University Health Centre, Montreal, QC H4A 3J1, Canada
Idit Bloch Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
Aviad Zick Department of Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Ein-Kerem, Jerusalem 91120, Israel
Walter H Gotlieb Division of Gynecology Oncology, Segal Cancer Center, Jewish General Hospital, McGill University, Montreal, QC H3T 1E2, Canada
Yuval Tabach Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
Alexandre Orthwein Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada

Machine learning modeling of family wide enzyme-substrate specificity screens. PLoS Comput Biol 2022;18:e1009853. [PMID: 35143485 PMCID: PMC8865696 DOI: 10.1371/journal.pcbi.1009853] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 10/30/2021] [Revised: 02/23/2022] [Accepted: 01/21/2022] [Indexed: 11/19/2022]

Abstract

Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.

Predicting interactions between compounds and proteins represents a long-standing dream of drug discovery and protein engineering. Robust models of enzyme-substrate scope would dramatically advance our ability to design synthetic routes involving enzymatic catalysis. However, the lack of standardization between compound-protein interaction studies makes it difficult to evaluate the generalizability of such models. In this work we take a critical step forward by standardizing high-quality datasets measuring enzyme-substrate interactions, outlining rigorous evaluations, and proposing a new way to integrate structural information into protein representations. In testing previous modeling approaches, we highlight a surprising inability of existing models to effectively leverage compound-protein interactions to improve generalization, which challenges a perception in the literature. This establishes future opportunities for model development and integration of enzyme-substrate scope models into computer-aided synthesis planning software.

Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Adv Protein Chem Struct Biol 2022;130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]

Rosen MR, Leuthaeuser JB, Parish CA, Fetrow JS. Isofunctional Clustering and Conformational Analysis of the Arsenate Reductase Superfamily Reveals Nine Distinct Clusters. Biochemistry 2020;59:4262-4284. [PMID: 33135415 DOI: 10.1021/acs.biochem.0c00651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]

Abstract

Arsenate reductase (ArsC) is a superfamily of enzymes that reduce arsenate. Due to active site similarities, some ArsC can function as low-molecular weight protein tyrosine phosphatases (LMW-PTPs). Broad superfamily classifications align with redox partners (Trx- or Grx-linked). To understand this superfamily's mechanistic diversity, the ArsC superfamily is classified on the basis of active site features utilizing the tools TuLIP (two-level iterative clustering process) and autoMISST (automated multilevel iterative sequence searching technique). This approach identified nine functionally relevant (perhaps isofunctional) protein groups. Five groups exhibit distinct ArsC mechanisms. Three are Grx-linked: group 4AA (classical ArsC), group 3AAA (YffB-like), and group 5BAA. Two are Trx-linked: groups 6AAAAA and 7AAAAAAAA. One is an Spx-like transcriptional regulatory group, group 5AAA. Three are potential LMW-PTP groups: groups 7BAAAA, and 7AAAABAA, which have not been previously identified, and the well-studied LMW-PTP family group 8AAA. Molecular dynamics simulations were utilized to explore functional site details. In several families, we confirm and add detail to literature-based mechanistic information. Mechanistic roles are hypothesized for conserved active site residues in several families. In three families, simulations of the unliganded structure sample specific conformational ensembles, which are proposed to represent either a more ligand-binding-competent conformation or a pathway toward a more binding-competent state; these active sites may be designed to traverse high-energy barriers to the lower-energy conformations necessary to more readily bind ligands. This more detailed biochemical understanding of ArsC and ArsC-like PTP mechanisms opens possibilities for further understanding of arsenate bioremediation and the LMW-PTP mechanism.

Domain-mediated interactions for protein subfamily identification. Sci Rep 2020;10:264. [PMID: 31937869 PMCID: PMC6959277 DOI: 10.1038/s41598-019-57187-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/12/2019] [Accepted: 12/23/2019] [Indexed: 11/24/2022]

Weissenbach J. Exploring biochemical diversity in bacteria. An Acad Bras Cienc 2019;91:e20190252. [PMID: 31365611 DOI: 10.1590/0001-3765201920190252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/28/2019] [Accepted: 04/18/2019] [Indexed: 11/21/2022]

A family of native amine dehydrogenases for the asymmetric reductive amination of ketones. Nat Catal 2019. [DOI: 10.1038/s41929-019-0249-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Indexed: 12/11/2022]

Bastard K, Isabet T, Stura EA, Legrand P, Zaparucha A. Structural Studies based on two Lysine Dioxygenases with Distinct Regioselectivity Brings Insights Into Enzyme Specificity within the Clavaminate Synthase-Like Family. Sci Rep 2018;8:16587. [PMID: 30410048 PMCID: PMC6224419 DOI: 10.1038/s41598-018-34795-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/13/2018] [Accepted: 10/23/2018] [Indexed: 12/19/2022]

Fetrow JS, Babbitt PC. New computational approaches to understanding molecular protein function. PLoS Comput Biol 2018;14:e1005756. [PMID: 29621256 PMCID: PMC5886384 DOI: 10.1371/journal.pcbi.1005756] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023]

Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017;31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]

Bastard K, Perret A, Mariage A, Bessonnet T, Pinet-Turpault A, Petit JL, Darii E, Bazire P, Vergne-Vaxelaire C, Brewee C, Debard A, Pellouin V, Besnard-Gonnet M, Artiguenave F, Médigue C, Vallenet D, Danchin A, Zaparucha A, Weissenbach J, Salanoubat M, de Berardinis V. Parallel evolution of non-homologous isofunctional enzymes in methionine biosynthesis. Nat Chem Biol 2017;13:858-866. [PMID: 28581482 DOI: 10.1038/nchembio.2397] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/25/2016] [Accepted: 03/22/2017] [Indexed: 12/30/2022]

Affiliation(s)

Karine Bastard CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Alain Perret CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Aline Mariage CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Thomas Bessonnet CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Agnès Pinet-Turpault CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Jean-Louis Petit CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Ekaterina Darii CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Pascal Bazire CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Carine Vergne-Vaxelaire CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Clémence Brewee CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Adrien Debard CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Virginie Pellouin CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Marielle Besnard-Gonnet CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
François Artiguenave CEA, DRF, Centre National de Génotypage, Evry, France
Claudine Médigue CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
David Vallenet CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Antoine Danchin Institute of Cardiometabolism and Nutrition (ICAN), Hôpital de la Pitié-Salpêtrière, Paris, France
Anne Zaparucha CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Jean Weissenbach CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Marcel Salanoubat CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
Véronique de Berardinis CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France

Knutson ST, Westwood BM, Leuthaeuser JB, Turner BE, Nguyendac D, Shea G, Kumar K, Hayden JD, Harper AF, Brown SD, Morris JH, Ferrin TE, Babbitt PC, Fetrow JS. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences. Protein Sci 2017;26:677-699. [PMID: 28054422 PMCID: PMC5368075 DOI: 10.1002/pro.3112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/10/2016] [Accepted: 12/22/2016] [Indexed: 01/11/2023]

Abstract

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
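
Editorial illustration: the control flow this abstract describes (divide, test each candidate cluster for self-identification, stop at accepted groups or singlets) can be sketched roughly as below. This is not the TuLIP implementation. The names `divisive_clustering`, `toy_self_identifies`, and `toy_split` are hypothetical, and the toy predicate (members share a first letter) merely stands in for a DASP2 search.

```python
from typing import Callable, FrozenSet, List

Cluster = FrozenSet[str]


def divisive_clustering(
    members: Cluster,
    self_identifies: Callable[[Cluster], bool],
    split: Callable[[Cluster], List[Cluster]],
) -> List[Cluster]:
    """Divide until every cluster self-identifies or is a singlet."""
    accepted: List[Cluster] = []
    pending: List[Cluster] = [members]
    while pending:
        cluster = pending.pop()
        if len(cluster) == 1 or self_identifies(cluster):
            accepted.append(cluster)
            continue
        parts = [p for p in split(cluster) if p]
        if len(parts) < 2:
            # Assumption: if the split makes no progress, fall back to
            # singlets so that the loop always terminates.
            parts = [frozenset([m]) for m in cluster]
        pending.extend(parts)
    return accepted


def toy_self_identifies(cluster: Cluster) -> bool:
    # Hypothetical stand-in for a profile search: a cluster "self-identifies"
    # when all members carry the same family letter (first character).
    return len({name[0] for name in cluster}) == 1


def toy_split(cluster: Cluster) -> List[Cluster]:
    # Hypothetical split: peel off the alphabetically first family.
    first = min(name[0] for name in cluster)
    inside = frozenset(n for n in cluster if n[0] == first)
    return [inside, cluster - inside]


structures = frozenset({"A1", "A2", "B1", "B2", "C1"})
groups = divisive_clustering(structures, toy_self_identifies, toy_split)
```

In this toy run the five structures end up in two accepted groups and one singlet. In practice the acceptance test and the split would both come from active site profiling rather than from name prefixes.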

Harper AF, Leuthaeuser JB, Babbitt PC, Morris JH, Ferrin TE, Poole LB, Fetrow JS. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins. PLoS Comput Biol 2017;13:e1005284. [PMID: 28187133 PMCID: PMC5302317 DOI: 10.1371/journal.pcbi.1005284] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/15/2016] [Accepted: 12/06/2016] [Indexed: 12/15/2022]

Abstract

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily.
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
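
Editorial illustration: the agglomerative half of the process described in this abstract, together with the self-identification criterion ("the group identifies itself and nothing else"), can be sketched as follows. This is a minimal sketch under stated assumptions, not the MISST code. `TOY_DB`, `toy_search`, `expand_until_stable`, and the "share at least two features" rule are all hypothetical stand-ins for a profile-based GenBank search.

```python
from typing import Callable, Dict, Set

Search = Callable[[Set[str]], Set[str]]

# Hypothetical toy database: sequence name -> functional-site features.
TOY_DB: Dict[str, Set[str]] = {
    "s1": {"C", "T", "P"},
    "s2": {"C", "T", "R"},
    "s3": {"T", "R", "W"},
    "s4": {"H", "D", "E"},
}


def toy_search(group: Set[str]) -> Set[str]:
    # Assumed hit rule: a sequence is returned when it shares at least
    # two features with some member of the query group.
    return {
        name
        for name, feats in TOY_DB.items()
        if any(len(feats & TOY_DB[member]) >= 2 for member in group)
    }


def expand_until_stable(seeds: Set[str], search: Search, max_rounds: int = 100) -> Set[str]:
    """Agglomerative step: keep adding search hits until nothing new appears."""
    group = set(seeds)
    for _ in range(max_rounds):
        grown = group | search(group)
        if grown == group:
            break
        group = grown
    return group


def self_identifies(group: Set[str], search: Search) -> bool:
    """The group finds itself and nothing else in a search."""
    return search(group) == set(group)


atlas_group = expand_until_stable({"s1"}, toy_search)
```

Starting from the single seed `s1`, the toy search pulls in `s2` and then `s3`, after which the group is stable and passes `self_identifies`. The seed alone does not pass, because its search also returns `s2`.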

Moll M, Finn PW, Kavraki LE. Structure-guided selection of specificity determining positions in the human Kinome. BMC Genomics 2016;17 Suppl 4:431. [PMID: 27556159 PMCID: PMC5001202 DOI: 10.1186/s12864-016-2790-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]

Boari de Lima E, Meira W, de Melo-Minardi RC. Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering. PLoS Comput Biol 2016;12:e1005001. [PMID: 27348631 PMCID: PMC4922564 DOI: 10.1371/journal.pcbi.1005001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/30/2015] [Accepted: 05/22/2016] [Indexed: 01/14/2023]

Abstract

As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the function annotation problem's complexity. Hence, this work's purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as functional similarity evidence. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed our fully automated technique obtained better clusters than ASMC for two families, besides equivalent results for two others, including one whose clusters were manually defined. Clusters produced by our framework showed great correspondence with the known subfamilies, besides being more contrasting than those produced by ASMC. Additionally, for the families whose specificity determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as functional similarity evidence.
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.

Ligand-binding specificity and promiscuity of the main lignocellulolytic enzyme families as revealed by active-site architecture analysis. Sci Rep 2016;6:23605. [PMID: 27009476 PMCID: PMC4806347 DOI: 10.1038/srep23605] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2015] [Accepted: 03/09/2016] [Indexed: 02/02/2023] Open

Schwarz RF, Tamuri AU, Kultys M, King J, Godwin J, Florescu AM, Schultz J, Goldman N. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments. Nucleic Acids Res 2016;44:e77. [PMID: 26819408 PMCID: PMC4856975 DOI: 10.1093/nar/gkw022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2015] [Accepted: 01/08/2016] [Indexed: 12/19/2022] Open

Substrate-binding specificity of chitinase and chitosanase as revealed by active-site architecture analysis. Carbohydr Res 2015;418:50-56. [PMID: 26545262 DOI: 10.1016/j.carres.2015.10.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/10/2015] [Revised: 10/03/2015] [Accepted: 10/06/2015] [Indexed: 11/21/2022]

Chagoyen M, García-Martín JA, Pazos F. Practical analysis of specificity-determining residues in protein families. Brief Bioinform 2015;17:255-61. [DOI: 10.1093/bib/bbv045] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/01/2015] [Accepted: 06/15/2015] [Indexed: 12/17/2022] Open

Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014;16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022] Open

Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 2013;10:42-9. [DOI: 10.1038/nchembio.1387] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 07/12/2013] [Accepted: 10/02/2013] [Indexed: 11/08/2022]

Digging up enzyme functions. Nat Chem Biol 2013;10:4-5. [DOI: 10.1038/nchembio.1413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]

Chen Z, Zeng AP. Protein design in systems metabolic engineering for industrial strain development. Biotechnol J 2013;8:523-33. [PMID: 23589416 DOI: 10.1002/biot.201200238] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/08/2012] [Revised: 01/24/2013] [Accepted: 02/27/2013] [Indexed: 12/20/2022]

Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011;27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]

Abstract

MOTIVATION

To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples.

RESULTS

We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.

AVAILABILITY

http://rogerlab.biochem.dal.ca/Software

CONTACT

andrew.roger@dal.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
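
The distinction between type I and type II functional divergence drawn in this abstract can be made concrete with a small sketch. It is not FunDi, which the abstract describes as a phylogenetic mixture model; it is a simple per-column heuristic, and the entropy thresholds, the coarse amino acid property grouping and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Coarse, hypothetical grouping of the 20 amino acids by chemical property.
PROPERTY = {}
for residues, label in [("AVLIMFWC", "hydrophobic"), ("STNQYGP", "polar"),
                        ("DE", "acidic"), ("KRH", "basic")]:
    for r in residues:
        PROPERTY[r] = label

def entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())

def dominant_property(column):
    """Most frequent property class among the residues of a column."""
    classes = Counter(PROPERTY.get(r, "other") for r in column)
    return classes.most_common(1)[0][0]

def classify_site(col_a, col_b, low=0.5, high=1.0):
    """Label one site given its column in subfamily A and subfamily B.
    `low` and `high` are arbitrary entropy cut-offs (assumptions)."""
    ha, hb = entropy(col_a), entropy(col_b)
    if ha <= low and hb <= low:
        # Conserved in both subfamilies: type II if the property shifted.
        if dominant_property(col_a) != dominant_property(col_b):
            return "type II candidate"
        return "conserved"
    if (ha <= low and hb >= high) or (hb <= low and ha >= high):
        # Conserved in one subfamily, variable in the other.
        return "type I candidate"
    return "unclassified"

print(classify_site("DDDDD", "KKKKK"))  # acidic vs basic, both conserved
print(classify_site("GGGGG", "GASTV"))  # conserved vs variable
print(classify_site("LLLLL", "IIIII"))  # different residue, same property
```

A heuristic of this kind ignores the phylogeny entirely, which is one reason model-based classifiers such as the one described above are evaluated on simulated data where the truly divergent sites are known.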

Perret A, Lechaplais C, Tricot S, Perchat N, Vergne C, Pellé C, Bastard K, Kreimeyer A, Vallenet D, Zaparucha A, Weissenbach J, Salanoubat M. A novel acyl-CoA beta-transaminase characterized from a metagenome. PLoS One 2011;6:e22918. [PMID: 21826218 PMCID: PMC3149608 DOI: 10.1371/journal.pone.0022918] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/17/2010] [Accepted: 07/09/2011] [Indexed: 11/19/2022] Open