Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

56
(from Reference Citation Analysis)

Article PDFs (13)

Cited by > 0 (46)

Searched Name

Sergey Ovchinnikov

Number	Citation Analysis
1	Genomic language model predicts protein co-regulation and function. Nat Commun 2024;15:2880. [PMID: 38570504 PMCID: PMC10991518 DOI: 10.1038/s41467-024-46947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 03/13/2024] [Indexed: 04/05/2024] Abstract: Deciphering the relationship between a gene and its genomic context is fundamental to understanding and engineering biological systems. Machine learning has shown promise in learning latent relationships underlying the sequence-structure-function paradigm from massive protein sequence datasets. However, to date, limited attempts have been made in extending this continuum to include higher order genomic context information. Evolutionary processes dictate the specificity of genomic contexts in which a gene is found across phylogenetic distances, and these emergent genomic patterns can be leveraged to uncover functional relationships between gene products. Here, we train a genomic language model (gLM) on millions of metagenomic scaffolds to learn the latent functional and regulatory relationships between genes. gLM learns contextualized protein embeddings that capture the genomic context as well as the protein sequence itself, and encode biologically meaningful and functionally relevant information (e.g. enzymatic function, taxonomy). Our analysis of the attention patterns demonstrates that gLM is learning co-regulated functional modules (i.e. operons). Our findings illustrate that gLM's unsupervised deep learning of the metagenomic corpus is an effective and promising approach to encode functional semantics and regulatory syntax of genes in their genomic contexts and uncover complex relationships between genes in a genomic region.
Key Words: machine learning; computational models; microbial genetics. MeSH Headings: Phylogeny; Semantics; Machine Learning; Operon; Proteins; Metagenomics. Grants: DP5 OD026389 NIH HHS; Gordon and Betty Moore Foundation (Gordon E. and Betty I. Moore Foundation); National Science Foundation (NSF); National Aeronautics and Space Administration (NASA).
2	Computational design of soluble functional analogues of integral membrane proteins. bioRxiv 2024:2023.05.09.540044. [PMID: 38496615 PMCID: PMC10942269 DOI: 10.1101/2023.05.09.540044] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/19/2024] Abstract: De novo design of complex protein folds using solely computational means remains a significant challenge. Here, we use a robust deep learning pipeline to design complex folds and soluble analogues of integral membrane proteins. Unique membrane topologies, such as those from GPCRs, are not found in the soluble proteome and we demonstrate that their structural features can be recapitulated in solution. Biophysical analyses reveal high thermal stability of the designs and experimental structures show remarkable design accuracy. The soluble analogues were functionalized with native structural motifs, standing as a proof-of-concept for bringing membrane protein functions to the soluble proteome, potentially enabling new approaches in drug discovery. In summary, we designed complex protein topologies and enriched them with functionalities from membrane proteins, with high experimental success rates, leading to a de facto expansion of the functional soluble fold space. Grants: R35 GM138368 NIGMS NIH HHS.
3	An atlas of protein homo-oligomerization across domains of life. Cell 2024;187:999-1010.e15. [PMID: 38325366 DOI: 10.1016/j.cell.2024.01.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 11/03/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024] Abstract: Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in human. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes. Key Words: AlphaFold2; Protein complexes; Protein structure; Single nucleotide polymorphisms; Structure prediction; Symmetry; evolution; homo-oligomer; homomer; structuromics. MeSH Headings: Humans; Artificial Intelligence; Proteome; Proteins/chemistry; Proteins/genetics; Archaea/chemistry; Archaea/genetics; Eukaryota/chemistry; Eukaryota/genetics; Bacteria/chemistry; Bacteria/genetics.
4	NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes. Nucleic Acids Res 2024;52:D502-D512. [PMID: 37811892 PMCID: PMC10767849 DOI: 10.1093/nar/gkad800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 09/19/2023] [Indexed: 10/10/2023] Abstract: The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families, whose members have no hits to proteins of reference genomes or Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, as well as 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB. MeSH Headings: Amino Acid Sequence; Databases, Factual; Databases, Protein; Ecosystem; Metagenome; Proteins/chemistry; Geography. Grants: 1855-BOLOGNA HFRI; 838018 Marie Sklodowska-Curie; Fondation Sante; U.S. Department of Energy Joint Genome Institute; DE-AC02-05CH11231 U.S. Department of Energy.
5	Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024;625:832-839. [PMID: 37956700 PMCID: PMC10808063 DOI: 10.1038/s41586-023-06832-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 07/07/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023] Abstract: AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates [2], and disease-causing point mutations often cause population changes within these substates [3,4]. We demonstrate that clustering a multiple-sequence alignment by sequence similarity enables AlphaFold2 to sample alternative states of known metamorphic proteins with high confidence. Using this method, named AF-Cluster, we investigated the evolutionary distribution of predicted structures for the metamorphic protein KaiB [5] and found that predictions of both conformations were distributed in clusters across the KaiB family. We used nuclear magnetic resonance spectroscopy to confirm an AF-Cluster prediction: a cyanobacteria KaiB variant is stabilized in the opposite state compared with the more widely studied variant. To test AF-Cluster's sensitivity to point mutations, we designed and experimentally verified a set of three mutations predicted to flip KaiB from Rhodobacter sphaeroides from the ground to the fold-switched state. Finally, screening for alternative states in protein families without known fold switching identified a putative alternative state for the oxidoreductase Mpt53 in Mycobacterium tuberculosis. Further development of such bioinformatic methods in tandem with experiments will probably have a considerable impact on predicting protein energy landscapes, essential for illuminating biological function.
Key Words: NMR spectroscopy; protein structure predictions; protein folding. MeSH Headings: Cluster Analysis; Mutation; Protein Conformation; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Sequence Alignment; Machine Learning; Rhodobacter sphaeroides; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Protein Folding. Grants: P41 GM111135 NIGMS NIH HHS; R24 GM141526 NIGMS NIH HHS; T32 GM135126 NIGMS NIH HHS.
6	Unraveling the functional dark matter through global metagenomics. Nature 2023;622:594-602. [PMID: 37821698 PMCID: PMC10584684 DOI: 10.1038/s41586-023-06583-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/18/2022] [Accepted: 08/30/2023] [Indexed: 10/13/2023] Abstract: Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
Key Words: computational biology and bioinformatics; environmental sciences; systems biology. MeSH Headings: Cluster Analysis; Metagenome/genetics; Metagenomics/methods; Proteins/chemistry; Proteins/classification; Proteins/genetics; Databases, Protein; Protein Conformation; Microbiology. Grants: DP5 OD026389 NIH HHS; P20 GM103475 NIGMS NIH HHS.
7	De novo design of protein structure and function with RFdiffusion. Nature 2023;620:1089-1100. [PMID: 37433327 PMCID: PMC10468394 DOI: 10.1038/s41586-023-06415-8] [Citation(s) in RCA: 135] [Impact Index Per Article: 135.0] [Received: 12/14/2022] [Accepted: 07/07/2023] [Indexed: 07/13/2023] Abstract: There has been considerable recent progress in designing new proteins using deep-learning methods [1-9]. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.
Key Words: protein design; proteins; machine learning. MeSH Headings: Catalytic Domain; Cryoelectron Microscopy; Deep Learning; Hemagglutinin Glycoproteins, Influenza Virus/chemistry; Hemagglutinin Glycoproteins, Influenza Virus/metabolism; Hemagglutinin Glycoproteins, Influenza Virus/ultrastructure; Protein Binding; Proteins/chemistry; Proteins/metabolism; Proteins/ultrastructure. Grants: INV-010680 Bill & Melinda Gates Foundation; T32 GM007250 NIGMS NIH HHS; U19 AG065156 NIA NIH HHS.
8	Mega-scale experimental analysis of protein folding stability in biology and design. Nature 2023;620:434-444. [PMID: 37468638 PMCID: PMC10412457 DOI: 10.1038/s41586-023-06328-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 01/05/2023] [Accepted: 06/14/2023] [Indexed: 07/21/2023] Abstract: Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale [1]. However, the energetics driving folding are invisible in these structures and remain largely unknown [2]. The hidden thermodynamics of folding can drive disease [3,4], shape protein evolution [5-7] and guide protein engineering [8-10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40-72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.
Key Words: high-throughput screening; proteins; protein databases; thermodynamics. MeSH Headings: Amino Acids/genetics; Amino Acids/metabolism; Biology/methods; DNA, Complementary/genetics; Protein Folding; Protein Stability; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Thermodynamics; Proteolysis; Protein Engineering/methods; Protein Domains/genetics; Mutation.
9	Co-evolution-based prediction of metal-binding sites in proteomes by machine learning. Nat Chem Biol 2023;19:548-555. [PMID: 36593274 DOI: 10.1038/s41589-022-01223-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/11/2022] [Accepted: 11/08/2022] [Indexed: 01/03/2023] Abstract: Metal ions have various important biological roles in proteins, including structural maintenance, molecular recognition and catalysis. Previous methods of predicting metal-binding sites in proteomes were based on either sequence or structural motifs. Here we developed a co-evolution-based pipeline named 'MetalNet' to systematically predict metal-binding sites in proteomes. We applied MetalNet to proteomes of four representative prokaryotic species and predicted 4,849 potential metalloproteins, which substantially expands the currently annotated metalloproteomes. We biochemically and structurally validated previously unannotated metal-binding sites in several proteins, including apo-citrate lyase phosphoribosyl-dephospho-CoA transferase citX, an Escherichia coli enzyme lacking structural or sequence homology to any known metalloprotein (Protein Data Bank (PDB) codes: 7DCM and 7DCN). MetalNet also successfully recapitulated all known zinc-binding sites from the human spliceosome complex. The pipeline of MetalNet provides a unique and enabling tool for interrogating the hidden metalloproteome and studying metal biology. MeSH Headings: Humans; Amino Acid Sequence; Proteome/chemistry; Metals/metabolism; Metalloproteins/metabolism; Binding Sites; Escherichia coli/metabolism; Machine Learning.
10	De novo design of small beta barrel proteins. Proc Natl Acad Sci U S A 2023;120:e2207974120. [PMID: 36897987 PMCID: PMC10089152 DOI: 10.1073/pnas.2207974120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 01/27/2023] [Indexed: 03/12/2023] Abstract: Small beta barrel proteins are attractive targets for computational design because of their considerable functional diversity despite their very small size (<70 amino acids). However, there are considerable challenges to designing such structures, and there has been little success thus far. Because of the small size, the hydrophobic core stabilizing the fold is necessarily very small, and the conformational strain of barrel closure can oppose folding; also intermolecular aggregation through free beta strand edges can compete with proper monomer folding. Here, we explore the de novo design of small beta barrel topologies using both Rosetta energy-based methods and deep learning approaches to design four small beta barrel folds: Src homology 3 (SH3) and oligonucleotide/oligosaccharide-binding (OB) topologies found in nature and five and six up-and-down-stranded barrels rarely if ever seen in nature. Both approaches yielded successful designs with high thermal stability and experimentally determined structures with less than 2.4 Å rmsd from the designed models. Using deep learning for backbone generation and Rosetta for sequence design yielded higher design success rates and increased structural diversity than Rosetta alone. The ability to design a large and structurally diverse set of small beta barrel proteins greatly increases the protein shape space available for designing binders to protein targets of interest.
Key Words: high-throughput screening; machine learning; protein design; small beta barrels. MeSH Headings: Protein Structure, Secondary; Models, Molecular; Proteins/chemistry; Amino Acids; Protein Conformation, beta-Strand; Protein Folding. Grants: R37 AI058072 NIAID NIH HHS; U19 AG065156 NIA NIH HHS; Howard Hughes Medical Institute; P30 GM124165 NIGMS NIH HHS; S10 OD020000 NIH HHS; R01 CA240339 NCI NIH HHS; DP5 OD026389 NIH HHS; R01 AI168423 NIAID NIH HHS; R56 AI155881 NIAID NIH HHS.
11	Cyclic peptide structure prediction and design using AlphaFold. bioRxiv 2023:2023.02.25.529956. [PMID: 36865323 PMCID: PMC9980166 DOI: 10.1101/2023.02.25.529956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2023] Abstract: Deep learning networks offer considerable opportunities for accurate structure prediction and design of biomolecules. While cyclic peptides have gained significant traction as a therapeutic modality, developing deep learning methods for designing such peptides has been slow, mostly due to the small number of available structures for molecules in this size range. Here, we report approaches to modify the AlphaFold network for accurate structure prediction and design of cyclic peptides. Our results show this approach can accurately predict the structures of native cyclic peptides from a single sequence, with 36 out of 49 cases predicted with high confidence (pLDDT > 0.85) matching the native structure with root mean squared deviation (RMSD) less than 1.5 Å. Further extending our approach, we describe computational methods for designing sequences of peptide backbones generated by other backbone sampling methods and for de novo design of new macrocyclic peptides. We extensively sampled the structural diversity of cyclic peptides between 7-13 amino acids, and identified around 10,000 unique design candidates predicted to fold into the designed structures with high confidence. X-ray crystal structures for seven sequences with diverse sizes and structures designed by our approach match very closely with the design models (root mean squared deviation < 1.0 Å), highlighting the atomic level accuracy in our approach. The computational methods and scaffolds developed here provide the basis for custom-designing peptides for targeted therapeutic applications.
Grants: DP5 OD026389 NIH HHS; P30 GM124165 NIGMS NIH HHS.
12	End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman. Bioinformatics 2023;39:6820925. [PMID: 36355460 PMCID: PMC9805565 DOI: 10.1093/bioinformatics/btac724] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 11/12/2022] Abstract: MOTIVATION: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. RESULTS: Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood. AVAILABILITY AND IMPLEMENTATION: Our code and examples are available at: https://github.com/spetti/SMURF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MeSH Headings: Humans; Sequence Alignment; Algorithms; Proteins/chemistry; Neural Networks, Computer; Amino Acid Sequence. Grants: FAS Division of Science, Research Computing Group at Harvard University Department of Energy Office of Science National Nuclear Security Administration MCB2032259 NSF 17-SC-20-SC Exascale Computing Project S10 OD028632 NIH HHS DP5 OD026389 NIH HHS #1764269 NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard R35 GM134922 NIGMS NIH HHS Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory P30 CA045508 NCI NIH HHS 735929LPI Moore-Simons Project on the Origin of the Eukaryotic Cell, Simons Foundation National Institutes of Health NIH Developmental Funds from the Cancer Center Support Moore–Simons Project on the Origin of the Eukaryotic Cell, Simons Foundation
13	Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Front Bioinform 2023;3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Abstract: Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment. Key Words: biodiversity; cluster annotation; metagenomes; metatranscriptomes; microbial dark matter; protein clustering; protein families.
14	State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold. Phys Rev Lett 2022;129:238101. [PMID: 36563190 DOI: 10.1103/physrevlett.129.238101] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 06/17/2022] [Accepted: 10/18/2022] [Indexed: 06/17/2023] Abstract: The problem of predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by combining deep learning techniques with coevolutionary data from multiple sequence alignments of related protein sequences. The use of coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the 3D structure of a protein is fully determined by its primary sequence and the biophysical laws that cause it to fold into a low-energy configuration. Thus, it should be possible to predict a protein's structure from only its primary sequence by learning an approximate biophysical energy function. We provide evidence that AlphaFold has learned such an energy function, and uses coevolution data to solve the global search problem of finding a low-energy conformation. We demonstrate that AlphaFold's learned energy function can be used to rank the quality of candidate protein structures with state-of-the-art accuracy, without using any coevolution data. Finally, we explore several applications of this energy function, including the prediction of protein structures without multiple sequence alignments. MeSH Headings: Protein Conformation; Algorithms; Models, Molecular; Proteins/chemistry; Amino Acid Sequence.
15	A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 2022;29:1056-1067. [PMID: 36344848 PMCID: PMC9663297 DOI: 10.1038/s41594-022-00849-w] [Citation(s) in RCA: 179] [Impact Index Per Article: 89.5] [Received: 10/25/2021] [Accepted: 09/20/2022] [Indexed: 11/09/2022] Abstract: Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
Key Words: structural biology; protein folding; research data. MESH Headings: Computational Biology/methods; Furylfuramide; Binding Sites; Proteins/chemistry; Databases, Protein; Protein Conformation. Grants: Wellcome Trust; 28159 Cancer Research UK; DP5 OD026389 NIH HHS; R21 AI156595 NIAID NIH HHS.
16	Temperature- and Field-Induced Transformation of the Magnetic State in Co_2.5Ge_0.5BO₅. Inorg Chem 2022;61:13034-13046. [PMID: 35947773 DOI: 10.1021/acs.inorgchem.2c01193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Abstract: A tetravalent-substituted cobalt ludwigite Co_2.5Ge_0.5BO₅ has been synthesized using the flux method. The compound undergoes two magnetic transitions: a long-range antiferromagnetic transition at T_N1 = 84 K and a metamagnetic one at T_N2 = 36 K. The sample-oriented magnetization measurements revealed a fully compensated magnetic moment along the a- and c-axes and an uncompensated one along the b-axis leading to high uniaxial anisotropy. A field-induced enhancement of the ferromagnetic correlations at T_N2 is observed in specific heat measurements. The DFT+GGA calculation predicts the spin configuration of (↑↓↓↑) as a ground state with a magnetic moment of 1.37 μ_B/f.u. The strong hybridization of Ge(4s, 4p) with O (2p) orbitals resulting from the high electronegativity of Ge⁴⁺ is assumed to cause an increase in the interlayer interaction, contributing to the long-range magnetic order. The effect of two super-superexchange pathways Co²⁺-O-B-O-Co²⁺ and Co²⁺-O-M4-O-Co²⁺ on the magnetic state is discussed.
17	Scaffolding protein functional sites using deep learning. Science 2022;377:387-394. [PMID: 35862514 PMCID: PMC9621694 DOI: 10.1126/science.abn2100] [Citation(s) in RCA: 120] [Impact Index Per Article: 60.0] [Indexed: 01/22/2023] Abstract: The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired functional site. The second approach, "inpainting," starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests. MESH Headings: Binding Sites; Catalysis; Deep Learning; Protein Binding; Protein Engineering/methods; Protein Folding; Protein Structure, Secondary; Proteins/chemistry. Grants: U19 AG065156 NIA NIH HHS; HHSN272201700059C NIAID NIH HHS; R01 CA240339 NCI NIH HHS; DP5 OD026389 NIH HHS; Howard Hughes Medical Institute.
18	ColabFold: making protein folding accessible to all. Nat Methods 2022;19:679-682. [PMID: 35637307 DOI: 10.1101/2021.08.15.456425] [Citation(s) in RCA: 217] [Impact Index Per Article: 108.5] [Received: 10/29/2021] [Accepted: 04/11/2022] [Indexed: 05/26/2023] Abstract: ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40-60-fold faster search and optimized model utilization enables prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are available at https://colabfold.mmseqs.com. MESH Headings: Computers; Databases, Factual; Protein Folding; Proteins; Software. Grants: DP5 OD026389 NIH HHS; R21 AI156595 NIAID NIH HHS.
19	ColabFold: making protein folding accessible to all. Nat Methods 2022;19:679-682. [PMID: 35637307 PMCID: PMC9184281 DOI: 10.1038/s41592-022-01488-1] [Citation(s) in RCA: 2511] [Impact Index Per Article: 1255.5] [Received: 10/29/2021] [Accepted: 04/11/2022] [Indexed: 12/17/2022] Abstract: ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40-60-fold faster search and optimized model utilization enables prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are available at https://colabfold.mmseqs.com. ColabFold is a free and accessible platform for protein folding that provides accelerated prediction of protein structures and complexes using AlphaFold2 or RoseTTAFold.
20	Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention. Pac Symp Biocomput 2022;27:34-45. [PMID: 34890134 PMCID: PMC8752338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022] Abstract: The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases. MESH Headings: Attention; Computational Biology; Humans; Proteins/genetics; Sequence Alignment. Grants: R35 GM134922 NIGMS NIH HHS; T32 HG000047 NHGRI NIH HHS.
21	Computed structures of core eukaryotic protein complexes. Science 2021;374:eabm4805. [PMID: 34762488 PMCID: PMC7612107 DOI: 10.1126/science.abm4805] [Citation(s) in RCA: 230] [Impact Index Per Article: 76.7] [Indexed: 12/16/2022] Abstract: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning–based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of yeast proteins, identify 1505 likely to interact, and build structure models for 106 previously unidentified assemblies and 806 that have not been structurally characterized. These complexes, which have as many as five subunits, play roles in almost all key processes in eukaryotic cells and provide broad insights into biological function.
MESH Headings: Acyltransferases/chemistry; Acyltransferases/metabolism; Chromosome Segregation; Computational Biology; Computer Simulation; DNA Repair; Deep Learning; Evolution, Molecular; Homologous Recombination; Ligases/chemistry; Ligases/metabolism; Membrane Proteins/chemistry; Membrane Proteins/metabolism; Models, Molecular; Multiprotein Complexes/chemistry; Multiprotein Complexes/metabolism; Protein Biosynthesis; Protein Conformation; Protein Interaction Mapping; Protein Interaction Maps; Proteome/chemistry; Proteome/metabolism; Ribosomes/metabolism; Saccharomyces cerevisiae/chemistry; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/metabolism; Ubiquitin/chemistry; Ubiquitin/metabolism. Grants: R21 AI156595 NIAID NIH HHS; P30 CA008748 NCI NIH HHS; R35 GM118026 NIGMS NIH HHS; R01 CA221858 NCI NIH HHS; R35 NS097333 NINDS NIH HHS; R35 GM136258 NIGMS NIH HHS; MC_UP_1201/10 Medical Research Council.
22	Structure-based protein design with deep learning. Curr Opin Chem Biol 2021;65:136-144. [PMID: 34547592 PMCID: PMC8671290 DOI: 10.1016/j.cbpa.2021.08.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/07/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022] Abstract: Since the first revelation of proteins functioning as macromolecular machines through their three-dimensional structures, researchers have been intrigued by the marvelous ways the biochemical processes are carried out by proteins. The aspiration to understand protein structures has fueled extensive efforts across different scientific disciplines. In recent years, it has been demonstrated that proteins with new functionality or shapes can be designed via structure-based modeling methods, and the design strategies have combined all available information, but largely piece-by-piece, from sequence-derived statistics to the detailed atomic-level modeling of chemical interactions. Despite the significant progress, incorporating data-derived approaches through the use of deep learning methods can be a game changer. In this review, we summarize current progress, compare the arc of developing the deep learning approaches with the conventional methods, and describe the motivation and concepts behind current strategies that may lead to potential future opportunities. Key Words: Deep learning; Neural networks; Protein design; Protein sequence design; Protein structure; Protein structure design. MESH Headings: Deep Learning; Proteins/chemistry. Grants: DP5 OD026389 NIH HHS.
23	Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021;373:871-876. [PMID: 34282049 PMCID: PMC7612213 DOI: 10.1126/science.abj8754] [Citation(s) in RCA: 2086] [Impact Index Per Article: 695.3] [Received: 06/07/2021] [Accepted: 07/07/2021] [Indexed: 01/17/2023] Abstract: DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
MESH Headings: ADAM Proteins/chemistry; Amino Acid Sequence; Computer Simulation; Cryoelectron Microscopy; Crystallography, X-Ray; Databases, Protein; Deep Learning; Membrane Proteins/chemistry; Models, Molecular; Multiprotein Complexes/chemistry; Neural Networks, Computer; Protein Conformation; Protein Folding; Protein Subunits/chemistry; Proteins/chemistry; Proteins/physiology; Receptors, G-Protein-Coupled/chemistry; Sphingosine N-Acyltransferase/chemistry. Grants: P 29432 Austrian Science Fund FWF; R35 GM127390 NIGMS NIH HHS; R01 GM123089 NIGMS NIH HHS; P01 GM063210 NIGMS NIH HHS; R01 AI051321 NIAID NIH HHS; DOC 50 Austrian Science Fund FWF; Wellcome Trust; 209407 Wellcome Trust; DP5 OD026389 NIH HHS.
24	Protein sequence design by conformational landscape optimization. Proc Natl Acad Sci U S A 2021;118:e2017228118. [PMID: 33712545 PMCID: PMC7980421 DOI: 10.1073/pnas.2017228118] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Indexed: 12/17/2022] Abstract: The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest energy state. As this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest-energy conformation for the designed sequence, and typically discarding a large fraction of designed sequences for which this is not the case. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima.
We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure. Key Words: energy landscape; machine learning; protein design; sequence optimization; stability prediction. MESH Headings: Models, Molecular; Neural Networks, Computer; Protein Conformation; Protein Folding; Proteins/chemistry; Thermodynamics. Grants: DP5 OD026389 NIH HHS; Howard Hughes Medical Institute.
25	Solution NMR structure of Se0862, a highly conserved cyanobacterial protein involved in biofilm formation. Protein Sci 2020;29:2274-2280. [PMID: 32949024 PMCID: PMC7586914 DOI: 10.1002/pro.3952] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/29/2020] [Revised: 09/08/2020] [Accepted: 09/12/2020] [Indexed: 12/13/2022] Abstract: Biofilms are accumulations of microorganisms embedded in extracellular matrices that protect against external factors and stressful environments. Cyanobacterial biofilms are ubiquitous and have potential for treatment of wastewater and sustainable production of biofuels. But the underlying mechanisms regulating cyanobacterial biofilm formation are unclear. Here, we report the solution NMR structure of a protein, Se0862, conserved across diverse cyanobacterial species and involved in regulation of biofilm formation in the cyanobacterium Synechococcus elongatus PCC 7942. Se0862 is a class α+β protein with ααββββαα topology and roll architecture, consisting of a four-stranded β-sheet that is flanked by four α-helices on one side. Conserved surface residues constitute a hydrophobic pocket and charged regions that are likely also present in Se0862 orthologs. Key Words: NMR spectroscopy; S. elongatus PCC 7942; biofilm; cyanobacteria; protein structure. MESH Headings: Bacterial Proteins/chemistry; Biofilms; Nuclear Magnetic Resonance, Biomolecular; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Synechococcus/chemistry; Synechococcus/physiology. Grants: R01 GM129325 NIGMS NIH HHS; Air Force Office of Scientific Research; National Science Foundation.
26	Advances in Chromatin and Chromosome Research: Perspectives from Multiple Fields. Mol Cell 2020;79:881-901. [PMID: 32768408 PMCID: PMC7888594 DOI: 10.1016/j.molcel.2020.07.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/03/2020] [Revised: 06/12/2020] [Accepted: 07/06/2020] [Indexed: 12/12/2022] Abstract: Nucleosomes package genomic DNA into chromatin. By regulating DNA access for transcription, replication, DNA repair, and epigenetic modification, chromatin forms the nexus of most nuclear processes. In addition, dynamic organization of chromatin underlies both regulation of gene expression and evolution of chromosomes into individualized sister objects, which can segregate cleanly to different daughter cells at anaphase. This collaborative review shines a spotlight on technologies that will be crucial to interrogate key questions in chromatin and chromosome biology including state-of-the-art microscopy techniques, tools to physically manipulate chromatin, single-cell methods to measure chromatin accessibility, computational imaging with neural networks and analytical tools to interpret chromatin structure and dynamics. In addition, this review provides perspectives on how these tools can be applied to specific research fields such as genome stability and developmental biology and to test concepts such as phase separation of chromatin. MESH Headings: Chromatin/genetics; Chromosomes/genetics; DNA/genetics; DNA Repair/genetics; DNA Replication/genetics; Epigenesis, Genetic/genetics; Humans; Nucleosomes/genetics. Grants: R00 GM130896 NIGMS NIH HHS; R01 GM044794 NIGMS NIH HHS; R00 GM123195 NIGMS NIH HHS; DP5 OD023111 NIH HHS; R37 GM025326 NIGMS NIH HHS; R35 GM136322 NIGMS NIH HHS; K99 GM123195 NIGMS NIH HHS; R01 GM025326 NIGMS NIH HHS.
27	Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat Methods 2020;17:665-680. [PMID: 32483333 PMCID: PMC7603796 DOI: 10.1038/s41592-020-0848-2] [Citation(s) in RCA: 373] [Impact Index Per Article: 93.3] [Received: 04/29/2019] [Accepted: 04/22/2020] [Indexed: 12/12/2022] Abstract: The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
MESH Headings: Macromolecular Substances/chemistry; Models, Molecular; Molecular Docking Simulation; Peptidomimetics/chemistry; Protein Conformation; Proteins/chemistry; Software. Grants: R01 GM099827 NIGMS NIH HHS; 18POST34080422 American Heart Association-American Stroke Association; R01 DK097376 NIDDK NIH HHS; R01 GM117189 NIGMS NIH HHS; T32 GM135141 NIGMS NIH HHS; Howard Hughes Medical Institute; RL1 CA133832 NCI NIH HHS; R01 GM126299 NIGMS NIH HHS; R01 GM117968 NIGMS NIH HHS; R01 GM084453 NIGMS NIH HHS; F31 CA243353 NCI NIH HHS; R21 GM102716 NIGMS NIH HHS; R35 GM122517 NIGMS NIH HHS; P30 CA006927 NCI NIH HHS; F32 GM110899 NIGMS NIH HHS; T32 GM007628 NIGMS NIH HHS; P41 RR012408 NCRR NIH HHS; R01 GM097207 NIGMS NIH HHS; R01 GM099842 NIGMS NIH HHS; R01 GM080403 NIGMS NIH HHS; R01 GM092802 NIGMS NIH HHS; R01 GM073151 NIGMS NIH HHS; R35 GM125034 NIGMS NIH HHS; R01 AI113867 NIAID NIH HHS; R35 GM131923 NIGMS NIH HHS; R01 GM127578 NIGMS NIH HHS; R21 AI121799 NIAID NIH HHS; R01 GM076324 NIGMS NIH HHS; R01 GM088277 NIGMS NIH HHS; R01 AI143997 NIAID NIH HHS; R01 GM078221 NIGMS NIH HHS; R01 GM123089 NIGMS NIH HHS; R35 ES030443 NIEHS NIH HHS; R01 GM132565 NIGMS NIH HHS; P42 ES004699 NIEHS NIH HHS; R35 GM122579 NIGMS NIH HHS; T32 AI007244 NIAID NIH HHS; R01 GM098101 NIGMS NIH HHS; R01 GM099959 NIGMS NIH HHS; F32 CA189246 NCI NIH HHS; R01 GM110089 NIGMS NIH HHS; F31 GM123616 NIGMS NIH HHS; R01 HL122010 NHLBI NIH HHS; R01 GM121487 NIGMS NIH HHS; U19 AI117905 NIAID NIH HHS; R00 GM120388 NIGMS NIH HHS; UH2 CA203780 NCI NIH HHS; R21 CA219847 NCI NIH HHS; T32 GM008268 NIGMS NIH HHS; R01 GM073960 NIGMS NIH HHS.
28	Structure determination of the HgcAB complex using metagenome sequence data: insights into microbial mercury methylation. Commun Biol 2020;3:320. [PMID: 32561885 PMCID: PMC7305189 DOI: 10.1038/s42003-020-1047-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/27/2020] [Accepted: 05/27/2020] [Indexed: 11/09/2022] Abstract: Bacteria and archaea possessing the hgcAB gene pair methylate inorganic mercury (Hg) to form highly toxic methylmercury. HgcA consists of a corrinoid binding domain and a transmembrane domain, and HgcB is a dicluster ferredoxin. However, their detailed structure and function have not been thoroughly characterized. We modeled the HgcAB complex by combining metagenome sequence data mining, coevolution analysis, and Rosetta structure calculations. In addition, we overexpressed HgcA and HgcB in Escherichia coli, confirmed spectroscopically that they bind cobalamin and [4Fe-4S] clusters, respectively, and incorporated these cofactors into the structural model. Surprisingly, the two domains of HgcA do not interact with each other, but HgcB forms extensive contacts with both domains. The model suggests that conserved cysteines in HgcB are involved in shuttling Hg^II, methylmercury, or both. These findings refine our understanding of the mechanism of Hg methylation and expand the known repertoire of corrinoid methyltransferases in nature. Connor J. Cooper et al. expressed HgcA and HgcB in Escherichia coli and modeled the structure of the HgcAB complex by combining metagenome sequence data, coevolution analysis, and ab initio structure calculations. This study provides insights into the biochemical mechanism of mercury (Hg) methylation.
29	Structural basis of ER-associated protein degradation mediated by the Hrd1 ubiquitin ligase complex. Science 2020;368:eaaz2449. [PMID: 32327568 DOI: 10.1126/science.aaz2449] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 08/25/2019] [Revised: 01/18/2020] [Accepted: 03/11/2020] [Indexed: 12/13/2022] Abstract: Misfolded luminal endoplasmic reticulum (ER) proteins undergo ER-associated degradation (ERAD-L): They are retrotranslocated into the cytosol, polyubiquitinated, and degraded by the proteasome. ERAD-L is mediated by the Hrd1 complex (composed of Hrd1, Hrd3, Der1, Usa1, and Yos9), but the mechanism of retrotranslocation remains mysterious. Here, we report a structure of the active Hrd1 complex, as determined by cryo-electron microscopy analysis of two subcomplexes. Hrd3 and Yos9 jointly create a luminal binding site that recognizes glycosylated substrates. Hrd1 and the rhomboid-like Der1 protein form two "half-channels" with cytosolic and luminal cavities, respectively, and lateral gates facing one another in a thinned membrane region. These structures, along with crosslinking and molecular dynamics simulation results, suggest how a polypeptide loop of an ERAD-L substrate moves through the ER membrane.
30	A demonstration of unsupervised machine learning in species delimitation. Mol Phylogenet Evol 2019;139:106562. [PMID: 31323334 PMCID: PMC6880864 DOI: 10.1016/j.ympev.2019.106562] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/17/2019] [Revised: 07/03/2019] [Accepted: 07/15/2019] [Indexed: 01/13/2023] Abstract: One major challenge to delimiting species with genetic data is successfully differentiating population structure from species-level divergence, an issue exacerbated in taxa inhabiting naturally fragmented habitats. Many fields of science are now using machine learning, and in evolutionary biology supervised machine learning has recently been used to infer species boundaries. These supervised methods require training data with associated labels. Conversely, unsupervised machine learning (UML) uses inherent data structure and does not require user-specified training labels, potentially providing more objectivity in species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three UML approaches (random forests, variational autoencoders, t-distributed stochastic neighbor embedding) for species delimitation in an arachnid taxon with high population genetic structure (Opiliones, Laniatores, Metanonychus). We find that UML approaches successfully cluster samples according to species-level divergences and not high levels of population structure, while model-based validation methods severely over-split putative species. UML offers intuitive data visualization in two-dimensional space, the ability to accommodate various data types, and has potential in many areas of systematic and evolutionary biology.
We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa with diverse biological characteristics. Key Words: Integrative taxonomy; Opiliones; Random forest; Ultraconserved elements; Variational autoencoders; t-SNE. MESH Headings: Animals; Arachnida/classification; Arachnida/genetics; Cluster Analysis; Phylogeny; Polymorphism, Single Nucleotide; Principal Component Analysis; Unsupervised Machine Learning. Grants: DP5 OD026389 NIH HHS.
31	Template-based modeling by ClusPro in CASP13 and the potential for using co-evolutionary information in docking. Proteins 2019;87:1241-1248. [PMID: 31444975 DOI: 10.1002/prot.25808] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/02/2019] [Revised: 07/21/2019] [Accepted: 07/30/2019] [Indexed: 12/29/2022] Abstract: As a participant in the joint CASP13-CAPRI46 assessment, the ClusPro server debuted its new template-based modeling functionality. The addition of this feature, called ClusPro TBM, was motivated by the previous CASP-CAPRI assessments and by the proven ability of template-based methods to produce higher-quality models, provided templates are available. In prior assessments, ClusPro submissions consisted of models that were produced via free docking of pre-generated homology models. This method was successful in terms of the number of acceptable predictions across targets; however, analysis of results showed that purely template-based methods produced a substantially higher number of medium-quality models for targets for which there were good templates available. The addition of template-based modeling has expanded ClusPro's ability to produce higher accuracy predictions, primarily for homomeric but also for some heteromeric targets. Here we review the newest additions to the ClusPro web server and discuss examples of CASP-CAPRI targets that continue to drive further development. We also describe ongoing work not yet implemented in the server. This includes the development of methods to improve template-based models and the use of co-evolutionary information for data-assisted free docking. Key Words: homology modeling; method development; modeling of protein complexes; protein-protein interaction; template-based.
32	Protein interaction networks revealed by proteome coevolution. Science 2019;365:185-189. [PMID: 31296772 DOI: 10.1126/science.aaw6718] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 01/15/2019] [Accepted: 06/07/2019] [Indexed: 01/19/2023] Abstract: Residue-residue coevolution has been observed across a number of protein-protein interfaces, but the extent of residue coevolution between protein families on the whole-proteome scale has not been systematically studied. We investigate coevolution between 5.4 million pairs of proteins in Escherichia coli and between 3.9 million pairs in Mycobacterium tuberculosis. We find strong coevolution for binary complexes involved in metabolism and weaker coevolution for larger complexes playing roles in genetic information processing. We take advantage of this coevolution, in combination with structure modeling, to predict protein-protein interactions (PPIs) with an accuracy that benchmark studies suggest is considerably higher than that of proteome-wide two-hybrid and mass spectrometry screens. We identify hundreds of previously uncharacterized PPIs in E. coli and M. tuberculosis that both add components to known protein complexes and networks and establish the existence of new ones.
33	A structural and data-driven approach to engineering a plant cytochrome P450 enzyme. Sci China Life Sci 2019;62:873-882. [DOI: 10.1007/s11427-019-9538-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/21/2019] [Accepted: 02/26/2019] [Indexed: 10/26/2022]
34	Development of a dual-functional conjugate of antigenic peptide and Fc-III mimetics (DCAF) for targeted antibody blocking. Chem Sci 2019;10:3271-3280. [PMID: 30996912 PMCID: PMC6429600 DOI: 10.1039/c8sc05273e] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/27/2018] [Accepted: 01/28/2019] [Indexed: 01/12/2023] Abstract: Targeted antibody blocking enables characterization of binding sites on immunoglobulin G (IgG), and can efficiently eliminate harmful antibodies from organisms. In this report, we present a novel peptide, denoted as a dual-functional conjugate of antigenic peptide and Fc-III mimetics (DCAF), for targeted blocking of antibodies. Synthesis of DCAF was achieved by native chemical ligation, and the molecule consists of three functional parts: a specific antigenic peptide, a linker and the Fc-III mimetic peptide, which has a high affinity toward the Fc region of IgG molecules. We demonstrate that DCAF binds the cognate antibody with high selectivity by simultaneously binding to the Fab and Fc regions of IgG. Animal experiments revealed that DCAF molecules diminish the antibody-dependent enhancement effect in a dengue virus infection model, and rescue the acetylcholine receptor by inhibiting the complement cascade in a myasthenia gravis model. These results suggest that DCAFs could have utility in the development of new therapeutics against harmful antibodies.
35	De novo design of a fluorescence-activating β-barrel. Nature 2018;561:485-491. [PMID: 30209393 PMCID: PMC6275156 DOI: 10.1038/s41586-018-0509-0] [Citation(s) in RCA: 201] [Impact Index Per Article: 33.5] [Received: 03/19/2018] [Accepted: 08/10/2018] [Indexed: 01/07/2023] Abstract: The regular arrangements of β-strands around a central axis in β-barrels and of α-helices in coiled coils contrast with the irregular tertiary structures of most globular proteins, and have fascinated structural biologists since they were first discovered. Simple parametric models have been used to design a wide range of α-helical coiled-coil structures, but to date there has been no success with β-barrels. Here we show that accurate de novo design of β-barrels requires considerable symmetry-breaking to achieve continuous hydrogen-bond connectivity and eliminate backbone strain. We then build ensembles of β-barrel backbone models with cavity shapes that match the fluorogenic compound DFHBI, and use a hierarchical grid-based search method to simultaneously optimize the rigid-body placement of DFHBI in these cavities and the identities of the surrounding amino acids to achieve high shape and chemical complementarity. The designs have high structural accuracy and bind and fluorescently activate DFHBI in vitro and in Escherichia coli, yeast and mammalian cells. This de novo design of small-molecule binding activity, using backbones custom-built to bind the ligand, should enable the design of increasingly sophisticated ligand-binding proteins, sensors and catalysts that are not limited by the backbone geometries available in known protein structures.
36	An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 2018;8:9939. [PMID: 29967418 PMCID: PMC6028396 DOI: 10.1038/s41598-018-26812-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/21/2017] [Accepted: 05/17/2018] [Indexed: 01/14/2023] Abstract: Every two years groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field but many hurdles still remain, which may require new ideas and collaborations. In 2012 a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines showing improvements according to the CASP assessment as well as areas that require further adjustments and research. MESH Headings: Caspase 12/chemistry; Caspase 12/metabolism; Caspases/chemistry; Caspases/metabolism; Computational Biology/methods; Humans; Models, Molecular; Protein Conformation; Software. Grants: R01 GM116960 NIGMS NIH HHS; R01 GM100701 NIGMS NIH HHS; R01 GM083107 NIGMS NIH HHS; R01 GM014312 NIGMS NIH HHS; R35 GM122543 NIGMS NIH HHS; T32 GM007270 NIGMS NIH HHS; R01 GM093123 NIGMS NIH HHS; R01 GM052032 NIGMS NIH HHS
37	Protein homology model refinement by large-scale energy optimization. Proc Natl Acad Sci U S A 2018;115:3054-3059. [PMID: 29507254 PMCID: PMC5866580 DOI: 10.1073/pnas.1719115115] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/20/2022] Abstract: Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function. Key Words: energy function; homology modeling; protein conformational search; protein structure prediction; protein structure refinement. MESH Headings: Computational Biology/methods; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Dynamics Simulation; Protein Conformation; Protein Folding; Thermodynamics. Grants: R01 GM092802 NIGMS NIH HHS; R01 GM123089 NIGMS NIH HHS; T32 GM007270 NIGMS NIH HHS; Howard Hughes Medical Institute
38	Automatic structure prediction of oligomeric assemblies using Robetta in CASP12. Proteins 2018;86 Suppl 1:283-291. [PMID: 28913931 PMCID: PMC6019630 DOI: 10.1002/prot.25387] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 07/13/2017] [Revised: 09/01/2017] [Accepted: 09/11/2017] [Indexed: 12/15/2022] Abstract: Many naturally occurring protein systems function primarily as symmetric assemblies. Prediction of the quaternary structure of these assemblies is an important biological problem. This article describes automated tools we have developed for predicting the structures of symmetric protein assemblies in the Robetta structure prediction server. We assess the performance of this pipeline on a set of targets from the recent CASP12/CAPRI blind quaternary structure prediction experiment. Our approach successfully predicted 5 of 7 symmetric assemblies in this challenge, and was assessed as the best participating server group, and 1 of only 2 groups (human or server) with 2 predictions judged as high quality by the assessors. We also assess the method on a broader set of 22 natively symmetric CASP12 targets, where we show that oligomeric modeling can improve the accuracy of monomeric structure determination, particularly in highly intertwined oligomers. Key Words: CASP12; Rosetta; protein interfaces; structure prediction; symmetric assemblies. MESH Headings: Computational Biology/methods; Databases, Protein; Humans; Models, Molecular; Protein Conformation; Protein Multimerization; Proteins/chemistry; Sequence Analysis, Protein; Software. Grants: R01 GM123089 NIGMS NIH HHS; T32 GM007270 NIGMS NIH HHS
39	Protein structure prediction using Rosetta in CASP12. Proteins 2017;86 Suppl 1:113-121. [PMID: 28940798 DOI: 10.1002/prot.25390] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 07/12/2017] [Accepted: 09/18/2017] [Indexed: 12/20/2022] Abstract: We describe several notable aspects of our structure predictions using Rosetta in CASP12 in the free modeling (FM) and refinement (TR) categories. First, we had previously generated (and published) models for most large protein families lacking experimentally determined structures using Rosetta guided by co-evolution based contact predictions, and for several targets these models proved better starting points for comparative modeling than any known crystal structure; our model database thus starts to fulfill one of the goals of the original protein structure initiative. Second, while our "human" group simply submitted ROBETTA models for most targets, for six targets expert intervention improved predictions considerably; the largest improvement was for T0886 where we correctly parsed two discontinuous domains guided by predicted contact maps to accurately identify a structural homolog of the same fold. Third, Rosetta all atom refinement followed by MD simulations led to consistent but small improvements when starting models were close to the native structure, and larger but less consistent improvements when starting models were further away. Key Words: Rosetta; ab initio prediction; co-evolution; protein structure prediction; refinement
40	Cryo-EM structure of the protein-conducting ERAD channel Hrd1 in complex with Hrd3. Nature 2017;548:352-355. [PMID: 28682307 PMCID: PMC5736104 DOI: 10.1038/nature23314] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 04/23/2017] [Accepted: 06/30/2017] [Indexed: 12/16/2022] Abstract: Misfolded endoplasmic reticulum (ER) proteins are retro-translocated through the membrane into the cytosol, where they are poly-ubiquitinated, extracted from the ER membrane, and degraded by the proteasome [1–4], a pathway termed ER-associated protein degradation (ERAD). Proteins with misfolded domains in the ER lumen or membrane are discarded through the ERAD-L and –M pathways, respectively. In S. cerevisiae, both pathways require the ubiquitin ligase Hrd1, a multi-spanning membrane protein with a cytosolic RING finger domain [5,6]. Hrd1 is the crucial membrane component for retro-translocation [7,8], but whether it forms a protein-conducting channel is unclear. Here, we report a cryo-electron microscopy (cryo-EM) structure of S. cerevisiae Hrd1 in complex with its ER luminal binding partner Hrd3. Hrd1 forms a dimer within the membrane with one or two Hrd3 molecules associated at its luminal side. Each Hrd1 molecule has eight trans-membrane segments, five of which form an aqueous cavity extending from the cytosol almost to the ER lumen, while a segment of the neighboring Hrd1 molecule forms a lateral seal. The aqueous cavity and lateral gate are reminiscent of features in protein-conducting conduits that facilitate polypeptide movement in the opposite direction, i.e. from the cytosol into or across membranes [9–11]. Our results suggest that Hrd1 forms a retro-translocation channel for the movement of misfolded polypeptides through the ER membrane.
41	Applications of contact predictions to structural biology. IUCrJ 2017;4:291-300. [PMID: 28512576 PMCID: PMC5414403 DOI: 10.1107/s2052252517005115] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 12/12/2016] [Accepted: 04/03/2017] [Indexed: 06/07/2023] Abstract: Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental methods. Finally, predicted contacts can distinguish between biologically relevant interfaces and mere lattice contacts in a final crystal structure, and have potential in the identification of functionally important regions and in foreseeing the consequences of mutations. Key Words: NMR distance restraints; X-ray crystallography; evolutionary covariance; predicted contacts; structural bioinformatics. Grants: T32 GM007270 NIGMS NIH HHS
42	Architectures of Lipid Transport Systems for the Bacterial Outer Membrane. Cell 2017;169:273-285.e17. [PMID: 28388411 PMCID: PMC5467742 DOI: 10.1016/j.cell.2017.03.019] [Citation(s) in RCA: 140] [Impact Index Per Article: 20.0] [Received: 09/30/2016] [Revised: 01/07/2017] [Accepted: 03/14/2017] [Indexed: 10/19/2022] Abstract: How phospholipids are trafficked between the bacterial inner and outer membranes through the hydrophilic space of the periplasm is not known. We report that members of the mammalian cell entry (MCE) protein family form hexameric assemblies with a central channel capable of mediating lipid transport. The E. coli MCE protein, MlaD, forms a ring associated with an ABC transporter complex in the inner membrane. A soluble lipid-binding protein, MlaC, ferries lipids between MlaD and an outer membrane protein complex. In contrast, EM structures of two other E. coli MCE proteins show that YebT forms an elongated tube consisting of seven stacked MCE rings, and PqiB adopts a syringe-like architecture. Both YebT and PqiB create channels of sufficient length to span the periplasmic space. This work reveals diverse architectures of highly conserved protein-based channels implicated in the transport of lipids between the membranes of bacteria and some eukaryotic organelles. Key Words: E. coli; MCE; bacteria; cryo-EM; crystallography; hexametric rings; lipids; mammalian cell entry; outer membrane; periplasm; structure; transport. MESH Headings: Cell Membrane/chemistry; Crystallography, X-Ray; Escherichia coli/chemistry; Escherichia coli Proteins/chemistry; Membrane Proteins/chemistry; Microscopy, Electron; Models, Molecular; Multiprotein Complexes/chemistry. Grants: Howard Hughes Medical Institute; S10 OD020054 NIH HHS; R01 AI027655 NIAID NIH HHS; K99 GM112982 NIGMS NIH HHS; P01 AI063302 NIAID NIH HHS; BB/M00810X/1 Biotechnology and Biological Sciences Research Council
43	Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng Des Sel 2017;30:333-345. [DOI: 10.1093/protein/gzx003] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 11/14/2016] [Accepted: 01/10/2017] [Indexed: 11/13/2022]
44	Architectures of Lipid Transport Systems for the Bacterial Outer Membrane. Biophys J 2017. [DOI: 10.1016/j.bpj.2016.11.107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
45	Protein structure determination using metagenome sequence data. Science 2017;355:294-298. [PMID: 28104891 PMCID: PMC5493203 DOI: 10.1126/science.aah4043] [Citation(s) in RCA: 331] [Impact Index Per Article: 47.3] [Received: 06/22/2016] [Accepted: 11/22/2016] [Indexed: 01/30/2023] Abstract: Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost. MESH Headings: Algorithms; Amino Acid Sequence; Computational Biology/methods; Crystallography, X-Ray; Databases, Protein; Evolution, Molecular; Metagenome; Models, Molecular; Protein Conformation; Protein Folding; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein; Software. Grants: Howard Hughes Medical Institute; R01 GM092802 NIGMS NIH HHS; T32 GM007270 NIGMS NIH HHS
46	Structural insights into SAM domain-mediated tankyrase oligomerization. Protein Sci 2016;25:1744-52. [PMID: 27328430 DOI: 10.1002/pro.2968] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 06/03/2016] [Accepted: 06/16/2016] [Indexed: 12/28/2022] Abstract: Tankyrase 1 (TNKS1; a.k.a. ARTD5) and tankyrase 2 (TNKS2; a.k.a. ARTD6) are highly homologous poly(ADP-ribose) polymerases (PARPs) that function in a wide variety of cellular processes including Wnt signaling, Src signaling, Akt signaling, Glut4 vesicle translocation, telomere length regulation, and centriole and spindle pole maturation. Tankyrase proteins include a sterile alpha motif (SAM) domain that undergoes oligomerization in vitro and in vivo. However, the SAM domains of TNKS1 and TNKS2 have not been structurally characterized and the mode of oligomerization is not yet defined. Here we model the SAM domain-mediated oligomerization of tankyrase. The structural model, supported by mutagenesis and NMR analysis, demonstrates a helical, homotypic head-to-tail polymer that facilitates TNKS self-association. Furthermore, we show that TNKS1 and TNKS2 can form (TNKS1 SAM-TNKS2 SAM) hetero-oligomeric structures mediated by their SAM domains. Though wild-type tankyrase proteins have very low solubility, model-based mutations of the SAM oligomerization interface residues allowed us to obtain soluble TNKS proteins. These structural insights will be invaluable for the functional and biophysical characterization of TNKS1/2, including the role of TNKS oligomerization in protein poly(ADP-ribosyl)ation (PARylation) and PARylation-dependent ubiquitylation. Key Words: PARP; PARP5; PARylation; SAM; molecular model; oligomerization; protein poly(ADP-ribosyl)ation; sterile alpha motif; tankyrase
47	Structure of a bd oxidase indicates similar mechanisms for membrane-integrated oxygen reductases. Science 2016;352:583-6. [PMID: 27126043 DOI: 10.1126/science.aaf2477] [Citation(s) in RCA: 106] [Impact Index Per Article: 13.3] [Received: 01/13/2016] [Accepted: 03/28/2016] [Indexed: 12/29/2022] Abstract: The cytochrome bd oxidases are terminal oxidases that are present in bacteria and archaea. They reduce molecular oxygen (dioxygen) to water, avoiding the production of reactive oxygen species. In addition to their contribution to the proton motive force, they mediate viability under oxygen-related stress conditions and confer tolerance to nitric oxide, thus contributing to the virulence of pathogenic bacteria. Here we present the atomic structure of the bd oxidase from Geobacillus thermodenitrificans, revealing a pseudosymmetrical subunit fold. The arrangement and order of the heme cofactors support the conclusions from spectroscopic measurements that the cleavage of the dioxygen bond may be mechanistically similar to that in the heme-copper-containing oxidases, even though the structures are completely different.
48	Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11. Proteins 2016;84 Suppl 1:181-8. [PMID: 26857542 PMCID: PMC5490372 DOI: 10.1002/prot.25006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/07/2015] [Revised: 01/11/2016] [Accepted: 02/02/2016] [Indexed: 12/17/2022] Abstract: In CASP11 we generated protein structure models using simulated ambiguous and unambiguous nuclear Overhauser effect (NOE) restraints with a two stage protocol. Low resolution models were generated guided by the unambiguous restraints using continuous chain folding for alpha and alpha-beta proteins, and iterative annealing for all beta proteins to take advantage of the strand pairing information implicit in the restraints. The Rosetta fragment/model hybridization protocol was then used to recombine and regularize these models, and refine them in the Rosetta full atom energy function guided by both the unambiguous and the ambiguous restraints. Fifteen out of 19 targets were modeled with GDT-TS quality scores greater than 60 for Model 1, significantly improving upon the non-assisted predictions. Our results suggest that atomic level accuracy is achievable using sparse NOE data when there is at least one correctly assigned NOE for every residue. Key Words: CASP11; NMR; Rosetta; contact assisted; protein structure prediction
49	Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins 2016;84 Suppl 1:67-75. [PMID: 26677056 PMCID: PMC5490371 DOI: 10.1002/prot.24974] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 07/06/2015] [Revised: 11/27/2015] [Accepted: 12/12/2015] [Indexed: 12/19/2022] Abstract: We describe CASP11 de novo blind structure predictions made using the Rosetta structure prediction methodology with both automatic and human assisted protocols. Model accuracy was generally improved using coevolution derived residue-residue contact information as restraints during Rosetta conformational sampling and refinement, particularly when the number of sequences in the family was more than three times the length of the protein. The highlight was the human assisted prediction of T0806, a large and topologically complex target with no homologs of known structure, which had unprecedented accuracy: <3.0 Å root-mean-square deviation (RMSD) from the crystal structure over 223 residues. For this target, we increased the amount of conformational sampling over our fully automated method by employing an iterative hybridization protocol. Our results clearly demonstrate, in a blind prediction scenario, that coevolution derived contacts can considerably increase the accuracy of template-free structure modeling. Key Words: ab initio prediction; coevolution; contact prediction; protein structure prediction; rosetta
50	Catalytic efficiencies of directly evolved phosphotriesterase variants with structurally different organophosphorus compounds in vitro. Arch Toxicol 2015;90:2711-2724. [DOI: 10.1007/s00204-015-1626-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 07/27/2015] [Accepted: 10/22/2015] [Indexed: 11/29/2022]