1
Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of Artificial Intelligence and Machine Learning Algorithms to Crystallization. Chem Rev 2022; 122:13006-13042. [PMID: 35759465 DOI: 10.1021/acs.chemrev.2c00141] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Artificial intelligence and specifically machine learning applications are nowadays used in a variety of scientific applications and cutting-edge technologies, where they have a transformative impact. Such an assembly of statistical and linear algebra methods making use of large data sets is becoming more and more integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, as well as contribute to high throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness in issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, as well as the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Affiliation(s)
- Christos Xiouras
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
- Fabio Cameli
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Gustavo Lunardon Quilló
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
  - Chemical and BioProcess Technology and Control, Department of Chemical Engineering, Faculty of Engineering Technology, KU Leuven, Gebroeders de Smetstraat 1, 9000 Ghent, Belgium
- Mihail E Kavousanakis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
- Dionisios G Vlachos
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Georgios D Stefanidis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
  - Laboratory for Chemical Technology, Ghent University, Tech Lane Ghent Science Park 125, B-9052 Ghent, Belgium
2
Liu H, Zou S, Dai S, Zhang J, Li W. Dopamine sheathing facilitates the anisotropic growth of lysozyme crystals. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.115826] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
3
Abstract
The continued development of X-ray free-electron lasers and serial crystallography techniques has opened up new experimental frontiers. Nanoscale dynamical processes such as crystal growth can now be probed at unprecedented time and spatial resolutions. Pair-angle distribution function (PADF) analysis is a correlation-based technique that has the potential to extend the limits of current serial crystallography experiments, by relaxing the requirements for crystal order, size and number density per exposure. However, unlike traditional crystallographic methods, the PADF technique does not recover the electron density directly. Instead it encodes substantial information about local three-dimensional structure in the form of three- and four-body correlations. It is not yet known how protein structure maps into the many-body PADF correlations. In this paper, we explore the relationship between the PADF and protein conformation. We calculate correlations in reciprocal and real space for model systems exhibiting increasing degrees of order and secondary structural complexity, from disordered polypeptides, single alpha helices, helix bundles and finally a folded 100 kilodalton protein. These model systems inform us about the distinctive angular correlations generated by bonding, polypeptide chains, secondary structure and tertiary structure. They further indicate the potential to use angular correlations as a sensitive measure of conformation change that is complementary to existing structural analysis techniques.
4
Lynch ML, Dudek MF, Bowman SE. A Searchable Database of Crystallization Cocktails in the PDB: Analyzing the Chemical Condition Space. Patterns (New York, N.Y.) 2020; 1:100024. [PMID: 32776019 PMCID: PMC7409820 DOI: 10.1016/j.patter.2020.100024] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/31/2019] [Revised: 03/22/2020] [Accepted: 03/30/2020] [Indexed: 10/26/2022]
Abstract
Nearly 90% of structural models in the Protein Data Bank (PDB), the central resource worldwide for three-dimensional structural information, are currently derived from macromolecular crystallography (MX). A major bottleneck in determining MX structures is finding conditions in which a biomolecule will crystallize. Here, we present a searchable database of the chemicals associated with successful crystallization experiments from the PDB. We use these data to examine the relationship between protein secondary structure and average molecular weight of polyethylene glycol and to investigate patterns in crystallization conditions. Our analyses reveal striking patterns of both redundancy of chemical compositions in crystallization experiments and extreme sparsity of specific chemical combinations, underscoring the challenges faced in generating predictive models for de novo optimal crystallization experiments.
Affiliation(s)
- Miranda L. Lynch
  - High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
- Max F. Dudek
  - University of Pittsburgh, Pittsburgh, PA 15261, USA
- Sarah E.J. Bowman
  - High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences at the University at Buffalo, Buffalo, NY 14203, USA
5
Jia X, Lynch A, Huang Y, Danielson M, Lang'at I, Milder A, Ruby AE, Wang H, Friedler SA, Norquist AJ, Schrier J. Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature 2019; 573:251-255. [PMID: 31511682 DOI: 10.1038/s41586-019-1540-5] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 12/30/2018] [Accepted: 07/10/2019] [Indexed: 01/29/2023]
Abstract
Most chemical experiments are planned by human scientists and therefore are subject to a variety of human cognitive biases [1], heuristics [2] and social influences [3]. These anthropogenic chemical reaction data are widely used to train machine-learning models [4] that are used to predict organic [5] and inorganic [6,7] syntheses. However, it is known that societal biases are encoded in datasets and are perpetuated in machine-learning models [8]. Here we identify as-yet-unacknowledged anthropogenic biases in both the reagent choices and reaction conditions of chemical reaction datasets using a combination of data mining and experiments. We find that the amine choices in the reported crystal structures of hydrothermal synthesis of amine-templated metal oxides [9] follow a power-law distribution in which 17% of amine reactants occur in 79% of reported compounds, consistent with distributions in social influence models [10-12]. An analysis of unpublished historical laboratory notebook records shows similarly biased distributions of reaction condition choices. By performing 548 randomly generated experiments, we demonstrate that the popularity of reactants or the choices of reaction conditions are uncorrelated to the success of the reaction. We show that randomly generated experiments better illustrate the range of parameter choices that are compatible with crystal formation. Machine-learning models that we train on a smaller randomized reaction dataset outperform models trained on larger human-selected reaction datasets, demonstrating the importance of identifying and addressing anthropogenic biases in scientific data.
Affiliation(s)
- Xiwen Jia
  - Department of Chemistry, Haverford College, Haverford, PA, USA
- Allyson Lynch
  - Department of Chemistry, Haverford College, Haverford, PA, USA
- Yuheng Huang
  - Department of Chemistry, Haverford College, Haverford, PA, USA
- Aaron E Ruby
  - Department of Chemistry, Haverford College, Haverford, PA, USA
- Hao Wang
  - Department of Chemistry, Haverford College, Haverford, PA, USA
- Joshua Schrier
  - Department of Chemistry, Haverford College, Haverford, PA, USA
  - Department of Chemistry, Fordham University, The Bronx, New York, NY, USA
6
Bhat EA, Abdalla M, Rather IA. Key Factors for Successful Protein Purification and Crystallization. ACTA ACUST UNITED AC 2018. [DOI: 10.17352/gjbbs.000010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]
7
Drwal MN, Bret G, Perez C, Jacquemard C, Desaphy J, Kellenberger E. Structural Insights on Fragment Binding Mode Conservation. J Med Chem 2018; 61:5963-5973. [PMID: 29906118 DOI: 10.1021/acs.jmedchem.8b00256] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/02/2023]
Abstract
Aiming at a deep understanding of fragment binding to ligandable targets, we performed a large scale analysis of the Protein Data Bank. Binding modes of 1832 drug-like ligands and 1079 fragments to 235 proteins were compared. We observed that the binding modes of fragments and their drug-like superstructures binding to the same protein are mostly conserved, thereby providing experimental evidence for the preservation of fragment binding modes during molecular growing. Furthermore, small chemical changes in the fragment are tolerated without alteration of the fragment binding mode. The exceptions to this observation generally involve conformational variability of the molecules. Our data analysis also suggests that, provided enough fragments have been crystallized within a protein, good interaction coverage of the binding pocket is achieved. Last, we extended our study to 126 crystallization additives and discuss in which cases they provide information relevant to structure-based drug design.
Affiliation(s)
- Malgorzata N Drwal
  - Laboratoire d'Innovation Thérapeutique, UMR7200, Université de Strasbourg, 74 Route du Rhin, 67401 Illkirch, France
- Guillaume Bret
  - Laboratoire d'Innovation Thérapeutique, UMR7200, Université de Strasbourg, 74 Route du Rhin, 67401 Illkirch, France
- Carlos Perez
  - Eli Lilly Research Laboratories, Avenida de la Industria, 30, 28108 Alcobendas, Madrid, Spain
- Célien Jacquemard
  - Laboratoire d'Innovation Thérapeutique, UMR7200, Université de Strasbourg, 74 Route du Rhin, 67401 Illkirch, France
- Jérémy Desaphy
  - Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285, United States
- Esther Kellenberger
  - Laboratoire d'Innovation Thérapeutique, UMR7200, Université de Strasbourg, 74 Route du Rhin, 67401 Illkirch, France
8
Ereño-Orbea J, Sicard T, Cui H, Carson J, Hermans P, Julien JP. Structural Basis of Enhanced Crystallizability Induced by a Molecular Chaperone for Antibody Antigen-Binding Fragments. J Mol Biol 2017; 430:322-336. [PMID: 29277294 DOI: 10.1016/j.jmb.2017.12.010] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 10/22/2017] [Revised: 11/30/2017] [Accepted: 12/13/2017] [Indexed: 12/20/2022]
Abstract
Monoclonal antibodies constitute one of the largest groups of drugs to treat cancers and immune disorders, and are guiding the design of vaccines against infectious diseases. Fragments antigen-binding (Fabs) have been preferred over monoclonal antibodies for the structural characterization of antibody-antigen complexes due to their relatively low flexibility. Nonetheless, Fabs often remain challenging to crystallize because of the surface characteristics of complementary determining regions and the residual flexibility in the hinge region between the variable and constant domains. Here, we used a variable heavy-chain (VHH) domain specific for the human kappa light chain to assist in the structure determination of three therapeutic Fabs that were recalcitrant to crystallization on their own. We show that this ligand alters the surface properties of the antibody-ligand complex and lowers its aggregation temperature to favor crystallization. The VHH crystallization chaperone also restricts the flexible hinge of Fabs to a narrow range of angles, and so independently of the variable region. Our findings contribute a valuable approach to antibody structure determination and provide biophysical insight into the principles that govern the crystallization of macromolecules.
Affiliation(s)
- June Ereño-Orbea
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
- Taylor Sicard
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
  - Department of Biochemistry, University of Toronto, Toronto, ON, Canada M5S 1A8
- Hong Cui
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
- Jacob Carson
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
- Pim Hermans
  - BAC, BV, part of Thermo Fisher Scientific, Leiden, the Netherlands
- Jean-Philippe Julien
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
  - Department of Biochemistry, University of Toronto, Toronto, ON, Canada M5S 1A8
  - Department of Immunology, University of Toronto, Toronto, ON, Canada M5S 1A8
9
Pereira JH, McAndrew RP, Tomaleri GP, Adams PD. Berkeley Screen: a set of 96 solutions for general macromolecular crystallization. J Appl Crystallogr 2017; 50:1352-1358. [PMID: 29021733 PMCID: PMC5627680 DOI: 10.1107/s1600576717011347] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/27/2017] [Accepted: 08/01/2017] [Indexed: 01/29/2023]
Abstract
Using statistical analysis of the Biological Macromolecular Crystallization Database, combined with previous knowledge about crystallization reagents, a crystallization screen called the Berkeley Screen has been created. Correlating crystallization conditions and high-resolution protein structures, it is possible to better understand the influence that a particular solution has on protein crystal formation. Ions and small molecules such as buffers and precipitants used in crystallization experiments were identified in electron density maps, highlighting the role of these chemicals in protein crystal packing. The Berkeley Screen has been extensively used to crystallize target proteins from the Joint BioEnergy Institute and the Collaborative Crystallography program at the Berkeley Center for Structural Biology, contributing to several Protein Data Bank entries and related publications. The Berkeley Screen provides the crystallographic community with an efficient set of solutions for general macromolecular crystallization trials, offering a valuable alternative to the existing commercially available screens.
Affiliation(s)
- Jose H. Pereira
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Joint BioEnergy Institute, Emeryville, CA 94608, USA
- Ryan P. McAndrew
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Joint BioEnergy Institute, Emeryville, CA 94608, USA
- Paul D. Adams
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Joint BioEnergy Institute, Emeryville, CA 94608, USA
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
10
Hegde RP, Pavithra GC, Dey D, Almo SC, Ramakumar S, Ramagopal UA. Can the propensity of protein crystallization be increased by using systematic screening with metals? Protein Sci 2017. [PMID: 28643473 DOI: 10.1002/pro.3214] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/06/2023]
Abstract
Protein crystallization is one of the major bottlenecks in protein structure elucidation with new strategies being constantly developed to improve the chances of crystallization. Generally, well-ordered epitopes possessing complementary surface and capable of producing stable inter-protein interactions generate a regular three-dimensional arrangement of protein molecules which eventually results in a crystal lattice. Metals, when used for crystallization, with their various coordination numbers and geometries, can generate such epitopes mediating protein oligomerization and/or establish crystal contacts. Some examples of metal-mediated oligomerization and crystallization together with our experience on metal-mediated crystallization of a putative rRNA methyltransferase from Sinorhizobium meliloti are presented. Analysis of crystal structures from protein data bank (PDB) using a non-redundant data set with a 90% identity cutoff, reveals that around 67% of proteins contain at least one metal ion, with ∼14% containing combination of metal ions. Interestingly, metal containing conditions in most commercially available and popular crystallization kits generally contain only a single metal ion, with combinations of metals only in a very few conditions. Based on the results presented in this review, it appears that the crystallization screens need expansion with systematic screening of metal ions that could be crucial for stabilizing the protein structure or for establishing crystal contact and thereby aiding protein crystallization.
Affiliation(s)
- Raghurama P Hegde
  - Division of Biological Sciences, Poornaprajna Institute of Scientific Research, Bangalore, 560080, India
- Gowribidanur C Pavithra
  - Division of Biological Sciences, Poornaprajna Institute of Scientific Research, Bangalore, 560080, India
  - Manipal University, Manipal, 576104, India
- Debayan Dey
  - Division of Biological Sciences, Poornaprajna Institute of Scientific Research, Bangalore, 560080, India
  - Department of Physics, Indian Institute of Science, Bangalore, 560012, India
- Steven C Almo
  - Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, 10461
  - Department of Physiology & Biophysics, Albert Einstein College of Medicine, Bronx, New York, 10461
- S Ramakumar
  - Department of Physics, Indian Institute of Science, Bangalore, 560012, India
- Udupi A Ramagopal
  - Division of Biological Sciences, Poornaprajna Institute of Scientific Research, Bangalore, 560080, India
  - Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, 10461
11
Altan I, Charbonneau P, Snell EH. Computational crystallization. Arch Biochem Biophys 2016; 602:12-20. [PMID: 26792536 DOI: 10.1016/j.abb.2016.01.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/04/2015] [Revised: 12/22/2015] [Accepted: 01/07/2016] [Indexed: 11/28/2022]
Abstract
Crystallization is a key step in macromolecular structure determination by crystallography. While a robust theoretical treatment of the process is available, due to the complexity of the system, the experimental process is still largely one of trial and error. In this article, efforts in the field are discussed together with a theoretical underpinning using a solubility phase diagram. Prior knowledge has been used to develop tools that computationally predict the crystallization outcome and define mutational approaches that enhance the likelihood of crystallization. For the most part these tools are based on binary outcomes (crystal or no crystal), and the full information contained in an assembly of crystallization screening experiments is lost. The potential of this additional information is illustrated by examples where new biological knowledge can be obtained and where a target can be sub-categorized to predict which class of reagents provides the crystallization driving force. Computational analysis of crystallization requires complete and correctly formatted data. While massive crystallization screening efforts are under way, the data available from many of these studies are sparse. The potential for this data and the steps needed to realize this potential are discussed.
Affiliation(s)
- Irem Altan
  - Department of Chemistry, Duke University, Durham, NC 27708, USA
- Patrick Charbonneau
  - Department of Chemistry, Duke University, Durham, NC 27708, USA
  - Department of Physics, Duke University, Durham, NC 27708, USA
- Edward H Snell
  - Hauptman-Woodward Medical Research Institute, 700 Ellicott St., NY 14203, USA
  - Department of Structural Biology, SUNY University of Buffalo, 700 Ellicott St., NY 14203, USA