1
Xu X, Xu C, He W, Wei L, Li H, Zhou J, Zhang R, Wang Y, Xiong Y, Gao X. HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer. Bioinformatics 2024; 40:btae364. [PMID: 38867692 PMCID: PMC11256930 DOI: 10.1093/bioinformatics/btae364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 05/08/2024] [Accepted: 06/10/2024] [Indexed: 06/14/2024] Open
Abstract
MOTIVATION: Macrocyclic peptides hold great promise as therapeutics targeting intracellular proteins. This stems from their remarkable ability to bind flat protein surfaces with high affinity and specificity while potentially traversing the cell membrane. Research has already explored their use in developing inhibitors for intracellular proteins, such as KRAS, a well-known driver in various cancers. However, computational approaches for de novo macrocyclic peptide design remain largely unexplored. RESULTS: Here, we introduce HELM-GPT, a novel method that combines the strengths of the hierarchical editing language for macromolecules (HELM) representation and the generative pre-trained transformer (GPT) for de novo macrocyclic peptide design. Through reinforcement learning (RL), our experiments demonstrate that HELM-GPT can generate valid macrocyclic peptides and optimize their properties. Furthermore, we introduce a contrastive preference loss during the RL process, which further enhances optimization performance. Finally, to co-optimize peptide permeability and KRAS binding affinity, we propose a step-by-step optimization strategy and demonstrate its effectiveness in generating molecules that fulfill both criteria. In conclusion, the HELM-GPT method can be used to identify novel macrocyclic peptides to target intracellular proteins. AVAILABILITY AND IMPLEMENTATION: The code and data of HELM-GPT are freely available on GitHub (https://github.com/charlesxu90/helm-gpt).
Affiliation(s)
- Xiaopeng Xu
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Chencheng Xu
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Wenjia He
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Lesong Wei
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Haoyang Li
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Juexiao Zhou
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Yu Wang
  - Syneron Technology, Guangzhou 510000, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
2
Ochoa R, Brown JB, Fox T. pyPept: a python library to generate atomistic 2D and 3D representations of peptides. J Cheminform 2023; 15:79. [PMID: 37700347 PMCID: PMC10498622 DOI: 10.1186/s13321-023-00748-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 08/23/2023] [Indexed: 09/14/2023] Open
Abstract
We present pyPept, a set of executables and underlying Python-language classes to easily create, manipulate, and analyze peptide molecules using the FASTA, HELM, or recently developed BILN notations. The framework enables the analysis of both pure proteinogenic peptides and those with non-natural amino acids, including support to assemble a customizable monomer library, without requiring programming. From line notations, a peptide is transformed into a molecular graph for 2D depiction tasks, the calculation of physicochemical properties, and other systematic analyses or processing pipelines. The package includes a module to rapidly generate approximate peptide conformers by incorporating secondary structure restraints either given by the user or predicted via pyPept, and a wrapper tool is also provided to automate the generation and output of 2D and 3D representations of a peptide directly from the line notation. HELM and BILN notations that include circular, branched, or stapled peptides are fully supported, eliminating the errors that tend to arise when structures are drawn and connected manually. The framework and common workflows followed in pyPept are described together with illustrative examples. pyPept has been released at: https://github.com/Boehringer-Ingelheim/pyPept.
Affiliation(s)
- Rodrigo Ochoa
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397, Biberach/Riss, Germany
- J B Brown
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397, Biberach/Riss, Germany
- Thomas Fox
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397, Biberach/Riss, Germany
3
Kanagalingam G, Schmitt S, Fleckenstein F, Stephan S. Data scheme and data format for transferable force fields for molecular simulation. Sci Data 2023; 10:495. [PMID: 37500652 PMCID: PMC10374650 DOI: 10.1038/s41597-023-02369-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023] Open
Abstract
A generalized data scheme for transferable classical force fields used in molecular simulations, i.e. molecular dynamics and Monte Carlo simulation, is presented. The data scheme is implemented in an SQL-based data format. The data scheme and data format are machine readable, re-usable, and interoperable. A transferable force field is a chemical construction plan specifying intermolecular and intramolecular interactions between different types of atoms or different chemical groups and can be used for building a model for a given component. The data scheme proposed in this work (named TUK-FFDat) digitally formalizes these chemical construction plans, i.e. transferable force fields. It can be applied to all-atom as well as united-atom transferable force fields. The general applicability of the data scheme is demonstrated for different types of force fields (TraPPE, OPLS-AA, and Potoff). Furthermore, conversion tools for translating the data scheme between the .xls spreadsheet format and the SQL-based data format are provided. The data format can readily be integrated into existing workflows, simulation engines, and force field databases, and can be used to link them.
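The idea of a transferable force field as a "construction plan" stored in SQL can be illustrated with a small sketch. The two-table layout below is hypothetical and is not the TUK-FFDat scheme itself; the table names, column names, and helper functions are assumptions made for illustration. The Lennard-Jones parameters are the published TraPPE-UA values for CH3 and CH2 groups, and the Lorentz-Berthelot combining rule is used for the unlike pair.

```python
import math
import sqlite3

# Hypothetical two-table layout (NOT the actual TUK-FFDat scheme):
# site types carry the transferable parameters, molecules reference them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_type (
    name      TEXT PRIMARY KEY,
    epsilon_k REAL NOT NULL,  -- Lennard-Jones well depth / k_B, in K
    sigma_a   REAL NOT NULL   -- Lennard-Jones diameter, in angstrom
);
CREATE TABLE molecule_site (
    molecule   TEXT NOT NULL,
    site_index INTEGER NOT NULL,
    site_type  TEXT NOT NULL REFERENCES site_type(name),
    PRIMARY KEY (molecule, site_index)
);
""")

# TraPPE-UA united-atom parameters for alkane groups.
conn.executemany("INSERT INTO site_type VALUES (?, ?, ?)",
                 [("CH3", 98.0, 3.75), ("CH2", 46.0, 3.95)])
# Propane is assembled from transferable groups: CH3-CH2-CH3.
conn.executemany("INSERT INTO molecule_site VALUES (?, ?, ?)",
                 [("propane", 1, "CH3"), ("propane", 2, "CH2"), ("propane", 3, "CH3")])


def build_model(molecule):
    """Return (site_index, site_type, epsilon_k, sigma_a) rows for one molecule."""
    rows = conn.execute(
        "SELECT m.site_index, s.name, s.epsilon_k, s.sigma_a "
        "FROM molecule_site AS m JOIN site_type AS s ON s.name = m.site_type "
        "WHERE m.molecule = ? ORDER BY m.site_index", (molecule,))
    return rows.fetchall()


def lorentz_berthelot(type_a, type_b):
    """Cross-interaction (epsilon, sigma) from the Lorentz-Berthelot rule."""
    (eps_a, sig_a), (eps_b, sig_b) = (
        conn.execute("SELECT epsilon_k, sigma_a FROM site_type WHERE name = ?",
                     (name,)).fetchone()
        for name in (type_a, type_b))
    return math.sqrt(eps_a * eps_b), 0.5 * (sig_a + sig_b)


for row in build_model("propane"):
    print(row)
print(lorentz_berthelot("CH3", "CH2"))
```

Because the parameters live only in `site_type`, adding another alkane requires rows in `molecule_site` alone, which is the sense in which the force field is transferable.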
Affiliation(s)
- Gajanan Kanagalingam
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, 67663, Germany
- Sebastian Schmitt
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, 67663, Germany
- Florian Fleckenstein
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, 67663, Germany
- Simon Stephan
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, 67663, Germany
4
Lo S, Seifrid M, Gaudin T, Aspuru-Guzik A. Augmenting Polymer Datasets by Iterative Rearrangement. J Chem Inf Model 2023. [PMID: 37390494 DOI: 10.1021/acs.jcim.3c00144] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/02/2023]
Abstract
One of the biggest obstacles to successful polymer property prediction is an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the effects of this technique on the performance of machine learning models trained on three polymer datasets and compare them to common molecular representations. Data augmentation does not yield significant improvements in machine learning property prediction performance compared to equivalent (non-augmented) representations. In datasets where the target property is primarily influenced by the polymer sequence rather than experimental parameters, this data augmentation technique provides molecular embedding with more information to improve property prediction accuracy.
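One simple way to picture "rearranging the representation while preserving connectivity" is cyclic rotation of the repeat-unit sequence: for a periodic chain ...ABCABC..., the repeat unit can equally be written ABC, BCA, or CAB. The sketch below generates these rotations. It is an illustration under that assumption, not the authors' procedure, and the function name `rearrangements` is hypothetical.

```python
from typing import List


def rearrangements(repeat_units: List[str]) -> List[List[str]]:
    """Return every distinct cyclic rotation of a repeat-unit sequence.

    Each rotation describes the same periodic chain, so the set can be
    used as augmented inputs that share one target property value.
    """
    seen = set()
    result = []
    for shift in range(len(repeat_units)):
        rotated = repeat_units[shift:] + repeat_units[:shift]
        key = tuple(rotated)
        if key not in seen:  # skip duplicates, e.g. ABAB rotated by two
            seen.add(key)
            result.append(rotated)
    return result


print(rearrangements(["A", "B", "C"]))
print(rearrangements(["A", "B", "A", "B"]))
```

Deduplication matters for sequences with internal symmetry: an ABAB repeat unit has only two distinct rotations, so naive rotation would otherwise weight such polymers more heavily in the augmented dataset.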
Affiliation(s)
- Stanley Lo
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Martin Seifrid
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Théophile Gaudin
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
  - IBM Research Zürich, Rüschlikon, Zürich 8803, Switzerland
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College St., Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science and Engineering, University of Toronto, 184 College St., Toronto, Ontario M5S 3E4, Canada
  - CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, Ontario M5S 1M1, Canada
  - Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada
5
Cheng T, Ono T, Shiota M, Yamada I, Aoki-Kinoshita KF, Bolton EE. Bridging glycoinformatics and cheminformatics: integration efforts between GlyCosmos and PubChem. Glycobiology 2023; 33:454-463. [PMID: 37129482 PMCID: PMC10284107 DOI: 10.1093/glycob/cwad028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 03/27/2023] [Accepted: 03/28/2023] [Indexed: 05/03/2023] Open
Abstract
The GlyCosmos Glycoscience Portal (https://glycosmos.org) and PubChem (https://pubchem.ncbi.nlm.nih.gov/) are major portals for glycoscience and chemistry, respectively. GlyCosmos is a portal for glycan-related repositories, including GlyTouCan, GlycoPOST, and UniCarb-DR, as well as for glycan-related data resources that have been integrated from a variety of 'omics databases. Glycogenes, glycoproteins, lectins, pathways, and disease information related to glycans are accessible from GlyCosmos. PubChem, on the other hand, is a chemistry-based portal at the National Center for Biotechnology Information. PubChem provides information not only on chemicals, but also on genes, proteins, pathways, patents, bioassays, and more, from hundreds of data resources around the world. In this work, these 2 portals have made substantial efforts to integrate their complementary data to allow users to cross between these 2 domains. In addition to glycan structures, key information, such as glycan-related genes, relevant diseases, glycoproteins, and pathways, was integrated and cross-linked with one another. The interfaces were designed to enable users to easily find, access, download, and reuse data of interest across these resources. Use cases are described that illustrate the type of content that can be investigated. In total, these integrations provide life science researchers with improved awareness of, and enhanced access to, glycan-related information.
Affiliation(s)
- Tiejun Cheng
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Tamiko Ono
  - Glycan and Life Systems Integration Center, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Masaaki Shiota
  - Glycan and Life Systems Integration Center, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Issaku Yamada
  - Laboratory of Glycoinformatics, The Noguchi Institute, 1-9-7 Kaga, Itabashi, Tokyo 173-0003, Japan
- Kiyoko F Aoki-Kinoshita
  - Glycan and Life Systems Integration Center, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Evan E Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
6
Li J, Yanagisawa K, Sugita M, Fujie T, Ohue M, Akiyama Y. CycPeptMPDB: A Comprehensive Database of Membrane Permeability of Cyclic Peptides. J Chem Inf Model 2023; 63:2240-2250. [PMID: 36930969 PMCID: PMC10091415 DOI: 10.1021/acs.jcim.2c01573] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/19/2023]
Abstract
Recently, cyclic peptides have been considered breakthrough drugs because they can interact with "undruggable" targets such as intracellular protein-protein interactions. Membrane permeability is an essential indicator of oral bioavailability and intracellular targeting, and the development of membrane-permeable peptides is a bottleneck in cyclic peptide drug discovery. Although many experimental data on membrane permeability of cyclic peptides have been reported, a comprehensive database is not yet available. A comprehensive membrane permeability database is essential for developing computational methods for cyclic peptide drug design. In this study, we constructed CycPeptMPDB, the first web-accessible database of cyclic peptide membrane permeability. We collected information on a total of 7334 cyclic peptides, including the structure and experimentally measured membrane permeability, from 45 published papers and 2 patents from pharmaceutical companies. To unambiguously represent cyclic peptides larger than small molecules, we used the hierarchical editing language for macromolecules notation to generate a uniform sequence representation of peptides. In addition to data storage, CycPeptMPDB provides several supporting functions such as online data visualization, data analysis, and downloading. CycPeptMPDB is expected to be a valuable platform to support membrane permeability research on cyclic peptides. CycPeptMPDB can be freely accessed at http://cycpeptmpdb.com.
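To make the HELM representation mentioned in this abstract concrete, the sketch below parses a small subset of HELM: the simple-polymer section and the connection section. The example peptide is invented for illustration and is not taken from CycPeptMPDB; the function names are hypothetical, and the parser assumes monomer IDs contain no '.', '$', or braces (so inline SMILES monomers are not handled).

```python
import re

# Subset parser for HELM: "polymers$connections$...$...$".
# Assumption: monomer IDs contain no '.', '$', '{', or '}'.
POLYMER_RE = re.compile(r"([A-Z]+\d+)\{([^}]*)\}")
CONNECTION_RE = re.compile(r"(\w+),(\w+),(\d+):(\w+)-(\d+):(\w+)")


def parse_helm(helm: str) -> dict:
    """Split a HELM string into its polymers and connections."""
    sections = helm.split("$")
    polymers = {}
    for poly_id, body in POLYMER_RE.findall(sections[0]):
        # Multi-character monomer IDs are bracketed, e.g. [meA].
        polymers[poly_id] = [m.strip("[]") for m in body.split(".")]
    connections = []
    if len(sections) > 1 and sections[1]:
        for conn in sections[1].split("|"):
            match = CONNECTION_RE.fullmatch(conn)
            if match is None:
                raise ValueError(f"unsupported connection: {conn!r}")
            src, dst, i, r_i, j, r_j = match.groups()
            connections.append((src, int(i), r_i, dst, int(j), r_j))
    return {"polymers": polymers, "connections": connections}


def is_head_to_tail_cyclic(parsed: dict, poly_id: str) -> bool:
    """True if the last monomer's R2 is bonded to the first monomer's R1."""
    n = len(parsed["polymers"][poly_id])
    for src, i, r_i, dst, j, r_j in parsed["connections"]:
        if src == dst == poly_id and {(i, r_i), (j, r_j)} == {(n, "R2"), (1, "R1")}:
            return True
    return False


# Invented example: a head-to-tail cyclized pentapeptide with N-methyl alanine.
example = "PEPTIDE1{A.[meA].L.P.F}$PEPTIDE1,PEPTIDE1,5:R2-1:R1$$$"
parsed = parse_helm(example)
print(parsed["polymers"]["PEPTIDE1"])               # ['A', 'meA', 'L', 'P', 'F']
print(is_head_to_tail_cyclic(parsed, "PEPTIDE1"))  # True
```

The monomer list and the explicit ring-closing connection are what let a sequence-level notation describe non-natural residues and cyclization without falling back to an atom-level string.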
Affiliation(s)
- Jianan Li
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Keisuke Yanagisawa
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
  - Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Masatake Sugita
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
  - Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Takuya Fujie
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
  - Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Masahito Ohue
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
  - Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Yutaka Akiyama
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
  - Middle-Molecule IT-based Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
7
Kunz RK, Rojnuckarin A, Schmidt CM, Miranda LP. Development of human-machine language interfaces for the visual analysis of complex biologics and RNA modalities and associated experimental data. AAPS Open 2023. [DOI: 10.1186/s41120-023-00073-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023] Open
Abstract
The advent of recombinant protein-based therapeutic agents in the 1980s and subsequent waves of innovation in molecular biology and engineering of biologics have permitted the production of an increasingly broad array of complex, high molecular weight constructs. While this has opened a powerful new toolbox of molecular scaffolds with which to probe and interdict biological processes, it also makes deciphering the architectural nuances between individual constructs intuitively difficult. Key to downstream data processes for the detection of data trends is the ability to unambiguously identify, compare, and communicate the nature of molecular compositions. Existing small-molecule-oriented software tools are not intended for structures such as peptides, proteins, antibodies, and RNA, and do not contain adequate atomistic or domain-level detail to appropriately convey their higher structural complexity. Similarly, there is a paucity of large molecule-focused data analysis and visualization tools. This article describes four new approaches we developed for the graphical representation and analysis of complex large molecules and experimental data. These tools help fulfill key needs in scientific communication and structure-property analysis of complex biologics and modified oligonucleotide-based drug candidates.
8
Farzan M, Ross A, Müller C, Allmendinger A. Liquid crystal phase formation and non-Newtonian behavior of oligonucleotide formulations. Eur J Pharm Biopharm 2022; 181:270-281. [PMID: 36435312 DOI: 10.1016/j.ejpb.2022.11.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/16/2022] [Accepted: 11/19/2022] [Indexed: 11/26/2022]
Abstract
The viscosity behavior of liquid oligonucleotide therapeutics and its dependence on formulation properties have been poorly studied to date. We observed a large increase in viscosity and solidification of therapeutic oligonucleotide formulations with increasing oligonucleotide concentration, creating challenges during drug product manufacturing. In this study, we characterized the viscosity behavior of three different single-stranded DNA oligonucleotides based on oligonucleotide concentration and formulation composition. We subsequently studied the underlying mechanism for increased viscosity at higher oligonucleotide concentrations by dynamic light scattering (DLS), 1H nuclear magnetic resonance (NMR), differential scanning calorimetry (DSC), and polarized light microscopy. Viscosity was highly dependent on formulation composition, oligonucleotide sequence, and concentration, and especially dependent on the presence and combination of different individual ions, such as the presence of sodium chloride in the formulation. In samples with elevated viscosity, the viscosity behavior was characterized by non-Newtonian, shear-thinning flow. We further studied these samples by DLS and 1H NMR, which revealed the presence of supra-molecular assemblies, and further characterization by polarized light microscopy and DSC identified these assemblies as liquid crystals in the formulation. The present study links the macroscopic viscosity behavior of oligonucleotide formulations to the formation of supra-molecular assemblies and to the presence of liquid crystals, and highlights the importance of formulation composition selection for these therapeutics.
Affiliation(s)
- Maryam Farzan
  - Pharmaceutical Development & Supplies, Pharmaceutical Technical Development Biologics Europe, F. Hoffmann-La Roche, Grenzacherstr. 124, 4070 Basel, Switzerland
- Alfred Ross
  - Pharmaceutical Research and Early Development, F. Hoffmann-La Roche, Grenzacherstr. 124, 4070 Basel, Switzerland
- Claudia Müller
  - Pharmaceutical Development & Supplies, Pharmaceutical Technical Development Biologics Europe, F. Hoffmann-La Roche, Grenzacherstr. 124, 4070 Basel, Switzerland
- Andrea Allmendinger
  - Pharmaceutical Development & Supplies, Pharmaceutical Technical Development Biologics Europe, F. Hoffmann-La Roche, Grenzacherstr. 124, 4070 Basel, Switzerland
  - Pharmaceutical Technology and Biopharmacy, University of Freiburg, Sonnenstr. 5, 79104 Freiburg, Germany
9
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. Patterns (New York, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn (corresponding author)
  - Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
- Qianxiang Ai
  - Department of Chemistry, Fordham University, The Bronx, NY, USA
- Senja Barthel
  - Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Nessa Carson
  - Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
- Angelo Frei
  - Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
- Nathan C. Frey
  - Massachusetts Institute of Technology, Cambridge, MA, USA
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Théophile Gaudin
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - IBM Research Europe, Zürich, Switzerland
- Kevin Maik Jablonka
  - Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Rafael F. Lameiro
  - Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
- Dominik Lemm
  - Faculty of Physics, University of Vienna, Vienna, Austria
- Alston Lo
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Seyed Mohamad Moosavi
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- AkshatKumar Nigam
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Robert Pollice
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
- Ulrich Schatzschneider
  - Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Philippe Schwaller
  - IBM Research Europe, Zürich, Switzerland
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marta Skreta
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Berend Smit
  - Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Felix Strieth-Kalthoff
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Chong Sun
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Gary Tom
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Andrew Wang
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
  - Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Adamo Young
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Rose Yu
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Alán Aspuru-Guzik (corresponding author)
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
  - Department of Materials Science, University of Toronto, Toronto, ON, Canada
  - Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada
10
Fox T, Bieler M, Haebel P, Ochoa R, Peters S, Weber A. BILN: A Human-Readable Line Notation for Complex Peptides. J Chem Inf Model 2022; 62:3942-3947. [PMID: 35984937 DOI: 10.1021/acs.jcim.2c00703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We present an easy, human-readable line notation to describe even complex peptides.
Affiliation(s)
- Thomas Fox
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Michael Bieler
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Peter Haebel
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Rodrigo Ochoa
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Stefan Peters
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Alexander Weber
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
11
Guo M, Shou W, Makatura L, Erps T, Foshey M, Matusik W. PolyGrammar: Grammar for Digital Polymer Representation and Generation. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2022; 9:e2101864. [PMID: 35678650 PMCID: PMC9376847 DOI: 10.1002/advs.202101864] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 12/04/2021] [Indexed: 05/22/2023]
Abstract
Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context-sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular-Input Line-entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.
Affiliation(s)
- Minghao Guo
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- CUHK Multimedia Lab, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Wan Shou
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Liane Makatura
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Timothy Erps
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Michael Foshey
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Wojciech Matusik
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
12
|
Baylon JL, Ursu O, Muzdalo A, Wassermann AM, Adams GL, Spale M, Mejzlik P, Gromek A, Pisarenko V, Hancharyk D, Jenkins E, Bednar D, Chang C, Clarova K, Glick M, Bitton DA. PepSeA: Peptide Sequence Alignment and Visualization Tools to Enable Lead Optimization. J Chem Inf Model 2022; 62:1259-1267. [PMID: 35192366 DOI: 10.1021/acs.jcim.1c01360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Therapeutic peptides offer potential advantages over small molecules in terms of selectivity, affinity, and their ability to target "undruggable" proteins that are associated with a wide range of pathologies. Despite their importance, current molecular design capabilities that inform medicinal chemistry decisions on peptide programs are limited. More specifically, there are unmet needs for structure-activity relationship (SAR) analysis and visualization of linear, cyclic, and cross-linked peptides containing non-natural motifs, which are widely used in drug discovery. To bridge this gap, we developed PepSeA (Peptide Sequence Alignment and Visualization), an open-source, freely available package of sequence-based tools (https://github.com/Merck/PepSeA). PepSeA enables multiple sequence alignment of non-natural amino acids and enhanced visualization with the hierarchical editing language for macromolecules (HELM). Via stepwise SAR analysis of a ChEMBL peptide data set, we demonstrate the utility of PepSeA to accelerate decision making in lead optimization campaigns in a pharmaceutical setting. PepSeA represents an initial attempt to expand cheminformatics capabilities for therapeutic peptides and to enable rapid and more efficient design-make-test cycles.
Affiliation(s)
- Javier L Baylon
- Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Oleg Ursu
- Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Anja Muzdalo
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Anne Mai Wassermann
- Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Gregory L Adams
- Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Martin Spale
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Petr Mejzlik
- AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Anna Gromek
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Viktor Pisarenko
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Dzianis Hancharyk
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Esteban Jenkins
- Foundational Data and Analytics, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- David Bednar
- Foundational Data and Analytics, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Charlie Chang
- Discovery Research IT, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Kamila Clarova
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague 166 28, Czech Republic
- Meir Glick
- Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Danny A Bitton
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
|
13
|
Mohapatra S, An J, Gómez-Bombarelli R. Chemistry-informed macromolecule graph representation for similarity computation, unsupervised and supervised learning. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac545e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022] Open
Abstract
The near-infinite chemical diversity of natural and artificial macromolecules arises from the vast range of possible component monomers, linkages, and polymer topologies. This enormous variety contributes to the ubiquity and indispensability of macromolecules but hinders the development of general machine learning methods with macromolecules as input. To address this, we developed a chemistry-informed graph representation of macromolecules that enables quantifying structural similarity, and interpretable supervised learning for macromolecules. Our work enables quantitative chemistry-informed decision-making and iterative design in the macromolecular chemical space.
|
14
|
Wigh DS, Goodman JM, Lapkin AA. A review of molecular representation in the age of machine learning. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Affiliation(s)
- Daniel S. Wigh
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
|
15
|
Medina-Franco JL, López-López E, Andrade E, Ruiz-Azuara L, Frei A, Guan D, Zuegg J, Blaskovich MA. Bridging informatics and medicinal inorganic chemistry: toward a database of metallodrugs and metallodrug candidates. Drug Discov Today 2022; 27:1420-1430. [DOI: 10.1016/j.drudis.2022.02.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/27/2021] [Revised: 11/04/2021] [Accepted: 02/22/2022] [Indexed: 12/11/2022]
|
16
|
Sweet-Jones J, Ahmad M, Martin ACR. Antibody markup language (AbML) - a notation language for antibody-based drug formats and software for creating and rendering AbML (abYdraw). MAbs 2022; 14:2101183. [PMID: 35838549 PMCID: PMC9291709 DOI: 10.1080/19420862.2022.2101183] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023] Open
Abstract
As interest in antibody-based drug development continues to increase, the biopharmaceutical industry has begun to focus on complex multi-specific antibodies (MsAbs) as an up-and-coming class of biologic that differ from natural monoclonal antibodies through their ability to bind to more than one type of antigen. As techniques to generate such molecules have diversified, so have their formats and the need for standard notation. Previous efforts to develop a notation language for macromolecule drugs have been insufficient, or too complex, for MsAbs. Here, we present Antibody Markup Language (AbML), a new notation language specifically for antibody formats that overcomes the limitations of existing languages and can annotate all current antibody formats, including fusions, fragments, standard antibodies and MsAbs, as well as all currently conceivable future formats. AbML V1.1 also provides explicit support for T-cell receptor domains. To assist users of this language we have also developed a tool, abYdraw, that can draw antibody schematics from AbML strings or generate an AbML string from a drawn antibody schematic. AbML has the potential to become a standardized notation for describing new MsAb formats entering clinical trials. Abbreviations: AbML: Antibody Markup Language; ADC: Antibody-drug conjugate; CAS: Chemical Abstracts Service; CH: Constant heavy; CL: Constant light; Fv: Variable fragment; HELM: Hierarchical Editing Language for Macromolecules; HSA: Human serum albumin; INN: International Nonproprietary Names; KIH: Knobs-into-holes; mAbs: Monoclonal antibodies; MsAb: Multi-specific antibody; WHO: World Health Organization; PEG: Poly-ethylene glycol; scFv: Single-chain variable fragment; SMILES: Simplified Molecular-Input Line-Entry System; VH: Variable heavy; VHH: Single-domain (Camelid) variable heavy; VL: Variable light.
Affiliation(s)
- James Sweet-Jones
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
- Maham Ahmad
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
- Andrew C R Martin
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
|
17
|
Clark AM, Gedeck P, Cheung PP, Bunin BA. Using Machine Learning to Parse Chemical Mixture Descriptions. ACS OMEGA 2021; 6:22400-22409. [PMID: 34497929 PMCID: PMC8412965 DOI: 10.1021/acsomega.1c03311] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2021] [Accepted: 08/09/2021] [Indexed: 06/13/2023]
Abstract
Chemical mixtures have recently come to the attention of open standards and data structures for capturing machine-readable descriptions for informatics uses. At the present time, essentially all transmission of information about mixtures is done using short text descriptions that are readable only by trained scientists, and there are no accessible repositories of marked-up mixture data. We have designed a machine learning tool that can interpret mixture descriptions and upgrade them to the high-level Mixfile format, which can in turn be used to generate Mixtures InChI notation. The interpretation achieves a high success rate and can be used at scale to markup large catalogs and inventories, with some expert checking to catch edge cases. The training data that was accumulated during the project is made openly available, along with previously released mixture editing tools and utilities.
|
18
|
Shin W, Hellerstein JL. Isolating structural errors in reaction networks in systems biology. Bioinformatics 2021; 37:388-395. [PMID: 32790862 PMCID: PMC8058775 DOI: 10.1093/bioinformatics/btaa720] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/30/2019] [Revised: 07/10/2020] [Accepted: 08/07/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The growing complexity of reaction-based models necessitates early detection and resolution of model errors. Considerable work has been done on the detection of mass balance errors, especially atomic mass analysis (AMA) (which compares the counts of atoms in the reactants and products) and Linear Programming analysis (which detects stoichiometric inconsistencies). This article extends model error checking to include: (i) certain structural errors in reaction networks and (ii) error isolation. First, we consider the balance of chemical structures (moieties) between reactants and products. This balance is expected in many biochemical reactions, but the imbalance of chemical structures cannot be detected if the analysis is done in units of atomic masses. Second, we improve on error isolation for stoichiometric inconsistencies by identifying a small number of reactions and/or species that cause the error. Doing so simplifies error remediation. RESULTS We propose two algorithms that address isolating structural errors in reaction networks. Moiety analysis finds imbalances of moieties using the same algorithm as AMA, but moiety analysis works in units of moieties instead of atomic masses. We argue for the value of checking moiety balance, and discuss two approaches to decomposing chemical species into moieties. Graphical Analysis of Mass Equivalence Sets (GAMES) provides isolation for stoichiometric inconsistencies by constructing explanations that relate errors in the structure of the reaction network to elements of the reaction network. We study the effectiveness of moiety analysis and GAMES on curated models in the BioModels repository. We have created open source codes for moiety analysis and GAMES. AVAILABILITY AND IMPLEMENTATION Our project is hosted at https://github.com/ModelEngineering/SBMLLint, which contains examples, documentation, source code files and build scripts used to create SBMLLint. 
Our source code is licensed under the MIT open source license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
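The atom-count comparison that the abstract attributes to atomic mass analysis can be illustrated with a minimal Python sketch. This is not SBMLLint's code or API: the function names (`atom_counts`, `side_counts`, `imbalance`) and the dictionary input format are assumptions made here for illustration, and the formula parser handles only simple formulas without parentheses or charges.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6'.

    Hypothetical helper: no parentheses, hydrates, or charges are supported.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side: dict) -> Counter:
    """Sum atom counts over one side of a reaction, weighted by stoichiometry."""
    total = Counter()
    for formula, coefficient in side.items():
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def imbalance(reactants: dict, products: dict) -> dict:
    """Return {element: products minus reactants} for every unbalanced element."""
    left, right = side_counts(reactants), side_counts(products)
    return {
        element: right[element] - left[element]
        for element in sorted(set(left) | set(right))
        if right[element] != left[element]
    }


# Balanced: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
print(imbalance({"C6H12O6": 1, "O2": 6}, {"CO2": 6, "H2O": 6}))
# Unbalanced: H2 + O2 -> H2O (one oxygen atom is lost)
print(imbalance({"H2": 1, "O2": 1}, {"H2O": 1}))
```

According to the abstract, moiety analysis applies the same comparison in units of moieties rather than atoms; in this sketch that would amount to replacing `atom_counts` with a function that decomposes a species into moieties.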
Affiliation(s)
- Woosub Shin
- eScience Institute, University of Washington, Seattle, WA 98195-5061, USA
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
|
19
|
Upadhya R, Kosuri S, Tamasi M, Meyer TA, Atta S, Webb MA, Gormley AJ. Automation and data-driven design of polymer therapeutics. Adv Drug Deliv Rev 2021; 171:1-28. [PMID: 33242537 PMCID: PMC8127395 DOI: 10.1016/j.addr.2020.11.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/30/2020] [Revised: 11/10/2020] [Accepted: 11/12/2020] [Indexed: 01/01/2023]
Abstract
Polymers are uniquely suited for drug delivery and biomaterial applications due to tunable structural parameters such as length, composition, architecture, and valency. To facilitate designs, researchers may explore combinatorial libraries in a high throughput fashion to correlate structure to function. However, traditional polymerization reactions including controlled living radical polymerization (CLRP) and ring-opening polymerization (ROP) require inert reaction conditions and extensive expertise to implement. With the advent of air-tolerance and automation, several polymerization techniques are now compatible with well plates and can be carried out at the benchtop, making high throughput synthesis and high throughput screening (HTS) possible. To avoid HTS pitfalls often described as "fishing expeditions," it is crucial to employ intelligent and big data approaches to maximize experimental efficiency. This is where the disruptive technologies of machine learning (ML) and artificial intelligence (AI) will likely play a role. In fact, ML and AI are already impacting small molecule drug discovery and showing signs of emerging in drug delivery. In this review, we present state-of-the-art research in drug delivery, gene delivery, antimicrobial polymers, and bioactive polymers alongside data-driven developments in drug design and organic synthesis. From this insight, important lessons are revealed for the polymer therapeutics community including the value of a closed loop design-build-test-learn workflow. This is an exciting time as researchers will gain the ability to fully explore the polymer structural landscape and establish quantitative structure-property relationships (QSPRs) with biological significance.
Affiliation(s)
- Supriya Atta
- Rutgers, The State University of New Jersey, USA
- Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08540, USA
|
20
|
Ochoa R, Cossio P. PepFun: Open Source Protocols for Peptide-Related Computational Analysis. Molecules 2021; 26:molecules26061664. [PMID: 33809815 PMCID: PMC8002403 DOI: 10.3390/molecules26061664] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/15/2021] [Revised: 03/05/2021] [Accepted: 03/15/2021] [Indexed: 11/27/2022] Open
Abstract
Peptide research has increased in recent years owing to the applications of peptides as biomarkers, therapeutic alternatives, or antigenic subunits in vaccines. The implementation of computational resources has facilitated the identification of novel sequences, the prediction of properties, and the modelling of structures. However, there is still a lack of open source protocols that enable their straightforward analysis. Here, we present PepFun, a compilation of bioinformatics and cheminformatics functionalities that are easy to implement and customize for studying peptides at different levels: sequence, structure and their interactions with proteins. PepFun enables calculating multiple characteristics for massive sets of peptide sequences, and obtaining different structural observables derived from protein-peptide complexes. In addition, random or guided library design of peptide sequences can be customized for screening campaigns. The package was written in Python using built-in functions and methods available in the open source projects BioPython and RDKit. We present two tutorials in which we tested peptide binders of the MHC class II and the Granzyme B protease.
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin 050010, Colombia
- Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin 050010, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60348 Frankfurt am Main, Germany
|
21
|
Minkiewicz P, Darewicz M, Iwaniak A, Turło M. Proposal of the Annotation of Phosphorylated Amino Acids and Peptides Using Biological and Chemical Codes. Molecules 2021; 26:molecules26030712. [PMID: 33573096 PMCID: PMC7866520 DOI: 10.3390/molecules26030712] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/29/2020] [Revised: 01/21/2021] [Accepted: 01/26/2021] [Indexed: 01/04/2023] Open
Abstract
Phosphorylation represents one of the most important modifications of amino acids, peptides, and proteins. By modifying the latter, it is useful in improving the functional properties of foods. Although all these substances are broadly annotated in internet databases, there is no unified code for their annotation. The present publication aims to describe a simple code for the annotation of phosphopeptide sequences. The proposed code describes the location of phosphate residues in amino acid side chains (including new rules of atom numbering in amino acids) and the diversity of phosphate residues (e.g., di- and triphosphate residues and phosphate amidation). This article also includes translating the proposed biological code into SMILES, being the most commonly used chemical code. Finally, it discusses possible errors associated with applying the proposed code and in the resulting SMILES representations of phosphopeptides. The proposed code can be extended to describe other modifications in the future.
|
22
|
Sciabola S, Xi H, Cruz D, Cao Q, Lawrence C, Zhang T, Rotstein S, Hughes JD, Caffrey DR, Stanton RV. PFRED: A computational platform for siRNA and antisense oligonucleotides design. PLoS One 2021; 16:e0238753. [PMID: 33481821 PMCID: PMC7822268 DOI: 10.1371/journal.pone.0238753] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/18/2020] [Accepted: 01/05/2021] [Indexed: 12/15/2022] Open
Abstract
PFRED, a software application for the design, analysis, and visualization of antisense oligonucleotides and siRNA, is described. The software provides an intuitive user interface for scientists to design a library of siRNA or antisense oligonucleotides that target a specific gene of interest. Moreover, the tool facilitates the incorporation of various design criteria that have been shown to be important for stability and potency. PFRED has been made available as an open-source project so the code can be easily modified to address the future needs of the oligonucleotide research community. A compiled version is available for downloading at https://github.com/pfred/pfred-gui/releases/tag/v1.0 as a Java JAR file. The source code and the links for downloading the precompiled version can be found at https://github.com/pfred.
Affiliation(s)
- Simone Sciabola
- Medicinal Chemistry, Biogen, Cambridge, MA, United States of America
- Hualin Xi
- Rgenta, Cambridge, MA, United States of America
- Dario Cruz
- Medicinal Chemistry, Biogen, Cambridge, MA, United States of America
- Chemical Engineering, Northeastern University, Boston, MA, United States of America
- Qing Cao
- Medicinal Chemistry, Ra Pharmaceuticals, Cambridge, MA, United States of America
- Tianhong Zhang
- Business Technology, Pfizer, Cambridge, MA, United States of America
- Sergio Rotstein
- Business Technology, Pfizer, Cambridge, MA, United States of America
- Jason D. Hughes
- Computational Biology, Foundation Medicine, Cambridge, MA, United States of America
- Daniel R. Caffrey
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States of America
- Robert V. Stanton
- Simulation and Modeling Sciences, Pfizer, Cambridge, MA, United States of America
|
23
|
David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020; 12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 150] [Impact Index Per Article: 37.5] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023] Open
Abstract
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
- Amol Thakkar
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
- Rocío Mercado
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
|
24
|
Lang PF, Chebaro Y, Zheng X, P Sekar JA, Shaikh B, Natale DA, Karr JR. BpForms and BcForms: a toolkit for concretely describing non-canonical polymers and complexes to facilitate global biochemical networks. Genome Biol 2020; 21:117. [PMID: 32423472 PMCID: PMC7236495 DOI: 10.1186/s13059-020-02025-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Accepted: 04/16/2020] [Indexed: 12/12/2022] Open
Abstract
Non-canonical residues, caps, crosslinks, and nicks are important to many functions of DNAs, RNAs, proteins, and complexes. However, we do not fully understand how networks of such non-canonical macromolecules generate behavior. One barrier is our limited formats for describing macromolecules. To overcome this barrier, we develop BpForms and BcForms, a toolkit for representing the primary structure of macromolecules as combinations of residues, caps, crosslinks, and nicks. The toolkit can help omics researchers perform quality control and exchange information about macromolecules, help systems biologists assemble global models of cells that encompass processes such as post-translational modification, and help bioengineers design cells.
Affiliation(s)
- Paul F Lang
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
- Yassmine Chebaro
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Université de Strasbourg, Illkirch, 67404, France
- Xiaoyue Zheng
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- John A P Sekar
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Bilal Shaikh
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, 20007, USA
- Jonathan R Karr
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA
|
25
|
Chernyshov IY, Toukach PV. REStLESS: automated translation of glycan sequences from residue-based notation to SMILES and atomic coordinates. Bioinformatics 2019; 34:2679-2681. [PMID: 29547883 DOI: 10.1093/bioinformatics/bty168] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/21/2018] [Accepted: 03/13/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation Glycans and glycoconjugates are usually recorded in dedicated databases in residue-based notations. Only a few of them can be converted into chemical (atom-based) formats highly demanded in conformational and biochemical studies. In this work, we present a tool for translation from a residue-based glycan notation to SMILES. Results The REStLESS algorithm for translation from the CSDB Linear notation to SMILES was developed. REStLESS stands for ResiduEs as Smiles and LinkagEs as SmartS, where SMARTS reaction expressions are used to merge pre-encoded residues into a molecule. The implementation supports virtually all structural features reported in natural carbohydrates and glycoconjugates. The translator is equipped with a mechanism for conversion of SMILES strings into optimized atomic coordinates which can be used as starting geometries for various computational tasks. Availability and implementation REStLESS is integrated in the Carbohydrate Structure Database (CSDB) and is freely available on the web (http://csdb.glycoscience.ru/csdb2atoms.html). Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ivan Yu Chernyshov
- All-Russia Research Institute of Agricultural Biotechnology, Laboratory of Plant Stress Tolerance, Russian Academy of Sciences, Moscow, Russia
- Philip V Toukach
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Laboratory of Complex and Nano-scaled Catalysts, Moscow, Russia
|
26
|
Lin TS, Coley CW, Mochigase H, Beech HK, Wang W, Wang Z, Woods E, Craig SL, Johnson JA, Kalow JA, Jensen KF, Olsen BD. BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules. ACS CENTRAL SCIENCE 2019; 5:1523-1531. [PMID: 31572779 PMCID: PMC6764162 DOI: 10.1021/acscentsci.9b00476] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 05/14/2019] [Indexed: 05/21/2023]
Abstract
Having a compact yet robust structurally based identifier or representation system is a key enabling factor for efficient sharing and dissemination of research results within the chemistry community, and such systems lay down the essential foundations for future informatics and data-driven research. While substantial advances have been made for small molecules, the polymer community has struggled in coming up with an efficient representation system. This is because, unlike other disciplines in chemistry, the basic premise that each distinct chemical species corresponds to a well-defined chemical structure does not hold for polymers. Polymers are intrinsically stochastic molecules that are often ensembles with a distribution of chemical structures. This difficulty limits the applicability of all deterministic representations developed for small molecules. In this work, a new representation system that is capable of handling the stochastic nature of polymers is proposed. The new system is based on the popular "simplified molecular-input line-entry system" (SMILES), and it aims to provide representations that can be used as indexing identifiers for entries in polymer databases. As a pilot test, the entries of the standard data set of the glass transition temperature of linear polymers (Bicerano, 2002) were converted into the new BigSMILES language. Furthermore, it is hoped that the proposed system will provide a more effective language for communication within the polymer community and increase cohesion between the researchers within the community.
Affiliation(s)
- Tzyy-Shyang Lin
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Connor W. Coley
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Hidenobu Mochigase
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Haley K. Beech
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Wencong Wang
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Zi Wang
- Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Eliot Woods
- Department of Chemistry, Northwestern University, Evanston, Illinois 60208, United States
- Stephen L. Craig
- Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Jeremiah A. Johnson
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Julia A. Kalow
- Department of Chemistry, Northwestern University, Evanston, Illinois 60208, United States
- Klavs F. Jensen
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Bradley D. Olsen
- Department of Chemical Engineering and Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|
27
|
Ricart E, Leclère V, Flissi A, Mueller M, Pupin M, Lisacek F. rBAN: retro-biosynthetic analysis of nonribosomal peptides. J Cheminform 2019; 11:13. [PMID: 30737579 PMCID: PMC6689883 DOI: 10.1186/s13321-019-0335-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/01/2018] [Accepted: 01/31/2019] [Indexed: 12/19/2022] Open
Abstract
Proteinogenic and non-proteinogenic amino acids, fatty acids or glycans are some of the main building blocks of nonribosomal peptides (NRPs) and as such may give insight into the origin, biosynthesis and bioactivities of their constitutive peptides. Hence, the structural representation of NRPs using monomers provides a biologically interesting skeleton of these secondary metabolites. Databases dedicated to NRPs, such as Norine, already integrate monomer-based annotations in order to facilitate the development of structural analysis tools. In this paper, we present rBAN (retro-biosynthetic analysis of nonribosomal peptides), a new computational tool designed to predict the monomeric graph of NRPs from their atomic structure in SMILES format. This prediction is achieved through the "in silico" fragmentation of a chemical structure and matching the resulting fragments against the monomers of Norine for identification. Structures containing monomers not yet recorded in Norine are processed in a "discovery mode" that uses the RESTful service from PubChem to search the unidentified substructures and suggest new monomers. rBAN was integrated in a pipeline for the curation of Norine data in which it was used to check the correspondence between the monomeric graphs annotated in Norine and SMILES-predicted graphs. The process concluded with the validation of 97.26% of the records in Norine, a two-fold extension of its SMILES data and the introduction of 11 new monomers suggested in the discovery mode. The accuracy, robustness and high performance of rBAN were demonstrated by benchmarking it against other tools with the same functionality: Smiles2Monomers and GRAPE.
Affiliation(s)
- Emma Ricart
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland; Computer Science Department, University of Geneva, Geneva, Switzerland
- Valérie Leclère
- EA 7394-ICV- Institut Charles Viollette, University of Lille, INRA, ISA, University of Artois, Univ. Littoral Côte d'Opale, 59000, Lille, France
- Areski Flissi
- UMR 9189- CRIStAL- Centre de Recherche en Informatique Signal et Automatique de Lille, University of Lille, CNRS, Centrale Lille, 59000, Lille, France; Bonsai Team, Inria-Lille Nord Europe, 9655, Villeneuve d'Ascq Cedex, France
- Markus Mueller
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Amphipole Building, Quartier Sorge, 1015, Lausanne, Switzerland
- Maude Pupin
- UMR 9189- CRIStAL- Centre de Recherche en Informatique Signal et Automatique de Lille, University of Lille, CNRS, Centrale Lille, 59000, Lille, France; Bonsai Team, Inria-Lille Nord Europe, 9655, Villeneuve d'Ascq Cedex, France
- Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland; Computer Science Department, University of Geneva, Geneva, Switzerland; Section of Biology, University of Geneva, Geneva, Switzerland
|
28
|
van den Broek K, Daniel M, Epple M, Kuhn H, Schaub J, Zielesny A. SPICES: a particle-based molecular structure line notation and support library for mesoscopic simulation. J Cheminform 2018; 10:35. [PMID: 30094683 PMCID: PMC6085218 DOI: 10.1186/s13321-018-0294-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/05/2018] [Accepted: 08/03/2018] [Indexed: 12/18/2022] Open
Abstract
Simplified Particle Input ConnEction Specification (SPICES) is a particle-based molecular structure representation derived from straightforward simplifications of the atom-based SMILES line notation. It aims at supporting tedious and error-prone molecular structure definitions for particle-based mesoscopic simulation techniques like Dissipative Particle Dynamics by allowing for an interplay of different molecular encoding levels that range from topological line notations and corresponding particle-graph visualizations to 3D structures with support of their spatial mapping into a simulation box. An open Java library for SPICES structure handling and mesoscopic simulation support in combination with an open Java Graphical User Interface viewer application for visual topological inspection of SPICES definitions are provided.
Affiliation(s)
- Karina van den Broek
- Inorganic Chemistry and Center for Nanointegration Duisburg-Essen (CeNIDE), University of Duisburg-Essen, Essen, Germany; Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany
- Mirco Daniel
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany
- Matthias Epple
- Inorganic Chemistry and Center for Nanointegration Duisburg-Essen (CeNIDE), University of Duisburg-Essen, Essen, Germany
- Jonas Schaub
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany
- Achim Zielesny
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany
|
29
|
Chen H, Kogej T, Engkvist O. Cheminformatics in Drug Discovery, an Industrial Perspective. Mol Inform 2018; 37:e1800041. [PMID: 29774657 DOI: 10.1002/minf.201800041] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/02/2018] [Accepted: 04/23/2018] [Indexed: 12/12/2022]
Abstract
Cheminformatics has established itself as a core discipline within large scale drug discovery operations. It would be impossible to handle the amount of data generated today in a small molecule drug discovery project without persons skilled in cheminformatics. In addition, due to increased emphasis on "Big Data", machine learning and artificial intelligence, not only in society in general, but also in drug discovery, it is expected that the cheminformatics field will be even more important in the future. Traditional areas like virtual screening, library design and high-throughput screening analysis are highlighted in this review. Applying machine learning in drug discovery is an area that has become very important. Applications of machine learning in early drug discovery have been extended from predicting ADME properties and target activity to tasks like de novo molecular design and prediction of chemical reactions.
Affiliation(s)
- Hongming Chen
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden
- Thierry Kogej
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden
|
30
|
Abstract
Following the elucidation of the human genome, chemogenomics emerged in the beginning of the twenty-first century as an interdisciplinary research field with the aim to accelerate target and drug discovery by making best usage of the genomic data and the data linkable to it. What started as a systematization approach within protein target families now encompasses all types of chemical compounds and gene products. A key objective of chemogenomics is the establishment, extension, analysis, and prediction of a comprehensive SAR matrix which by application will enable further systematization in drug discovery. Herein we outline future perspectives of chemogenomics including the extension to new molecular modalities, or the potential extension beyond the pharma to the agro and nutrition sectors, and the importance for environmental protection. The focus is on computational sciences with potential applications for compound library design, virtual screening, hit assessment, analysis of phenotypic screens, lead finding and optimization, and systems biology-based prediction of toxicology and translational research.
Affiliation(s)
- Edgar Jacoby
- Janssen Research & Development, Beerse, Belgium
- J B Brown
- Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
|
31
|
Minkiewicz P, Iwaniak A, Darewicz M. Annotation of Peptide Structures Using SMILES and Other Chemical Codes-Practical Solutions. Molecules 2017; 22:2075. [PMID: 29186902 PMCID: PMC6149970 DOI: 10.3390/molecules22122075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/16/2017] [Revised: 11/15/2017] [Accepted: 11/25/2017] [Indexed: 12/20/2022] Open
Abstract
Contemporary peptide science exploits methods and tools of bioinformatics and cheminformatics. These approaches use different languages to describe peptide structures: amino acid sequences and chemical codes (especially SMILES), respectively. The latter may be applied, e.g., in comparative studies involving structures and properties of peptides and peptidomimetics. Progress in peptide science “in silico” may be achieved via better communication between biologists and chemists, involving the translation of peptide representation from amino acid sequence into SMILES code. Recent recommendations concerning good practice in chemical information include careful verification of data and their annotation. This publication discusses the generation of SMILES representations of peptides using existing software. Construction of peptide structures containing unnatural and modified amino acids (with special attention paid to glycosylated peptides) is also included. Special attention is paid to the detection of typical errors occurring in SMILES representations of peptides and their correction using molecular editors. Brief recommendations for training of staff working on peptide annotations are discussed as well.
Affiliation(s)
- Piotr Minkiewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, 10-726 Olsztyn-Kortowo, Poland
- Anna Iwaniak
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, 10-726 Olsztyn-Kortowo, Poland
- Małgorzata Darewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, 10-726 Olsztyn-Kortowo, Poland
|
32
|
Milton J, Zhang T, Bellamy C, Swayze E, Hart C, Weisser M, Hecht S, Rotstein S. HELM Software for Biopolymers. J Chem Inf Model 2017; 57:1233-1239. [PMID: 28471655 DOI: 10.1021/acs.jcim.6b00442] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]
Abstract
Hierarchical Editing Language for Macromolecules (HELM version 2.0) is a molecular line notation similar to SMILES but specifically for communicating and managing biopolymer structures. The HELM project, part of the Pistoia Alliance nonprofit organization, has been tasked to develop and promote HELM as a global exchange format and recently released version 2.0 of the specification. Here we will describe the specifics of the HELM v2.0 notation along with the large ecosystem of software to support HELM-based structure management. We will highlight a recent open-source software and database for HELM monomers and a new, simpler approach to deploying a large, complicated molecular management system.
Affiliation(s)
- Jeff Milton
- Ionis Pharmaceuticals, Inc., 2855 Gazelle Court, Carlsbad, California 92010, United States
- Tianhong Zhang
- Pfizer Inc., One Burtt Road, Andover, Massachusetts 01810, United States
- Claire Bellamy
- Pistoia Alliance, 401 Edgewater Place, Wakefield, Massachusetts 01880-6201, United States
- Eric Swayze
- Ionis Pharmaceuticals, Inc., 2855 Gazelle Court, Carlsbad, California 92010, United States
- Christopher Hart
- Ionis Pharmaceuticals, Inc., 2855 Gazelle Court, Carlsbad, California 92010, United States
- Markus Weisser
- Quattro Research, Fraunhoferstraße 18a, 82152 Planegg-Martinsried, Germany
- Sabrina Hecht
- Quattro Research, Fraunhoferstraße 18a, 82152 Planegg-Martinsried, Germany
- Sergio Rotstein
- Pfizer Inc., One Burtt Road, Andover, Massachusetts 01810, United States
|
33
|
Southan C, Sharman JL, Benson HE, Faccenda E, Pawson AJ, Alexander SPH, Buneman OP, Davenport AP, McGrath JC, Peters JA, Spedding M, Catterall WA, Fabbro D, Davies JA. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands. Nucleic Acids Res 2016; 44:D1054-68. [PMID: 26464438 PMCID: PMC4702778 DOI: 10.1093/nar/gkv1037] [Citation(s) in RCA: 987] [Impact Index Per Article: 123.4] [Received: 09/07/2015] [Revised: 09/25/2015] [Accepted: 09/29/2015] [Indexed: 01/05/2023] Open
Abstract
The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, http://www.guidetopharmacology.org) provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. Developed by the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS), this resource, and its earlier incarnation as IUPHAR-DB, is described in our 2014 publication. This update incorporates changes over the intervening seven database releases. The unique model of content capture is based on established and new target class subcommittees collaborating with in-house curators. Most information comes from journal articles, but we now also index kinase cross-screening panels. Targets are specified by UniProtKB IDs. Small molecules are defined by PubChem Compound Identifiers (CIDs); ligand capture also includes peptides and clinical antibodies. We have extended the capture of ligands and targets linked via published quantitative binding data (e.g. Ki, IC50 or Kd). The resulting pharmacological relationship network now defines a data-supported druggable genome encompassing 7% of human proteins. The database also provides an expanded substrate for the biennially published compendium, the Concise Guide to PHARMACOLOGY. This article covers content increase, entity analysis, revised curation strategies, new website features and expanded download options.
Affiliation(s)
- Christopher Southan
- Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Joanna L Sharman
- Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Helen E Benson
- Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Elena Faccenda
- Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Adam J Pawson
- Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Stephen P H Alexander
- School of Biomedical Sciences, University of Nottingham Medical School, Nottingham, NG7 2UH, UK
- O Peter Buneman
- Laboratory for Foundations of Computer Science, School of Informatics, University of Edinburgh, Edinburgh, EH8 9LE, UK
- John C McGrath
- School of Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
- John A Peters
- Neuroscience Division, Medical Education Institute, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD1 9SY, UK
- William A Catterall
- Department of Pharmacology, University of Washington, Seattle, WA 98195-7280, USA
- Jamie A Davies
- Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
|
34
|
Hansen MR, Villar HO, Feyfant E. Development of an Informatics Platform for Therapeutic Protein and Peptide Analytics. J Chem Inf Model 2013; 53:2774-9. [DOI: 10.1021/ci400333x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
Affiliation(s)
- Mark R. Hansen
- Altoris, Inc., 7770 Regents Rd #557, San Diego, California 92122, United States
- Hugo O. Villar
- Altoris, Inc., 7770 Regents Rd #557, San Diego, California 92122, United States
- Eric Feyfant
- Aileron Therapeutics, 281 Albany Street, Cambridge, Massachusetts 02139, United States
|