1
|
Kunz RK, Rojnuckarin A, Schmidt CM, Miranda LP. Development of human-machine language interfaces for the visual analysis of complex biologics and RNA modalities and associated experimental data. AAPS OPEN 2023. [DOI: 10.1186/s41120-023-00073-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/06/2023] Open
Abstract
AbstractThe advent of recombinant protein-based therapeutic agents in the 1980s and subsequent waves of innovation in molecular biology and engineering of biologics has permitted the production of an increasingly broad array of complex, high molecular weight constructs. While this has opened a powerful new toolbox of molecular scaffolds with which to probe and interdict biological processes, it also makes deciphering the architectural nuances between individual constructs intuitively difficult. Key to downstream data processes for the detection of data trends is the ability to unambiguously identify, compare, and communicate the nature of molecular compositions. Existing small molecule orientated software tools are not intended for structures such as peptides, proteins, antibodies, and RNA, and do not contain adequate atomistic or domain-level detail to appropriately convey their higher structural complexity. Similarly, there is a paucity of large molecule-focused data analysis and visualization tools. This article will describe four new approaches we developed for the graphical representation and analysis of complex large molecules and experimental data. These tools help fulfill key needs in scientific communication and structure-property analysis of complex biologics and modified oligonucleotide-based drug candidates.
Collapse
|
2
|
Fox T, Bieler M, Haebel P, Ochoa R, Peters S, Weber A. BILN: A Human-Readable Line Notation for Complex Peptides. J Chem Inf Model 2022; 62:3942-3947. [PMID: 35984937 DOI: 10.1021/acs.jcim.2c00703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
We present an easy, human-readable line notation to describe even complex peptides.
Collapse
Affiliation(s)
- Thomas Fox
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
| | - Michael Bieler
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
| | - Peter Haebel
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
| | - Rodrigo Ochoa
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
| | - Stefan Peters
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
| | - Alexander Weber
- Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
| |
Collapse
|
3
|
David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020; 12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 165] [Impact Index Per Article: 41.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023] Open
Abstract
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Collapse
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden.
| | - Amol Thakkar
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| | - Rocío Mercado
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
| |
Collapse
|
4
|
Dong J, Zhu MF, Yun YH, Lu AP, Hou TJ, Cao DS. BioMedR: an R/CRAN package for integrated data analysis pipeline in biomedical study. Brief Bioinform 2019; 22:474-484. [PMID: 31885044 DOI: 10.1093/bib/bbz150] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2019] [Revised: 10/22/2019] [Accepted: 10/30/2019] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND With the increasing development of biotechnology and information technology, publicly available data in chemistry and biology are undergoing explosive growth. Such wealthy information in these resources needs to be extracted and then transformed to useful knowledge by various data mining methods. However, a main computational challenge is how to effectively represent or encode molecular objects under investigation such as chemicals, proteins, DNAs and even complicated interactions when data mining methods are employed. To further explore these complicated data, an integrated toolkit to represent different types of molecular objects and support various data mining algorithms is urgently needed. RESULTS We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR could calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences and six types of interaction descriptors using three different combining strategies. Moreover, this package realized five similarity calculation methods and four powerful clustering algorithms as well as several useful auxiliary tools, which aims at building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. CONCLUSION BioMedR provides a comprehensive and uniform R package to link up different representations of molecular objects with each other and will benefit cheminformatics/bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.
Collapse
Affiliation(s)
- Jie Dong
- National Engineering Laboratory for Deep Processing of Rice and Byproducts, Hunan Key Laboratory of Processed Food for Special Medical Purpose, College of Food Science and Engineering, Central South University of Forestry and Technology, Changsha,410003 P. R. China.,Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003 P. R. China
| | - Min-Feng Zhu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003 P. R. China
| | - Yong-Huan Yun
- College of Food Science and Engineering, Hainan University, Haikou, 570228 PR China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
| | - Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003 P. R. China.,Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
| |
Collapse
|
5
|
Chernyshov IY, Toukach PV. REStLESS: automated translation of glycan sequences from residue-based notation to SMILES and atomic coordinates. Bioinformatics 2019; 34:2679-2681. [PMID: 29547883 DOI: 10.1093/bioinformatics/bty168] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2018] [Accepted: 03/13/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation Glycans and glycoconjugates are usually recorded in dedicated databases in residue-based notations. Only a few of them can be converted into chemical (atom-based) formats highly demanded in conformational and biochemical studies. In this work, we present a tool for translation from a residue-based glycan notation to SMILES. Results The REStLESS algorithm for translation from the CSDB Linear notation to SMILES was developed. REStLESS stands for ResiduEs as Smiles and LinkagEs as SmartS, where SMARTS reaction expressions are used to merge pre-encoded residues into a molecule. The implementation supports virtually all structural features reported in natural carbohydrates and glycoconjugates. The translator is equipped with a mechanism for conversion of SMILES strings into optimized atomic coordinates which can be used as starting geometries for various computational tasks. Availability and implementation REStLESS is integrated in the Carbohydrate Structure Database (CSDB) and is freely available on the web (http://csdb.glycoscience.ru/csdb2atoms.html). Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ivan Yu Chernyshov
- All-Russia Research Institute of Agricultural Biotechnology, Laboratory of Plant Stress Tolerance, Russian Academy of Sciences, Moscow, Russia
| | - Philip V Toukach
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Laboratory of Complex and Nano-scaled Catalysts, Moscow, Russia
| |
Collapse
|
6
|
Ricart E, Leclère V, Flissi A, Mueller M, Pupin M, Lisacek F. rBAN: retro-biosynthetic analysis of nonribosomal peptides. J Cheminform 2019; 11:13. [PMID: 30737579 PMCID: PMC6689883 DOI: 10.1186/s13321-019-0335-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2018] [Accepted: 01/31/2019] [Indexed: 12/19/2022] Open
Abstract
Proteinogenic and non-proteinogenic amino acids, fatty acids or glycans are some of the main building blocks of nonribsosomal peptides (NRPs) and as such may give insight into the origin, biosynthesis and bioactivities of their constitutive peptides. Hence, the structural representation of NRPs using monomers provides a biologically interesting skeleton of these secondary metabolites. Databases dedicated to NRPs such as Norine, already integrate monomer-based annotations in order to facilitate the development of structural analysis tools. In this paper, we present rBAN (retro-biosynthetic analysis of nonribosomal peptides), a new computational tool designed to predict the monomeric graph of NRPs from their atomic structure in SMILES format. This prediction is achieved through the "in silico" fragmentation of a chemical structure and matching the resulting fragments against the monomers of Norine for identification. Structures containing monomers not yet recorded in Norine, are processed in a "discovery mode" that uses the RESTful service from PubChem to search the unidentified substructures and suggest new monomers. rBAN was integrated in a pipeline for the curation of Norine data in which it was used to check the correspondence between the monomeric graphs annotated in Norine and SMILES-predicted graphs. The process concluded with the validation of the 97.26% of the records in Norine, a two-fold extension of its SMILES data and the introduction of 11 new monomers suggested in the discovery mode. The accuracy, robustness and high-performance of rBAN were demonstrated in benchmarking it against other tools with the same functionality: Smiles2Monomers and GRAPE.
Collapse
Affiliation(s)
- Emma Ricart
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland. .,Computer Science Department, University of Geneva, Geneva, Switzerland.
| | - Valérie Leclère
- EA 7394-ICV- Institut Charles Viollette, University of Lille, INRA, ISA, University of Artois, Univ. Littoral Côte d'Opale, 59000, Lille, France
| | - Areski Flissi
- UMR 9189- CRIStAL- Centre de Recherche en Informatique Signal et Automatique de Lille, University of Lille, CNRS, Centrale Lille, 59000, Lille, France.,Bonsai Team, Inria-Lille Nord Europe, 9655, Villeneuve d'Ascq Cedex, France
| | - Markus Mueller
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Amphipole Building, Quartier Sorge, 1015, Lausanne, Switzerland
| | - Maude Pupin
- UMR 9189- CRIStAL- Centre de Recherche en Informatique Signal et Automatique de Lille, University of Lille, CNRS, Centrale Lille, 59000, Lille, France.,Bonsai Team, Inria-Lille Nord Europe, 9655, Villeneuve d'Ascq Cedex, France
| | - Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland.,Computer Science Department, University of Geneva, Geneva, Switzerland.,Section of Biology, University of Geneva, Geneva, Switzerland
| |
Collapse
|
7
|
Warr WA. A Short Review of Chemical Reaction Database Systems, Computer-Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility. Mol Inform 2014; 33:469-76. [DOI: 10.1002/minf.201400052] [Citation(s) in RCA: 85] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2014] [Accepted: 04/22/2014] [Indexed: 11/09/2022]
|
8
|
Hansen MR, Villar HO, Feyfant E. Development of an Informatics Platform for Therapeutic Protein and Peptide Analytics. J Chem Inf Model 2013; 53:2774-9. [DOI: 10.1021/ci400333x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Affiliation(s)
- Mark R. Hansen
- Altoris, Inc., 7770 Regents Rd
#557, San Diego, California 92122, United States
| | - Hugo O. Villar
- Altoris, Inc., 7770 Regents Rd
#557, San Diego, California 92122, United States
| | - Eric Feyfant
- Aileron Therapeutics, 281 Albany
Street, Cambridge, Massachusetts 02139, United States
| |
Collapse
|
9
|
Zhang T, Li H, Xi H, Stanton RV, Rotstein SH. HELM: a hierarchical notation language for complex biomolecule structure representation. J Chem Inf Model 2012; 52:2796-806. [PMID: 22947017 DOI: 10.1021/ci3001925] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
When biological macromolecules are used as therapeutic agents, it is often necessary to introduce non-natural chemical modifications to improve their pharmaceutical properties. The final products are complex structures where entities such as proteins, peptides, oligonucleotides, and small molecule drugs may be covalently linked to each other, or may include chemically modified biological moieties. An accurate in silico representation of these complex structures is essential, as it forms the basis for their electronic registration, storage, analysis, and visualization. The size of these molecules (henceforth referred to as "biomolecules") often makes them too unwieldy and impractical to represent at the atomic level, while the presence of non-natural chemical modifications makes it impossible to represent them by sequence alone. Here we describe the Hierarchical Editing Language for Macromolecules ("HELM") and demonstrate its utility in the representation of structures such as antisense oligonucleotides, short interference RNAs, peptides, proteins, and antibody drug conjugates.
Collapse
Affiliation(s)
- Tianhong Zhang
- Pfizer Inc., 35 Cambridge Park Drive, Cambridge, Massachusetts 02140, USA.
| | | | | | | | | |
Collapse
|