1. Marques R, Souza M, Batista F, Gonçalves M, Lavor C. A Probabilistic Approach in the Search Space of the Molecular Distance Geometry Problem. J Chem Inf Model 2025; 65:427-434. [PMID: 39536161 PMCID: PMC11733941 DOI: 10.1021/acs.jcim.4c00427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 10/16/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
The discovery of the three-dimensional shape of protein molecules using interatomic distance information from nuclear magnetic resonance (NMR) can be modeled as a discretizable molecular distance geometry problem (DMDGP). Due to its combinatorial characteristics, the problem is conventionally solved in the literature as a depth-first search in a binary tree. In this work, we introduce a new search strategy, which we call frequency-based search (FBS), that for the first time utilizes geometric information contained in the Protein Data Bank (PDB). We encode the geometric configurations of 14,382 molecules derived from NMR experiments present in the PDB into binary strings. The obtained results show that the sample space of the binary strings extracted from the PDB does not follow a uniform distribution. Furthermore, we compare the runtime of the symmetry-based build-up (SBBU) algorithm (the most efficient method in the literature to solve the DMDGP) combined with FBS and with the depth-first search (DFS) in finding a solution, ascertaining that FBS performs better in about 70% of the cases.
Affiliation(s)
- Rômulo S. Marques
  - Instituto de Matemática, Estatística e Computação Científica, Universidade Estadual de Campinas, Campinas 13083-859, Brazil
- Michael Souza
  - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza 60020-181, Brazil
- Fernando Batista
  - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza 60020-181, Brazil
- Miguel Gonçalves
  - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza 60020-181, Brazil
- Carlile Lavor
  - Instituto de Matemática, Estatística e Computação Científica, Universidade Estadual de Campinas, Campinas 13083-859, Brazil
2. da Rocha W, Liberti L, Mucherino A, Malliavin TE. Influence of Stereochemistry in a Local Approach for Calculating Protein Conformations. J Chem Inf Model 2024; 64:8999-9008. [PMID: 39560315 DOI: 10.1021/acs.jcim.4c01232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2024]
Abstract
Protein structure prediction is generally based on the use of local conformational information coupled with long-range distance restraints. Such restraints can be derived from the knowledge of a template structure or the analysis of protein sequence alignment in the framework of models arising from the physics of disordered systems. The accuracy of approaches based on sequence alignment, however, is limited when the number of aligned sequences is small. Here, we derive protein conformations using only knowledge of local conformations by means of the interval Branch-and-Prune algorithm. The computational efficiency is directly related to the knowledge of stereochemistry (bond angle and ω values) along the protein sequence and, in particular, to the variations of the torsion angle ω. The impact of stereochemistry variations is particularly strong in the case of protein topologies defined from numerous long-range restraints, as in the case of proteins with β secondary structures. The systematic enumeration of the conformations improves the efficiency of the calculations. The analysis of DNA codons makes it possible to connect the variations of the torsion angle ω to the positions of rare DNA codons.
Affiliation(s)
- Wagner da Rocha
  - LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
- Leo Liberti
  - LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
- Thérèse E Malliavin
  - LPCT, UMR 7019 Université de Lorraine CNRS, Vandoeuvre-lès-Nancy 54500, France
3. Vu MH, Robert PA, Akbar R, Swiatczak B, Sandve GK, Haug DTT, Greiff V. Linguistics-based formalization of the antibody language as a basis for antibody language models. Nature Computational Science 2024; 4:412-422. [PMID: 38877120 DOI: 10.1038/s43588-024-00642-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 05/13/2024] [Indexed: 06/16/2024]
Abstract
Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining.
Affiliation(s)
- Mai Ha Vu
  - Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway
- Philippe A Robert
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Rahmad Akbar
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Bartlomiej Swiatczak
  - Department of History of Science and Scientific Archeology, University of Science and Technology of China, Hefei, China
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
4. Souza M, Maia N, Marques RS, Lavor C. A Branch-and-Bound Algorithm for the Molecular Ordered Covering Problem. J Comput Biol 2024; 31:475-485. [PMID: 38775777 DOI: 10.1089/cmb.2024.0522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024] Open
Abstract
The Discretizable Molecular Distance Geometry Problem (DMDGP) plays a key role in the construction of three-dimensional molecular structures from interatomic distances acquired through nuclear magnetic resonance (NMR) spectroscopy, with the primary objective of validating a sequence of distance constraints related to NMR data. This article addresses the escalating complexity of the DMDGP encountered with larger and more flexible molecules by introducing a novel strategy via the Molecular Ordered Covering Problem, which optimizes the ordering of distance constraints to improve computational efficiency in DMDGP resolution. This approach utilizes a specialized Branch-and-Bound (BB) algorithm, tested on both synthetic and actual protein structures from the protein data bank. Our analysis demonstrates the efficacy of the previously proposed greedy heuristic in managing complex molecular scenarios, highlighting the BB algorithm's utility as a validation mechanism. This research contributes to ongoing efforts in molecular structure analysis, with possible implications for areas such as protein folding, drug design, and molecular modeling.
Affiliation(s)
- Michael Souza
  - Departamento de Estatística e Matemática Aplicada, Universidade Federal do Ceará, Fortaleza, Brazil
- Nilton Maia
  - Departamento de Estatística e Matemática Aplicada, Universidade Federal do Ceará, Fortaleza, Brazil
- Rômulo S Marques
  - Departamento de Matemática Aplicada, Universidade Estadual de Campinas (IMECC-UNICAMP), Campinas, Brazil
- Carlile Lavor
  - Departamento de Matemática Aplicada, Universidade Estadual de Campinas (IMECC-UNICAMP), Campinas, Brazil
5. Das NR, Chaudhury KN, Pal D. Improved NMR-data-compliant protein structure modeling captures context-dependent variations and expands the scope of functional inference. Proteins 2023; 91:412-435. [PMID: 36287124 DOI: 10.1002/prot.26439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 09/12/2022] [Accepted: 10/20/2022] [Indexed: 11/13/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy can reveal conformational states of a protein in physiological conditions. However, sparsely available NMR data for a protein with large degrees of freedom can introduce structural artifacts in the built models. Currently used state-of-the-art methods deriving protein structure and conformation from NMR deploy molecular dynamics (MD) coupled with simulated annealing for building models. We provide an alternative graph-based modeling approach, where we first build substructures from NMR-derived distance-geometry constraints combined in one shot to form the core structure. The remaining molecule with inadequate data is modeled using a hybrid approach respecting the observed distance-geometry constraints. One-shot structure building is rarely undertaken for large and sparse data systems, but our data-driven bottom-up approach makes this uniquely feasible by suitable partitioning of the problem. A detailed comparison of select models with state-of-the-art methods reveals differences in the secondary structure regions wherein the correctness of our models is confirmed by NMR data. Benchmarking of 106 protein folds covering structures of length 38-282 shows minimal experimental-constraint violations while conforming to other structure quality parameters such as proper folding, steric clash, and torsion angle violation based on Ramachandran plot criteria. Comparative MD studies using select protein models from a state-of-the-art method and ours under identical experimental parameters reveal distinct conformational dynamics that could be attributed to protein structure-function. Our work is thus useful in building enhanced NMR-evidence-based models that encapsulate the contextual secondary and tertiary structure variations present during the experimentation and expand the scope of functional inference.
Affiliation(s)
- Niladri R Das
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, India
  - Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Kunal N Chaudhury
  - Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Debnath Pal
  - Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
6. Förster D, Idier J, Liberti L, Mucherino A, Lin JH, Malliavin TE. Low-resolution description of the conformational space for intrinsically disordered proteins. Sci Rep 2022; 12:19057. [PMID: 36352011 PMCID: PMC9646904 DOI: 10.1038/s41598-022-21648-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 09/29/2022] [Indexed: 11/11/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) are at the center of numerous biological processes and consequently attract extreme interest in structural biology. Numerous approaches have been developed for generating sets of IDP conformations verifying a given set of experimental measurements. We propose here to perform a systematic enumeration of protein conformations, carried out using the TAiBP approach based on distance geometry. This enumeration was performed on two proteins, Sic1 and pSic1, corresponding to unphosphorylated and phosphorylated states of an IDP. The relative populations of the obtained conformations were then obtained by fitting SAXS curves as well as Ramachandran probability maps, the original finite mixture approach RamaMix being developed for this second task. The similarity between profiles of local gyration radii provides, to a certain extent, a converged view of the Sic1 and pSic1 conformational space. Profiles and populations are thus proposed for describing IDP conformations. Different variations of the resulting gyration radius between phosphorylated and unphosphorylated states are observed, depending on the set of enumerated conformations as well as on the methods used for obtaining the populations.
Affiliation(s)
- Daniel Förster
  - UMR7374 Interfaces, Confinement, Matériaux et Nanostructures, Université d’Orléans, Orléans, France
- Jérôme Idier
  - UMR6004 Laboratoire des Sciences du Numérique de Nantes, Nantes, France
- Leo Liberti
  - LIX UMR 7161 CNRS École Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France
- Antonio Mucherino
  - IRISA, University of Rennes 1, Rennes, France
- Jung-Hsin Lin
  - Biomedical Translation Research Center, Academia Sinica, Taipei, Taiwan
- Thérèse E. Malliavin
  - Institut Pasteur, Université Paris Cité, CNRS UMR3528, Unité de Bioinformatique Structurale, F-75015 Paris, France
  - Université de Lorraine, CNRS UMR7019, LPCT, F-54000 Nancy, France
7. Labiak R, Lavor C, Souza M. Distance geometry and protein loop modeling. J Comput Chem 2021; 43:349-358. [PMID: 34904248 DOI: 10.1002/jcc.26796] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2021] [Revised: 10/22/2021] [Accepted: 11/28/2021] [Indexed: 11/11/2022]
Abstract
Due to the role of loops in protein function, loop modeling is an important problem in computational biology. We present a new approach to loop modeling based on a combinatorial version of distance geometry, where the search space of the associated problem is represented by a binary tree and a branch-and-prune method is defined to explore it, following a previously given atomic ordering. This ordering is used to calculate the coordinates of each atom from the positions of its predecessors. In addition to the theoretical development, computational results are presented to illustrate the advantage of the proposed method compared with another approach from the literature. Our algorithm is freely available at https://github.com/michaelsouza/bpl.
Affiliation(s)
- Rodrigo Labiak
  - Department of Mathematics, University of Campinas, Campinas, Brazil
- Carlile Lavor
  - Department of Applied Mathematics, University of Campinas, Campinas, Brazil
- Michael Souza
  - Department of Applied Mathematics, Federal University of Ceara, Fortaleza, Brazil
8. Sanejouand YH. Normal-mode driven exploration of protein domain motions. J Comput Chem 2021; 42:2250-2257. [PMID: 34599620 DOI: 10.1002/jcc.26755] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/29/2021] [Revised: 07/02/2021] [Accepted: 09/05/2021] [Indexed: 12/27/2022]
Abstract
Domain motions involved in the function of proteins can often be well described as a combination of motions along a handful of low-frequency modes, that is, with the values of a few normal coordinates. This means that, when the functional motion of a protein is unknown, it should prove possible to predict it, since doing so amounts to guessing a few values. However, without the help of additional experimental data, using normal coordinates for generating accurate conformers far away from the initial one is not so straightforward. To do so, a new approach is proposed: instead of building conformers directly with the values of a subset of normal coordinates, they are built in two steps, the conformer built with normal coordinates being used only for defining a set of distance constraints, the final conformer being built so as to match them. Note that this approach amounts to transforming the problem of generating accurate protein conformers using normal coordinates into a better-known one: the distance-geometry problem, which is herein solved with the help of the ROSETTA software. In the present study, this approach made it possible to accurately rebuild six large-amplitude conformational changes, using at most six low-frequency normal coordinates. As a consequence of the low dimensionality of the corresponding subspace, random exploration also proved sufficient for generating low-energy conformers close to the known end-point of the conformational change of the LAO binding protein, lysozyme T4 and adenylate kinase.
9. Robert PA, Arulraj T, Meyer-Hermann M. Ymir: A 3D structural affinity model for multi-epitope vaccine simulations. iScience 2021; 24:102979. [PMID: 34485861 PMCID: PMC8405928 DOI: 10.1016/j.isci.2021.102979] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/12/2020] [Revised: 07/10/2021] [Accepted: 08/11/2021] [Indexed: 11/05/2022] Open
Abstract
Vaccine development is challenged by the hierarchy of immunodominance between target antigen epitopes and the emergence of antigenic variants by pathogen mutation. The strength and breadth of antibody responses rely on selection and mutation in the germinal center and on the structural similarity between antigens. Computational methods for assessing the breadth of germinal center responses to multivalent antigens are critical to speed up vaccine development. Yet, such methods have poorly reflected the 3D antigen structure and antibody breadth. Here, we present Ymir, a new 3D-lattice-based framework that calculates in silico antibody-antigen affinities. Key physiological properties naturally emerge from Ymir, such as affinity jumps, cross-reactivity, and differential epitope accessibility. We validated Ymir by replicating known features of germinal center dynamics. We show that combining antigens with mutated but structurally related epitopes enhances vaccine breadth. Ymir opens a new avenue for understanding vaccine potency based on the structural relationship between vaccine antigens.
Affiliation(s)
- Philippe A. Robert
  - Department of Systems Immunology and Braunschweig Integrated Centre of Systems Biology, Helmholtz Centre for Infection Research, 38106 Braunschweig, Germany
- Theinmozhi Arulraj
  - Department of Systems Immunology and Braunschweig Integrated Centre of Systems Biology, Helmholtz Centre for Infection Research, 38106 Braunschweig, Germany
- Michael Meyer-Hermann
  - Department of Systems Immunology and Braunschweig Integrated Centre of Systems Biology, Helmholtz Centre for Infection Research, 38106 Braunschweig, Germany
  - Institute for Biochemistry, Biotechnology and Bioinformatics, Technische Universität Braunschweig, Braunschweig, Germany
  - Centre for Individualised Infection Medicine (CIIM), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, 30625 Hannover, Germany
10. Malliavin TE. Tandem domain structure determination based on a systematic enumeration of conformations. Sci Rep 2021; 11:16925. [PMID: 34413388 PMCID: PMC8376923 DOI: 10.1038/s41598-021-96370-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2021] [Accepted: 08/04/2021] [Indexed: 12/03/2022] Open
Abstract
Protein structure determination is undergoing a change of perspective due to the growing importance in biology of the disordered regions of biomolecules. In such cases, the convergence criterion is more difficult to set up and the size of the conformational space is an obstacle to exhaustive exploration. A pipeline is proposed here to exhaustively sample protein conformations using backbone angle limits obtained by nuclear magnetic resonance (NMR), and then to determine the populations of conformations. The pipeline is applied to a tandem domain of the protein whirlin. An original approach, derived from a reformulation of the Distance Geometry Problem, is used to enumerate the conformations of the linker connecting the two domains. Specifically designed procedures then make it possible to assemble the domains with the linker conformations and to optimize the tandem domain conformations with respect to two sets of NMR measurements: residual dipolar couplings and paramagnetic resonance enhancements. The relative populations of optimized conformations are finally determined by fitting small angle X-ray scattering (SAXS) data. The most populated conformation of the tandem domain is a semi-closed one, fully closed and more extended conformations being in the minority, in agreement with previous observations. The SAXS and NMR data show different influences on the determination of populations.
Affiliation(s)
- Thérèse E Malliavin
  - Unité de Bioinformatique Structurale, Institut Pasteur, UMR 3528, CNRS, Paris, France
  - Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, USR 3756, CNRS, Paris, France
11. Costa FLP, de Albuquerque ACF, Fiorot RG, Lião LM, Martorano LH, Mota GVS, Valverde AL, Carneiro JWM, dos Santos Junior FM. Structural characterisation of natural products by means of quantum chemical calculations of NMR parameters: new insights. Org Chem Front 2021. [DOI: 10.1039/d1qo00034a] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/31/2022]
Abstract
In this review, we focus on all aspects of NMR simulation of natural products, from the fundamentals to the new computational toolboxes available, combining advanced quantum chemical calculations with upstream data processing and machine learning.
Affiliation(s)
- Ana C. F. de Albuquerque
  - Departamento de Química Orgânica, Instituto de Química, Universidade Federal Fluminense, Niterói-RJ, Brazil
- Rodolfo G. Fiorot
  - Departamento de Química Orgânica, Instituto de Química, Universidade Federal Fluminense, Niterói-RJ, Brazil
- Luciano M. Lião
  - Instituto de Química, Universidade Federal de Goiás, 74690-900 Goiânia-GO, Brazil
- Lucas H. Martorano
  - Departamento de Química Orgânica, Instituto de Química, Universidade Federal Fluminense, Niterói-RJ, Brazil
- Gunar V. S. Mota
  - Faculdade de Ciências Naturais/Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém-PA, Brazil
- Alessandra L. Valverde
  - Departamento de Química Orgânica, Instituto de Química, Universidade Federal Fluminense, Niterói-RJ, Brazil
- José W. M. Carneiro
  - Departamento de Química Inorgânica, Instituto de Química, Universidade Federal Fluminense, Niterói-RJ, Brazil