1
Islam S, Pantazes RJ. Developing similarity matrices for antibody-protein binding interactions. PLoS One 2023; 18:e0293606. [PMID: 37883504 PMCID: PMC10602319 DOI: 10.1371/journal.pone.0293606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023]
Abstract
The inventions of AlphaFold and RoseTTAFold are revolutionizing computational protein science due to their abilities to reliably predict protein structures. Their unprecedented successes are due to the parallel consideration of several types of information, one of which is protein sequence similarity information. Sequence homology has been studied for many decades and depends on similarity matrices to define how similar or different protein sequences are to one another. A natural extension of predicting protein structures is predicting the interactions between proteins, but similarity matrices for protein-protein interactions do not exist. This study conducted a mutational analysis of 384 non-redundant antibody-protein antigen complexes to calculate antibody-protein interaction similarity matrices. Every important residue in each antibody and each antigen was mutated to each of the other 19 commonly occurring amino acids and the percentage changes in interaction energies were calculated using three force fields: CHARMM, Amber, and Rosetta. The data were used to construct six interaction similarity matrices, one for antibodies and another for antigens using each force field. The matrices exhibited both commonalities, such as mutations of aromatic and charged residues being the most detrimental, and differences, such as Rosetta predicting mutations of serines to be better tolerated than either Amber or CHARMM. A comparison to nine previously published similarity matrices for protein sequences revealed that the new interaction matrices are more similar to one another than they are to any of the previous matrices. The created similarity matrices can be used in force field specific applications to help guide decisions regarding mutations in protein-protein binding interfaces.
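The bookkeeping described in this abstract can be sketched roughly as follows: each mutation contributes a percentage change in interaction energy, and the changes are averaged per (wild-type, mutant) residue pair. The formula `100 * (E_mut - E_wt) / |E_wt|`, the plain averaging, the function names, and the toy energies are all assumptions made for illustration; the abstract does not state the normalization actually used.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def percent_change(e_wt, e_mut):
    """Percentage change in interaction energy (assumed sign convention)."""
    return 100.0 * (e_mut - e_wt) / abs(e_wt)

def build_matrix(records):
    """Average percentage changes per (wild-type, mutant) pair.

    records: iterable of (wt_residue, mut_residue, e_wt, e_mut) tuples.
    Pairs with no observations stay None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wt, mut, e_wt, e_mut in records:
        sums[(wt, mut)] += percent_change(e_wt, e_mut)
        counts[(wt, mut)] += 1
    matrix = {wt: {mut: None for mut in AMINO_ACIDS} for wt in AMINO_ACIDS}
    for (wt, mut), total in sums.items():
        matrix[wt][mut] = total / counts[(wt, mut)]
    return matrix

# Hypothetical energies (arbitrary units); more negative means stronger binding.
records = [
    ("W", "A", -10.0, -4.0),
    ("W", "A", -8.0, -4.0),
    ("S", "A", -5.0, -5.5),
]
matrix = build_matrix(records)
print(matrix["W"]["A"], matrix["S"]["A"])
```

In the study, one such table would be built per force field and per binding partner (antibody or antigen), giving the six matrices mentioned above.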
Affiliation(s)
- Sumaiya Islam
- Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States of America
- Robert J. Pantazes
- Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States of America
2
Kilinc M, Jia K, Jernigan RL. JSONWP: a static website generator for protein bioinformatics research. BIOINFORMATICS ADVANCES 2023; 3:vbad154. [PMID: 37904893 PMCID: PMC10613403 DOI: 10.1093/bioadv/vbad154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 10/24/2023] [Indexed: 11/01/2023]
Abstract
Motivation: Presenting the integrated results of bioinformatics research can be challenging and requires sophisticated visualization components, which can be time-consuming to develop. This article presents a new way to effectively communicate research findings. Results: We have developed a static web page generator, JSONWP, which is specifically designed for protein bioinformatics research. Utilizing React (a JavaScript library used to build interactive and dynamic user interfaces for web applications), we have integrated publicly available bioinformatics visualization components to provide standardized access to these components. JSON (JavaScript Object Notation, a lightweight textual data format often used to structure and exchange information between different software tools) is used as the input source due to its ability to represent nearly all types of data using key and value pairs. This allows researchers to use their preferred programming language to create a JSON representation, which can then be converted into a website by JSONWP. No server or domain is required to host the website, as only the publicly accessible JSON file is required. Conclusions: Overall, JSONWP provides a useful new tool for bioinformatics researchers to effectively communicate their findings. The open-source implementation is located at https://github.com/MesihK/react-json-wpbuilder, and the tool can be used at jsonwp.onrender.com.
Affiliation(s)
- Mesih Kilinc
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, United States
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, United States
- Robert L Jernigan
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, United States
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, United States
3
Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. FRONTIERS IN BIOINFORMATICS 2023; 3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023]
Abstract
Understanding protein sequences and how they relate to the functions of proteins is extremely important. One of the most basic operations in bioinformatics is sequence alignment, and usually the first things learned from an alignment are which positions are the most conserved; these are often critical parts of the structure, such as enzyme active site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these usually change in a systematic and coordinated way: if one position changes, then the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method from Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can lead to improvements in alignments for function, even for a disordered protein.
Affiliation(s)
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Mesih Kilinc
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Robert L. Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
4
Rademaker DT, van Geemen KJ, Xue LC. GradPose: a very fast and memory-efficient gradient descent-based tool for superimposing millions of protein structures from computational simulations. Bioinformatics 2023; 39:btad444. [PMID: 37471594 PMCID: PMC10397417 DOI: 10.1093/bioinformatics/btad444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 07/04/2023] [Accepted: 07/18/2023] [Indexed: 07/22/2023]
Abstract
Summary: Computational simulations like molecular dynamics and docking are providing crucial insights into the dynamics and interaction conformations of proteins, complementing experimental methods for determining protein structures. These methods often generate millions of protein conformations, necessitating highly efficient structure comparison and clustering methods to analyze the results. In this article, we introduce GradPose, a fast and memory-efficient structural superimposition tool for models generated by these large-scale simulations. GradPose uses gradient descent to optimally superimpose structures by optimizing rotation quaternions and can handle insertions and deletions compared to the reference structure. It is capable of superimposing thousands to millions of protein structures on standard hardware and utilizes multiple CPU cores and, if available, CUDA acceleration to further decrease superimposition time. Our results indicate that GradPose generally outperforms traditional methods, with a speed improvement of 2-65 times and memory requirement reduction of 1.7-48 times, with larger protein structures benefiting the most. We observed that traditional methods outperformed GradPose only with very small proteins consisting of ∼20 residues. The prerequisite of GradPose is that residue-residue correspondence is predetermined. With GradPose, we aim to provide a computationally efficient solution to the demand for structural alignment in the computational simulation field. Availability and implementation: Source code is freely available at https://github.com/X-lab-3D/GradPose; doi:10.5281/zenodo.7671922.
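To make the idea of superimposing structures by gradient descent on a rotation quaternion concrete, here is a minimal sketch in plain Python. It is not GradPose's implementation: it uses finite-difference gradients instead of automatic differentiation, runs on a single toy point set, and does not handle insertions or deletions. The function names, step size, iteration count, and coordinates are assumptions chosen for illustration.

```python
import math

def quat_to_matrix(q):
    """Rotation matrix for a quaternion (w, x, y, z); q is normalized first."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def rotate(points, q):
    m = quat_to_matrix(q)
    return [tuple(sum(m[r][c] * p[c] for c in range(3)) for r in range(3))
            for p in points]

def center(points):
    """Shift points so their centroid is at the origin."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - mean[k] for k in range(3)) for p in points]

def msd(a, b):
    """Mean squared deviation between paired points."""
    return sum(sum((pa[k] - pb[k]) ** 2 for k in range(3))
               for pa, pb in zip(a, b)) / len(a)

def superimpose(mobile, reference, steps=400, lr=0.02, eps=1e-6):
    """Fit a rotation quaternion by gradient descent; returns (q, rmsd)."""
    mob, ref = center(mobile), center(reference)
    q = [1.0, 0.0, 0.0, 0.0]  # start from the identity rotation
    for _ in range(steps):
        grad = []
        for i in range(4):
            plus, minus = list(q), list(q)
            plus[i] += eps
            minus[i] -= eps
            # central finite difference of the loss along component i
            diff = msd(rotate(mob, plus), ref) - msd(rotate(mob, minus), ref)
            grad.append(diff / (2 * eps))
        q = [qi - lr * gi for qi, gi in zip(q, grad)]
        norm = math.sqrt(sum(c * c for c in q))
        q = [c / norm for c in q]  # keep q a unit quaternion
    return q, math.sqrt(msd(rotate(mob, q), ref))

# Toy data: five reference points and a rotated, shifted copy of them.
reference = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0),
             (1.0, 1.0, 1.0), (-1.0, -2.0, 0.5)]
half = math.radians(40.0) / 2
true_q = [math.cos(half), math.sin(half) / math.sqrt(2),
          math.sin(half) / math.sqrt(2), 0.0]
mobile = [(x + 5.0, y - 3.0, z + 1.0) for x, y, z in rotate(reference, true_q)]

q, rmsd = superimpose(mobile, reference)
print("fitted quaternion:", [round(c, 4) for c in q])
print("RMSD after fit:", rmsd)
```

Centering both point sets removes the translation, so only the rotation is optimized. As in the abstract, the residue-residue (here point-to-point) correspondence is assumed to be known in advance.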
Affiliation(s)
- Daniel T Rademaker
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Kevin J van Geemen
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Li C Xue
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
5
Caswell B, Summers TJ, Licup GL, Cantu DC. Mutation Space of Spatially Conserved Amino Acid Sites in Proteins. ACS OMEGA 2023; 8:24302-24310. [PMID: 37457482 PMCID: PMC10339398 DOI: 10.1021/acsomega.3c01473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 06/14/2023] [Indexed: 07/18/2023]
Abstract
The mutation space of spatially conserved (MSSC) amino acid residues is a protein structural quantity developed and described in this work. The MSSC quantifies how many mutations and which different mutations, i.e., the mutation space, occur in each amino acid site in a protein. The MSSC calculates the mutation space of amino acids in a target protein from the spatially conserved residues in a group of multiple protein structures. Spatially conserved amino acid residues are identified based on their relative positions in the protein structure. The MSSC examines each residue in a target protein, compares it to the residues present in the same relative position in other protein structures, and uses physicochemical criteria of mutations found in each conserved spatial site to quantify the mutation space of each amino acid in the target protein. The MSSC is analogous to scoring each site in a multiple sequence alignment but in three-dimensional space considering the spatial location of residues instead of solely the order in which they appear in a protein sequence. MSSC analysis was performed on example cases, and it reproduces the well-known observation that, regardless of secondary structure, solvent-exposed residues are more likely to be mutated than internal ones. The MSSC code is available on GitHub: "https://github.com/Cantu-Research-Group/Mutation_Space".
6
Kilinc M, Jia K, Jernigan RL. Improved global protein homolog detection with major gains in function identification. Proc Natl Acad Sci U S A 2023; 120:e2211823120. [PMID: 36827259 PMCID: PMC9992864 DOI: 10.1073/pnas.2211823120] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 07/09/2022] [Accepted: 01/20/2023] [Indexed: 02/25/2023]
Abstract
There are several hundred million protein sequences, but the relationships among them are not fully available from existing homolog detection methods. There is an essential need for an improved method to push homolog detection to lower levels of sequence identity. The method used here relies on a language model to represent proteins numerically in a matrix (an embedding) and uses discrete cosine transforms to compress the data to extract the most essential part, significantly reducing the data size. This PRotein Ortholog Search Tool (PROST) is significantly faster with linear runtimes, and most importantly, computes the distances between pairs of protein sequences to yield homologs at significantly lower levels of sequence identity than previously. The extent of allosteric effects in proteins points out the importance of global aspects of structure and sequence. PROST excels at global homology detection but not at detecting local homologs. Results are validated by strong similarities between the corresponding pairs of structures. The number of remote homologs detected increased significantly and pushes the effective sequence matches more deeply into the twilight zone. Human protein sequences presently having no assigned function now find significant numbers of putative homologs for 93% of cases and structurally verified assigned functions for 76.4% of these cases. The data compression enables massive searches for homologs with short search times while yielding significant gains in the numbers of remote homologs detected. The method is sufficiently efficient to permit whole-genome/proteome comparisons. The PROST web server is accessible at https://mesihk.github.io/prost.
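The compression step described in this abstract can be illustrated with a small sketch: a per-residue embedding matrix is transformed along the sequence axis with a discrete cosine transform, and only the first few coefficients are kept, so that proteins of different lengths map to vectors of the same size. Everything specific here is an assumption for illustration: the toy two-dimensional "embedding" stands in for a real language model, and the number of retained coefficients, the length scaling, and the L1 distance are choices made for this sketch, not taken from PROST.

```python
import math

def dct2(signal):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal))
        for k in range(n)
    ]

def toy_embedding(sequence):
    """Hypothetical stand-in for a language-model embedding (L x 2)."""
    return [[(ord(c) % 7) / 7.0, (ord(c) % 11) / 11.0] for c in sequence]

def compress(embedding, keep=3):
    """Keep the first `keep` DCT coefficients of each embedding dimension."""
    length = len(embedding)
    dims = len(embedding[0])
    features = []
    for d in range(dims):
        column = [embedding[i][d] for i in range(length)]
        coeffs = dct2(column)[:keep]
        # divide by length so sequences of different lengths are comparable
        features.extend(c / length for c in coeffs)
    return features

def distance(a, b):
    """L1 distance between two compressed representations."""
    return sum(abs(x - y) for x, y in zip(a, b))

seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
fp_a = compress(toy_embedding(seq_a))
fp_b = compress(toy_embedding(seq_b))
print(len(fp_a), len(fp_b))
print(distance(fp_a, fp_b))
```

Because low-order coefficients summarize the whole sequence, a representation like this captures global rather than local similarity, which matches the abstract's remark that the method excels at global homology detection.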
Affiliation(s)
- Mesih Kilinc
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011
- Robert L. Jernigan
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011
7
Pearce R, Li Y, Omenn GS, Zhang Y. Fast and accurate Ab Initio Protein structure prediction using deep learning potentials. PLoS Comput Biol 2022; 18:e1010539. [PMID: 36112717 PMCID: PMC9518900 DOI: 10.1371/journal.pcbi.1010539] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/29/2021] [Revised: 09/28/2022] [Accepted: 09/03/2022] [Indexed: 01/05/2023]
Abstract
Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.
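The TM-score used as the accuracy measure above is not defined in the abstract. The sketch below uses the standard definition, TM = (1/L) Σ 1 / (1 + (d_i / d0)²) with d0 = 1.24 (L − 15)^(1/3) − 1.8, where L is the target length and d_i are distances between aligned residue pairs after superposition. The function names and the length cutoff are assumptions of this sketch, and the search over superpositions that the full score requires is omitted.

```python
def d0(target_length):
    """Length-dependent distance scale from the standard TM-score definition."""
    if target_length <= 21:
        # Assumption of this sketch: very short chains are not handled.
        raise ValueError("sketch only covers targets longer than 21 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs;
    residues of the target with no aligned partner contribute nothing.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length

# A perfect model of a 100-residue target scores 1.0; a model covering
# only half of the target perfectly scores 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```

Since the score is normalized by the target length and bounded by 1, the improvements of 40.3% and 44.9% quoted above are relative gains in the average TM-score, not absolute differences.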
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Departments of Internal Medicine and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
8
Lubna S, Chinta S, Burra P, Vedantham K, Ray S, Bandyopadhyay D. New substitutions on NS1 protein from influenza A (H1N1) virus: Bioinformatics analyses of Indian strains isolated from 2009 to 2020. Health Sci Rep 2022; 5:e626. [PMID: 35509388 PMCID: PMC9059196 DOI: 10.1002/hsr2.626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2021] [Revised: 03/21/2022] [Accepted: 03/24/2022] [Indexed: 11/06/2022]
Affiliation(s)
- Syeda Lubna
- Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Suma Chinta
- Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Prakruthi Burra
- Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Kiranmayi Vedantham
- Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Debashree Bandyopadhyay
- Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
9
Accurate prediction of immunoglobulin proteins using machine learning model. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100885] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
10
Identification of the Genome Segments of Bluetongue Virus Type 26/Type 1 Reassortants Influencing Horizontal Transmission in a Mouse Model. Viruses 2021; 13:v13112208. [PMID: 34835014 PMCID: PMC8620829 DOI: 10.3390/v13112208] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/27/2021] [Revised: 10/26/2021] [Accepted: 10/28/2021] [Indexed: 01/20/2023]
Abstract
Bluetongue virus serotypes 1 to 24 are transmitted primarily by infected Culicoides midges, in which they also replicate. However, “atypical” BTV serotypes (BTV-25, -26, -27 and -28) have recently been identified that do not infect and replicate in adult Culicoides, or a Culicoides derived cell line (KC cells). These atypical viruses are transmitted horizontally by direct contact between infected and susceptible hosts (primarily small ruminants) causing only mild clinical signs, although the exact transmission mechanisms involved have yet to be determined. We used reverse genetics to generate a strain of BTV-1 (BTV-1 RGC7) which is less virulent, infecting IFNAR(−/−) mice without killing them. Reassortant viruses were also engineered, using the BTV-1 RGC7 genetic backbone, containing individual genome segments derived from BTV-26. These reassortant viruses were used to explore the genetic control of horizontal transmission (HT) in the IFNAR(−/−) mouse model. Previous studies showed that genome segments 1, 2 and 3 restrict infection of Culicoides cells, along with a minor role for segment 7. The current study demonstrates that genome segments 2, 5 and 10 of BTV-26 (coding for proteins VP2, NS1 and NS3/NS3a/NS5, respectively) are individually sufficient to promote HT.