1. Gelman S, Johnson B, Freschlin C, Sharma A, D'Costa S, Peters J, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv 2025:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
2. Hasan M, Ahmed S, Imranuzzaman M, Bari R, Roy S, Hasan MM, Mia MM. Designing and development of efficient multi-epitope-based peptide vaccine candidate against emerging avian rotavirus strains: A vaccinomic approach. J Genet Eng Biotechnol 2024; 22:100398. [PMID: 39179326 PMCID: PMC11260576 DOI: 10.1016/j.jgeb.2024.100398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 05/17/2024] [Accepted: 06/19/2024] [Indexed: 08/26/2024]
Abstract
BACKGROUND Enteric avian rotavirus (ARV) is the etiological agent of several health problems that pose a global threat to commercial chickens. Vaccination and strict biosecurity are therefore required to avoid widespread epidemics and high mortality rates. METHOD The present study employs computational techniques to design a unique multi-epitope-based vaccine candidate that successfully activates immune cells against ARV by combining an adjuvant, linkers, and B- and T-cell epitopes. First, homologous sequences in the various ARV serotypes were identified with NCBI BLAST, and the two surface proteins (VP4 and VP7) of ARV were retrieved from the UniProtKB database. The Clustal Omega server was then used to identify the conserved regions among the homologous sequences, and the B- and T-cell epitopes were predicted using IEDB servers. Then, the superior epitopes (2 MHC-1 epitopes, 2 MHC-2 epitopes, and 3 B-cell epitopes) were combined with various adjuvants to create a total of four unique vaccine candidates. Afterward, the designed vaccine candidates underwent computational validation to assess their antigenicity, allergenicity, and stability. The vaccine candidate (V2) that demonstrated non-antigenicity, a high VaxiJen score, and non-allergenicity was ultimately chosen for molecular docking and dynamic simulation. RESULTS Although both the V2 and V4 vaccine candidates were highly immunogenic, V2 had a higher solubility rate. The predicted values of the aliphatic index and GRAVY were 30.4 and 0.417, respectively. In terms of binding energy, V2 outperformed V4. Having docked successfully with TLRs, V2 was selected as the best candidate. After adaptation, the sequence's 50.73% GC content outside of the BglII and ApaI restriction sites indicated that it was safe to clone. The chosen sequence was then inserted into the pET28a(+) vector between the BglII and ApaI restriction sites. This resulted in a final clone that was 4914 base pairs long, with the inserted sequence accounting for 478 bp and the vector accounting for the remainder. CONCLUSIONS The immune simulation results for the selected vaccine construct showed a significant response; thus, the study confirmed that the selected V2 vaccine candidate could enhance the immune response against ARV.
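The cloning checks mentioned in this abstract (GC content, restriction sites, clone length) are simple to reproduce. The following is a minimal sketch, not the authors' pipeline: the helper names and the toy sequence are hypothetical, and only the recognition sequences of BglII (AGATCT) and ApaI (GGGCCC) and the lengths 4914 bp and 478 bp come from established enzyme data and the abstract, respectively.

```python
# Recognition sequences of the two restriction enzymes named in the abstract.
BGLII_SITE = "AGATCT"
APAI_SITE = "GGGCCC"


def gc_content(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def internal_sites(seq, sites):
    """Return the names of the restriction sites that occur inside seq."""
    seq = seq.upper()
    return [name for name, site in sites.items() if site in seq]


# Hypothetical toy insert, used only to exercise the helpers.
toy_insert = "ATGGCGCTTAAGGCC"
print(gc_content(toy_insert))
print(internal_sites(toy_insert, {"BglII": BGLII_SITE, "ApaI": APAI_SITE}))

# Arithmetic on the figures reported in the abstract:
# total clone length minus insert length gives the vector-derived remainder.
vector_bp = 4914 - 478
print(vector_bp)
```

An insert that contains neither recognition sequence can be cut out and ligated with those two enzymes without being cleaved internally, which is why such a check is commonly run before cloning.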
Affiliation(s)
- Mahamudul Hasan
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
- Shakil Ahmed
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
- Md Imranuzzaman
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh; Department of Pharmacology and Toxicology, Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
- Rezaul Bari
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
- Shiplu Roy
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh; Department of Livestock Production and Management, Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
- Md Mahadi Hasan
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
- Md Mukthar Mia
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh; Department of Poultry Science, Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet-3100, Bangladesh
3. Patkar SS, Wang B, Mosquera AM, Kiick KL. Genetically Fusing Order-Promoting and Thermoresponsive Building Blocks to Design Hybrid Biomaterials. Chemistry 2024; 30:e202400582. [PMID: 38501912 PMCID: PMC11661552 DOI: 10.1002/chem.202400582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 03/18/2024] [Accepted: 03/19/2024] [Indexed: 03/20/2024]
Abstract
The unique biophysical and biochemical properties of intrinsically disordered proteins (IDPs) and their recombinant derivatives, intrinsically disordered protein polymers (IDPPs), offer opportunities for producing multistimuli-responsive materials; their sequence-encoded disorder and tendency for phase separation facilitate the development of multifunctional materials. This review highlights strategies for enhancing the structural diversity of elastin-like polypeptides (ELPs) and resilin-like polypeptides (RLPs), and of their self-assembled structures, via genetic fusion to ordered motifs such as helical or beta-sheet domains. In particular, this review describes approaches that harness the synergistic interplay between order-promoting and thermoresponsive building blocks to design hybrid biomaterials, resulting in well-structured, stimuli-responsive supramolecular materials ordered on the nanoscale.
Affiliation(s)
- Sai S Patkar
  - Department of Materials Science and Engineering, University of Delaware, Newark, Delaware, 19716, United States
  - Eli Lilly and Company, 450 Kendall Street, Cambridge, MA, 02142, United States
- Bin Wang
  - Department of Materials Science and Engineering, University of Delaware, Newark, Delaware, 19716, United States
- Ana Maria Mosquera
  - Department of Materials Science and Engineering, University of Delaware, Newark, Delaware, 19716, United States
- Kristi L Kiick
  - Department of Materials Science and Engineering, University of Delaware, Newark, Delaware, 19716, United States
  - Department of Biomedical Engineering, University of Delaware, Newark, Delaware, 19716, United States
4. Rahman MN, Ahmed S, Hasan M, Shuvo MSA, Islam MA, Hasan R, Roy S, Hossain H, Mia MM. Immunoselective progression of a multi-epitope-based subunit vaccine candidate to convey protection against the parasite Onchocerca lupi. Informatics in Medicine Unlocked 2023. [DOI: 10.1016/j.imu.2023.101209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
5. Hasan M, Mia M. Exploratory Algorithm of a Multi-epitope-based Subunit Vaccine Candidate Against Cryptosporidium hominis: Reverse Vaccinology-Based Immunoinformatic Approach. Int J Pept Res Ther 2022; 28:134. [PMID: 35911179 PMCID: PMC9315849 DOI: 10.1007/s10989-022-10438-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 06/26/2022] [Indexed: 12/03/2022]
Abstract
Cryptosporidiosis is the leading protozoan-induced cause of diarrheal illness in children, and it has been linked to childhood mortality, malnutrition, impaired cognitive development, and growth retardation. Cryptosporidium hominis, the anthroponotically transmitted species within the Cryptosporidium genus, contributes significantly to the global burden of infection, accounting for the majority of clinical cases in numerous nations; its emergence in the last decade is largely attributable to detections obtained through noteworthy epidemiologic research. Nevertheless, there is no vaccine available, and the only licensed medication, nitazoxanide, has been demonstrated to have efficacy limitations in a number of patient groups recognized to be at high risk of complications. Therefore, the current study delineates a computational vaccine design for Cryptosporidium hominis, a notable pathogen in enteric diarrhea. First, a comprehensive literature search was conducted to identify six proteins, based on their toxigenicity, allergenicity, antigenicity, and predicted transmembrane helices, to make up a multi-epitope-based subunit vaccine. Following that, antigenic, non-toxic HTL, CTL, and B-cell epitopes were predicted from the selected proteins, and a vaccine candidate was constructed by adding an adjuvant and linkers to the immunologically superior epitopes. Afterwards, the constructed vaccine candidates and the TLR2 receptor were submitted to the ClusPro server for molecular dynamic simulation to assess the binding stability of the vaccine-TLR2 complex. Following that, Escherichia coli strain K12 was used as a cloning host for the chosen vaccine construct via the JCat server. Based on these findings, it was concluded that the proposed chimeric peptide vaccine could improve the immune response to Cryptosporidium hominis.
Affiliation(s)
- Mahamudul Hasan
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet, 3100 Bangladesh
- Mukthar Mia
  - Department of Poultry Science, Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet, 3100 Bangladesh; Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet, 3100 Bangladesh
6. Liu J, Zhao KL, He GX, Wang LJ, Zhou XG, Zhang GJ. A de novo protein structure prediction by iterative partition sampling, topology adjustment and residue-level distance deviation optimization. Bioinformatics 2021; 38:99-107. [PMID: 34459867 DOI: 10.1093/bioinformatics/btab620] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/25/2021] [Revised: 07/23/2021] [Accepted: 08/25/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION With the great progress of deep learning-based inter-residue contact/distance prediction, the discrete space formed by fragment assembly cannot satisfy the distance constraint well. Thus, the optimal solution of the continuous space may not be achieved. Designing an effective closed-loop continuous dihedral angle optimization strategy that complements the discrete fragment assembly is crucial to improve the performance of the distance-assisted fragment assembly method. RESULTS In this article, we proposed a de novo protein structure prediction method called IPTDFold based on closed-loop iterative partition sampling, topology adjustment and residue-level distance deviation optimization. First, local dihedral angle crossover and mutation operators are designed to explore the conformational space extensively and achieve information exchange between the conformations in the population. Then, the dihedral angle rotation model of loop region with partial inter-residue distance constraints is constructed, and the rotation angle satisfying the constraints is obtained by differential evolution algorithm, so as to adjust the spatial position relationship between the secondary structures. Finally, the residue distance deviation is evaluated according to the difference between the conformation and the predicted distance, and the dihedral angle of the residue is optimized with biased probability. The final model is generated by iterating the above three steps. IPTDFold is tested on 462 benchmark proteins, 24 FM targets of CASP13 and 20 FM targets of CASP14. Results show that IPTDFold is significantly superior to the distance-assisted fragment assembly method Rosetta_D (Rosetta with distance). In particular, the prediction accuracy of IPTDFold does not decrease as the length of the protein increases. 
When using the same FastRelax protocol, the prediction accuracy of IPTDFold is significantly superior to that of trRosetta without orientation constraints, and is equivalent to that of the full version of trRosetta. AVAILABILITY AND IMPLEMENTATION The source code and executable are freely available at https://github.com/iobio-zjut/IPTDFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guang-Xing He
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Liu-Jing Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
7. Hou M, Peng C, Zhou X, Zhang B, Zhang G. Multi contact-based folding method for de novo protein structure prediction. Brief Bioinform 2021; 23:6445108. [PMID: 34849573 DOI: 10.1093/bib/bbab463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 11/12/2022]
Abstract
Meta contact, which combines different contact maps into one to improve contact prediction accuracy and reduce the noise from any single contact map, is a widely used method. However, protein structure prediction using meta contact cannot fully exploit the information carried by the original contact maps. In this work, a multi contact-based folding method under the evolutionary algorithm framework, MultiCFold, is proposed. In MultiCFold, the full information of the different contact maps is used directly by populations to guide protein structure folding. In addition, noncontact is considered as an effective supplement to contact information and can further assist protein folding. MultiCFold is tested on a set of 120 nonredundant proteins, and the average TM-score and average RMSD reach 0.617 and 5.815 Å, respectively. Compared with the meta contact-based method, MetaCFold, the average TM-score and average RMSD improve by 6.62% and 8.82%, respectively. In particular, the introduction of noncontact information increases the average TM-score by 6.30%. Furthermore, MultiCFold is compared with four state-of-the-art methods of CASP13 on the 24 FM targets, and results show that MultiCFold is significantly better than other methods after the full-atom relax procedure.
Affiliation(s)
- Minghua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Chunxiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Hangzhou 310023, China
- Biao Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
8. Munro LJ, Kell DB. Intelligent host engineering for metabolic flux optimisation in biotechnology. Biochem J 2021; 478:3685-3721. [PMID: 34673920 PMCID: PMC8589332 DOI: 10.1042/bcj20210535] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/25/2021] [Revised: 09/22/2021] [Accepted: 09/24/2021] [Indexed: 12/13/2022]
Abstract
Optimising the function of a protein of length N amino acids by directed evolution involves navigating a 'search space' of possible sequences of some 20^N. Optimising the expression levels of P proteins that materially affect host performance, each of which might also take 20 (logarithmically spaced) values, implies a similar search space of 20^P. In this combinatorial sense, then, the problems of directed protein evolution and of host engineering are broadly equivalent. In practice, however, they have different means for avoiding the inevitable difficulties of implementation. The spare capacity exhibited in metabolic networks implies that host engineering may admit substantial increases in flux to targets of interest. Thus, we rehearse the relevant issues for those wishing to understand and exploit those modern genome-wide host engineering tools and thinking that have been designed and developed to optimise fluxes towards desirable products in biotechnological processes, with a focus on microbial systems. The aim throughout is 'making such biology predictable'. Strategies have been aimed at both transcription and translation, especially for regulatory processes that can affect multiple targets. However, because there is a limit on how much protein a cell can produce, increasing kcat in selected targets may be a better strategy than increasing protein expression levels for optimal host engineering.
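The size of these search spaces is easy to underestimate. The short sketch below, with hypothetical function names and example values of N and P chosen purely for illustration, computes the number of candidates for a given number of positions and options per position, and the base-10 logarithm of that number for cases too large to print comfortably.

```python
import math


def search_space_size(options_per_position, positions):
    """Exact number of combinations, e.g. 20**N sequences of length N."""
    return options_per_position ** positions


def log10_search_space(options_per_position, positions):
    """Order of magnitude of the search space, positions * log10(options)."""
    return positions * math.log10(options_per_position)


# A dipeptide has 20**2 possible sequences.
print(search_space_size(20, 2))

# Illustrative values only: a 100-residue protein (N = 100) and
# 10 expression levels to tune (P = 10), each with 20 options.
print(log10_search_space(20, 100))
print(log10_search_space(20, 10))
```

Because the exponent is the only thing that differs between the two problems, tuning P expression levels with 20 settings each is combinatorially the same as mutating a protein of length P, which is the equivalence the abstract draws.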
Affiliation(s)
- Lachlan J. Munro
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Douglas B. Kell
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
  - Mellizyme Biotechnology Ltd, IC1, Liverpool Science Park, 131 Mount Pleasant, Liverpool L3 5TF, U.K
9. Lara Ortiz MT, Martinell García V, Del Rio G. Saturation Mutagenesis of the Transmembrane Region of HokC in Escherichia coli Reveals Its High Tolerance to Mutations. Int J Mol Sci 2021; 22:ijms221910359. [PMID: 34638709 PMCID: PMC8509063 DOI: 10.3390/ijms221910359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022]
Abstract
Cells adapt to different stress conditions, such as the presence of antibiotics. This adaptation is sometimes achieved by changing relevant protein positions, whose mutability is limited by structural constraints. Understanding the basis of these constraints represents an important challenge for both basic science and potential biotechnological applications. To study these constraints, we performed a systematic saturation mutagenesis of the transmembrane region of HokC, a toxin used by Escherichia coli to control its own population, and observed that 92% of single-point mutations are tolerated and that all the non-tolerated mutations have compensatory mutations that reverse their effect. We provide experimental evidence that HokC accumulates multiple compensatory mutations that are found as correlated mutations in the HokC family multiple sequence alignment. In agreement with these observations, transmembrane proteins show a higher probability of presenting correlated mutations and are less densely packed locally than globular proteins; previous mutagenesis results on transmembrane proteins further support our observations on the high tolerance to mutations of transmembrane regions of proteins. Thus, our experimental results reveal the high tolerance of the HokC transmembrane region to loss-of-function mutations, which is associated with low sequence conservation and a high rate of correlated mutations in the HokC family sequence alignment, features that are shared with other transmembrane proteins.
10. Mia MM, Hasan M, Hasan MM, Khan SS, Rahman MN, Ahmed S, Basak A, Sakib MN, Banik S. Multi-epitope based subunit vaccine construction against Banna virus targeting on two outer proteins (VP4 and VP9): A computational approach. Infection, Genetics and Evolution 2021; 95:105076. [PMID: 34500093 DOI: 10.1016/j.meegid.2021.105076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/13/2021] [Revised: 08/28/2021] [Accepted: 09/03/2021] [Indexed: 11/17/2022]
Abstract
Recently, RNA viruses have attracted great concern for causing various outbreaks, and due to pandemics, they are receiving additional attention throughout the world. Banna virus (BAV), an emerging vector-borne RNA virus, is a human pathogen that causes encephalitis, fever, headache, muscle aches, and severe coma. Besides humans, pathogenic BAV has also been detected in pigs, cattle, ticks, midges, and mosquitoes in Indonesia, China, and Vietnam. Due to its high mutation tendency and the lack of a species barrier, this virus will be considered a significant threat in the near future throughout the planet, particularly in Africa. Despite severe human case fatalities in several countries, there are no specific therapeutics, available vaccines, or other preventive measures against BAV. Thus, finding effective therapeutics and preventive strategies is an urgent need. In the present study, a unique multi-epitope-based peptide vaccine candidate is constructed using bioinformatics tools that efficiently stimulates immune cells to generate BAV antibodies. The potential vaccine candidates were developed using both T- and B-cell epitopes. The UniProtKB database was used to retrieve two outer proteins (VP9 and VP4), and homologous sequences of BAV taxid: 7763, 649,604, 77,763, and 8453 were searched by NCBI BLAST. These serotypes are the most closely associated with the disease. Then, by combining the best-selected epitopes in various combinations with different adjuvants, three distinct vaccine candidates were formed. Validity tests were performed for the screened vaccine candidate regarding stability, allergenicity, and antigenicity parameters. Moreover, molecular dynamic simulations of the selected vaccine with the TLR-8 immune receptor confirmed the stability of the binding pose and showed a significant response to immune cells. Thus, the results established that the designed chimeric peptide vaccine could enhance the immune response against BAV.
Affiliation(s)
- Md Mukthar Mia
  - Department of Poultry Science, Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh; Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Mahamudul Hasan
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Md Mahadi Hasan
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Sumaya Shargin Khan
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Mohammad Nahian Rahman
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Shakil Ahmed
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Ankita Basak
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Md Nazmuj Sakib
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Shrabonti Banik
  - Faculty of Veterinary, Animal and Biomedical Sciences, Sylhet Agricultural University, Sylhet 3100, Bangladesh
11. Foroutan B, Abbasian Najafabadi AR. Capabilities of bioinformatics tools for optimizing physicochemical features of proteins used in Nano biosensors: A short overview of the tools related to bioinformatics. Biochem Biophys Rep 2021; 27:101094. [PMID: 34401530 PMCID: PMC8350186 DOI: 10.1016/j.bbrep.2021.101094] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/13/2021] [Revised: 07/29/2021] [Accepted: 07/30/2021] [Indexed: 12/27/2022]
Abstract
Protein-protein ligand binding is one of the most widely used detection methods in nano biosensors. Owing to the specific docking between two particular 3D structures, such biosensors have become potent candidates for bioanalysis and nanodiagnostic tools. These tools let users perform simple, fast, cost-effective, sensitive, and specific detection of molecular biomarkers in real samples. The recent advantages of protein-protein ligand nano-biosensor applications are remarkable because of their specific docking, which depends on each protein's unique 3D conformation. However, the approach faces several problems, such as a low rate of docking and a difficult process for fixation on the base layer. These challenges push developers to optimize the structure and functions of proteins. The process involves various nanoscale calculations that can be carried out with algorithms, and solutions are available as bioinformatics tools. This article aims to give a short overview of the capabilities of bioinformatics tools for modeling and optimizing the physicochemical features of proteins at the nanoscale. Nano biosensors use different strategies, based on docking between two molecules, to detect and identify different proteins. Molecular docking between the transducer in nano biosensors and proteins relies on the physicochemical features of the transducer, the protein, and the docking strategy. Nano bioinformatics uses bioinformatics tools and algorithms as a collective solution for developing functional structures at the nanoscale, and applies them to optimize the physicochemical features of proteins as a new approach in nano biosensors and drug discovery.
Affiliation(s)
- Behzad Foroutan
  - Tropical and Communicable Diseases Research Center, Iranshahr University of Medical Sciences, Iranshahr, Iran
  - Department of Pharmacology, School of Medicine, Iranshahr University of Medical Sciences, Iranshahr, Iran
  - Corresponding author. Tropical and Communicable Diseases Research Center, Iranshahr University of Medical Sciences, Iranshahr, Iran.
12. Reza MS, Zhang H, Hossain MT, Jin L, Feng S, Wei Y. COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization. Membranes 2021; 11:membranes11070503. [PMID: 34209399 PMCID: PMC8305966 DOI: 10.3390/membranes11070503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/25/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
Protein contact prediction helps reconstruct the tertiary structure that greatly determines a protein’s function; therefore, contact prediction from the sequence is an important problem. Recently there has been exciting progress on this problem, but many of the existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods, including CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV, by optimizing the number of correctly predicted contacts and achieving a better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed a prediction accuracy of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increased with the increasing number of predicted contacts. For example, COMTOP performed much better for large numbers of predicted contacts (such as top-5L and top-3L) than for small numbers of predicted contacts (such as top-L/2 and top-L/5). 
The results and analysis demonstrate that COMTOP can significantly improve the performance of the individual methods; therefore, COMTOP is more robust against different types of test sets. COMTOP also showed better/comparable predictions when compared with the state-of-the-art predictors.
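The top-5L through top-L/5 figures quoted above are precisions over the highest-scoring predicted contacts, where L is the sequence length. The sketch below shows one common way to compute such a metric; it is an illustration under stated assumptions, not COMTOP's evaluation code. The function name, the toy data, and the minimum sequence separation of 6 residues are all assumptions introduced here.

```python
def top_contact_precision(scores, true_contacts, seq_len, numer=1, denom=5, min_sep=6):
    """Precision of the top (numer * L / denom) predicted contacts.

    scores: dict mapping residue pairs (i, j), i < j, to predicted scores.
    true_contacts: set of pairs (i, j), i < j, that are contacts in the native structure.
    numer, denom: top-L/5 is numer=1, denom=5; top-5L is numer=5, denom=1.
    min_sep: assumed minimum sequence separation |i - j| for a pair to be counted.
    """
    n_top = max(1, (seq_len * numer) // denom)
    candidates = [(pair, s) for pair, s in scores.items() if pair[1] - pair[0] >= min_sep]
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    top_pairs = [pair for pair, _ in ranked[:n_top]]
    if not top_pairs:
        return 0.0
    correct = sum(pair in true_contacts for pair in top_pairs)
    return correct / len(top_pairs)


# Hypothetical predictions for a 10-residue toy protein.
toy_scores = {(0, 8): 0.9, (1, 9): 0.8, (2, 9): 0.7, (0, 7): 0.1, (3, 4): 0.99}
toy_truth = {(0, 8), (2, 9)}

print(top_contact_precision(toy_scores, toy_truth, seq_len=10, numer=1, denom=10))  # top-L/10
print(top_contact_precision(toy_scores, toy_truth, seq_len=10, numer=1, denom=5))   # top-L/5
print(top_contact_precision(toy_scores, toy_truth, seq_len=10, numer=1, denom=2))   # top-L/2
```

Shorter lists such as top-L/5 contain only the most confident predictions, which is consistent with the abstract reporting its highest accuracies for top-L/5 and its lowest for top-5L.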
Affiliation(s)
- Md. Selim Reza
  - School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  - Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Huiling Zhang
  - School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  - Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Md. Tofazzal Hossain
  - School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  - Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Langxi Jin
  - Department of Computer Science and Technology, School of Computer Science and Technology, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin 150080, China
- Shengzhong Feng
  - Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yanjie Wei
  - School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  - Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - Correspondence:
Collapse
|
13
|
Zhang H, Bei Z, Xi W, Hao M, Ju Z, Saravanan KM, Zhang H, Guo N, Wei Y. Evaluation of residue-residue contact prediction methods: From retrospective to prospective. PLoS Comput Biol 2021; 17:e1009027. [PMID: 34029314 PMCID: PMC8177648 DOI: 10.1371/journal.pcbi.1009027] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/30/2020] [Revised: 06/04/2021] [Accepted: 04/28/2021] [Indexed: 12/31/2022] Open
Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has made tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multiple categories of inputs and large training sets are the best choices for low-contact-density proteins such as intrinsically disordered ones and proteins with shallow multiple sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show their best performance, and methods that rely only on the MSA as input can achieve results comparable to those of methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods can predict more hydrophobic interactions while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods can detect more secondary structure interactions, while DL methods can accurately excavate more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods, there is still much room for further improvement: (1) With shallow MSAs, performance is greatly affected. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as very high imbalances in precision between intra-domains. (3) Strong prediction similarities between DL methods indicate that more feature types and diversified models need to be developed. (4) The runtime of most methods can be further optimized.
The amino acid sequence of a protein ultimately determines its tertiary structure, and the tertiary structure determines its function(s) and plays a key role in understanding biological processes and disease pathogenesis. Protein tertiary structure can be determined using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, which are very expensive and time-consuming. As an alternative, researchers are trying to use in silico methods to predict the 3D structures. Residue contact-assisted protein folding paves an avenue for sequence-based protein structure prediction and therefore has become one of the most challenging and promising problems in structural bioinformatics. Over the past years, contact prediction has undergone continuous evolution in techniques. Through a retrospective analysis of traditional machine learning, evolutionary coupling analysis, and consensus machine learning methods, and a multi-perspective study of recently developed deep learning methods, we explore the most advanced contact predictors, pursue application scenarios for different methods, and seek prospective directions for further improvement. We anticipate that our study will serve as a practical and useful guide for the development of future approaches to contact prediction.
Affiliation(s)
- Huiling Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhendong Bei
- Cloud Computing Department, Alibaba Group, Hangzhou, China
- Wenhui Xi
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Zhen Ju
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Konda Mani Saravanan
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Haiping Zhang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ning Guo
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
14
|
Machine learning in protein structure prediction. Curr Opin Chem Biol 2021; 65:1-8. [PMID: 34015749 DOI: 10.1016/j.cbpa.2021.04.005] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 03/31/2021] [Accepted: 04/10/2021] [Indexed: 12/31/2022]
Abstract
Prediction of protein structure from sequence has been intensely studied for many decades, owing to the problem's importance and its uniquely well-defined physical and computational bases. While progress has historically ebbed and flowed, the past two years saw dramatic advances driven by the increasing "neuralization" of structure prediction pipelines, whereby computations previously based on energy models and sampling procedures are replaced by neural networks. The extraction of physical contacts from the evolutionary record; the distillation of sequence-structure patterns from known structures; the incorporation of templates from homologs in the Protein Databank; and the refinement of coarsely predicted structures into finely resolved ones have all been reformulated using neural networks. Cumulatively, this transformation has resulted in algorithms that can now predict single protein domains with a median accuracy of 2.1 Å, setting the stage for a foundational reconfiguration of the role of biomolecular modeling within the life sciences.
|
15
|
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term ‘Rosetta’ appeared for the first time twenty years ago in the literature to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering ‘the second half of the genetic code’. Although the focus of Baker’s team has expanded to de novo protein design in the past few years, Rosetta’s ‘fame’ is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its development and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
|
16
|
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it is a challenge of how to design an accurate energy function which ensures low-energy conformations close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating the residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of evolutionary algorithm. In TDFO, a similarity model is first designed by using feature information which is extracted from distance profiles by bisecting K-means algorithm. The similarity model-based selection strategy is then developed to guide conformation search, and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is also proposed to strike a trade-off between the exploration and exploitation of the search space. Experimental results of 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of test proteins.
|
17
|
Liu J, Zhou XG, Zhang Y, Zhang GJ. CGLFold: a contact-assisted de novo protein structure prediction using global exploration and loop perturbation sampling algorithm. Bioinformatics 2020; 36:2443-2450. [PMID: 31860059 DOI: 10.1093/bioinformatics/btz943] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/03/2019] [Revised: 12/10/2019] [Accepted: 12/18/2019] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION Regions that connect secondary structure elements in a protein are known as loops, whose slight change will produce dramatic effect on the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. RESULTS A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, the fragment recombination and assembly are used to explore the massive conformational space and generate native-like topology. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by differential evolution algorithm. These two phases enable a cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model for guiding the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and success rate of conformational update and gradually improve conformation accuracy. CGLFold obtains template modeling score ≥ 0.5 models on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
18
|
Hasan M, Azim KF, Imran MAS, Chowdhury IM, Urme SRA, Parvez MSA, Uddin MB, Ahmed SSU. Comprehensive genome based analysis of Vibrio parahaemolyticus for identifying novel drug and vaccine molecules: Subtractive proteomics and vaccinomics approach. PLoS One 2020; 15:e0237181. [PMID: 32813697 PMCID: PMC7444560 DOI: 10.1371/journal.pone.0237181] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/01/2020] [Accepted: 07/21/2020] [Indexed: 02/07/2023] Open
Abstract
Multidrug-resistant Vibrio parahaemolyticus has become a significant public health concern. The development of effective drugs and vaccines against Vibrio parahaemolyticus is a current research priority. Thus, we aimed to identify effective drug and vaccine targets using a comprehensive genome-based analysis. A total of 4822 proteins were screened from the V. parahaemolyticus proteome. Among 16 novel cytoplasmic proteins, 'VIBPA Type II secretion system protein L' and 'VIBPA Putative fimbrial protein Z' were subjected to molecular docking with 350 human metabolites, which revealed that Eliglustat, Simvastatin and Hydroxocobalamin were the top drug molecules in terms of free binding energy. In contrast, 'Sensor histidine protein kinase UhpB' and 'Flagellar hook-associated protein' of 25 novel membrane proteins were subjected to T-cell and B-cell epitope prediction, antigenicity testing, transmembrane topology screening, allergenicity and toxicity assessment, population coverage analysis and molecular docking analysis to generate the most immunogenic epitopes. Three subunit vaccines were constructed by combining highly antigenic epitopes with a suitable adjuvant, PADRE sequence and linkers. The designed vaccine constructs (V1, V2, V3) were analyzed for their physicochemical properties and by molecular docking with MHC molecules; the results suggested that V1 is superior. In addition, the binding affinity between the human TLR-1/2 heterodimer and construct V1 could be biologically significant in the development of the vaccine repertoire. The vaccine-receptor complex exhibited minimal deformability, which also strengthened our prediction. The optimized codons of the designed construct were cloned into the pET28a(+) vector of E. coli strain K12. However, the predicted drug molecules and vaccine constructs should be further studied using model animals to combat V. parahaemolyticus-associated infections.
Affiliation(s)
- Mahmudul Hasan
- Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
- Kazi Faizul Azim
- Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
- Md. Abdus Shukur Imran
- Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
- Ishtiak Malique Chowdhury
- Department of Molecular Biology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh
- Md. Sorwer Alam Parvez
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
- Md. Bashir Uddin
- Department of Medicine, Sylhet Agricultural University, Sylhet, Bangladesh
- Syed Sayeem Uddin Ahmed
- Department of Epidemiology and Public Health, Sylhet Agricultural University, Sylhet, Bangladesh
|
19
|
Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 06/08/2023]
Abstract
Ab initio protein tertiary structure prediction is one of the long-standing problems in structural bioinformatics. With the help of residue-residue contact and secondary structure prediction information, the accuracy of ab initio structure prediction can be enhanced. In this study, an improved differential evolution with secondary structure and residue-residue contact information referred to as SCDE is proposed for protein structure prediction. In SCDE, two score models based on secondary structure and contact information are proposed, and two selection strategies, namely, secondary structure-based selection strategy and contact-based selection strategy, are designed to guide conformation space search. A probability distribution function is designed to balance these two selection strategies. Experimental results on a benchmark dataset with 28 proteins and four free model targets in CASP12 demonstrate that the proposed SCDE is effective and efficient.
|
20
|
Abbass J, Nebel JC. Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure. BMC Bioinformatics 2020; 21:170. [PMID: 32357827 PMCID: PMC7195757 DOI: 10.1186/s12859-020-3491-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/30/2019] [Accepted: 04/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Whenever suitable template structures are not available, usage of fragment-based protein structure prediction becomes the only practical alternative as pure ab initio techniques require massive computational resources even for very small proteins. However, inaccuracy of their energy functions and their stochastic nature imposes generation of a large number of decoys to explore adequately the solution space, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept to its default value for coil regions, important and dramatic reductions are proposed for beta sheet and alpha helical regions, respectively. RESULTS The evaluation of our fragment selection approach was conducted using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool. It was modified so that the number of fragment candidates used in Rosetta could be adjusted based on the local secondary structure. Compared to Rosetta's standard predictions, our strategy delivered improved first models, + 24% and + 6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while reducing significantly the number of fragment candidates. Furthermore, our enhanced version of Rosetta is able to deliver with 2000 decoys a performance equivalent to that produced by standard Rosetta while using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore satisfactorily conformation spaces. 
CONCLUSIONS Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of that information to customise the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experimentations conducted using standard Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results. Alternatively, similar results can be achieved using only 2000 decoys. Consequently, we recommend the adoption of this strategy to either improve significantly model quality or reduce processing times by a factor 10.
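The idea of customising fragment cardinality by predicted secondary structure can be sketched as a per-position lookup: coil positions keep the default number of fragment candidates, while strand and helix positions keep progressively fewer. The default of 200 candidates and the retention fractions below are hypothetical placeholders chosen only for illustration; the abstract does not report the actual values, and this is not the authors' modified Rosetta code.

```python
# Assumed default number of fragment candidates per position (illustrative).
DEFAULT_FRAGMENTS = 200

# Hypothetical retention fractions per three-state secondary structure class:
# C = coil (unchanged), E = beta strand (reduced), H = alpha helix (reduced most).
RETAIN_FRACTION = {"C": 1.0, "E": 0.5, "H": 0.1}


def fragments_per_position(ss_string, default=DEFAULT_FRAGMENTS,
                           retain=RETAIN_FRACTION):
    """Return the number of fragment candidates to use at each position.

    ss_string -- predicted secondary structure, one of 'H', 'E', 'C' per residue
    """
    counts = []
    for ss in ss_string:
        if ss not in retain:
            raise ValueError(f"unknown secondary structure state: {ss!r}")
        # Always keep at least one candidate so insertion remains possible.
        counts.append(max(1, round(default * retain[ss])))
    return counts


if __name__ == "__main__":
    print(fragments_per_position("CCHHHHEEC"))
```

With these placeholder fractions, most of the sampling budget is spent on coil positions, which matches the hypothesis stated in the abstract that coils are the most challenging regions.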
Affiliation(s)
- Jad Abbass
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
|
21
|
Lesk AM. Not Enough Natural Data? Sequence and Ye Shall Find. Front Mol Biosci 2020; 7:65. [PMID: 32373628 PMCID: PMC7186298 DOI: 10.3389/fmolb.2020.00065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2020] [Accepted: 03/25/2020] [Indexed: 11/28/2022] Open
Affiliation(s)
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States
|
22
|
Lemke T, Berg A, Jain A, Peter C. EncoderMap(II): Visualizing Important Molecular Motions with Improved Generation of Protein Conformations. J Chem Inf Model 2019; 59:4550-4560. [PMID: 31647645 DOI: 10.1021/acs.jcim.9b00675] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022]
Abstract
Dimensionality reduction can be used to project high-dimensional molecular data into a simplified, low-dimensional map. One feature of our recently introduced dimensionality reduction technique EncoderMap, which relies on the combination of an autoencoder with multidimensional scaling, is its ability to do the reverse. It is able to generate conformations for any selected points in the low-dimensional map. This transfers the simplified, low-dimensional map back into the high-dimensional conformational space. Although the output is again high-dimensional, certain aspects of the simplification are preserved. The generated conformations only mirror the most dominant conformational differences that determine the positions of conformational states in the low-dimensional map. This allows depicting such differences and-in consequence-visualizing molecular motions and gives a unique perspective on high-dimensional conformational data. In our previous work, protein conformations described in backbone dihedral angle space were used as the input for EncoderMap, and conformations were also generated in this space. For large proteins, however, the generation of conformations is inaccurate with this approach due to the local character of backbone dihedral angles. Here, we present an improved variant of EncoderMap which is able to generate large protein conformations that are accurate in short-range and long-range orders. This is achieved by differentiable reconstruction of Cartesian coordinates from the generated dihedrals, which allows adding a contribution to the cost function that monitors the accuracy of all pairwise distances between the Cα-atoms of the generated conformations. The improved capabilities to generate conformations of large, even multidomain, proteins are demonstrated for two examples: diubiquitin and a part of the Ssa1 Hsp70 yeast chaperone. 
We show that the improved variant of EncoderMap can nicely visualize motions of protein domains relative to each other but is also able to highlight important conformational changes within the individual domains.
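The cost-function contribution described above compares all pairwise Cα distances of a generated conformation with those of a reference conformation. A minimal sketch of such a term follows. The mean absolute difference used here is an assumption, since the abstract does not give the exact functional form, and this plain-Python version is not differentiable, whereas the method described relies on a differentiable reconstruction inside the autoencoder training loop.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All pairwise Euclidean distances between C-alpha positions."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_cost(generated, reference):
    """Mean absolute difference between the pairwise C-alpha distances of
    a generated conformation and a reference conformation.

    Both arguments are sequences of (x, y, z) tuples of equal length.
    """
    if len(generated) != len(reference):
        raise ValueError("conformations must have the same number of atoms")
    d_gen = pairwise_distances(generated)
    d_ref = pairwise_distances(reference)
    if not d_gen:
        return 0.0  # fewer than two atoms: no distances to compare
    return sum(abs(g - r) for g, r in zip(d_gen, d_ref)) / len(d_gen)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    collapsed = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(distance_cost(collapsed, reference))
```

Because only inter-atomic distances enter the cost, it is unchanged by rigid translation or rotation of either conformation, which is why a distance-based term can capture long-range accuracy that backbone dihedral angles alone do not constrain.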
Affiliation(s)
- Tobias Lemke
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Andrej Berg
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Alok Jain
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Department of Biotechnology, National Institute of Pharmaceutical Education and Research Ahmedabad, Gandhinagar, Gujarat 382355, India
- Christine Peter
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
|
23
|
Wozniak PP, Pelc J, Skrzypecki M, Vriend G, Kotulska M. Bio-knowledge-based filters improve residue-residue contact prediction accuracy. Bioinformatics 2019; 34:3675-3683. [PMID: 29850768 DOI: 10.1093/bioinformatics/bty416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/17/2017] [Accepted: 05/19/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation Residue-residue contact prediction through direct coupling analysis has reached impressive accuracy, but yet higher accuracy will be needed to allow for routine modelling of protein structures. One way to improve the prediction accuracy is to filter predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. Results We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut-off parameters for which the filter performance was investigated. Combining all filters while using default parameters resulted for a test set of 851 protein domains in the removal of 29% of the predictions of which 92% were indeed false positives. Availability and implementation All data and scripts are available at http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. Supplementary information Supplementary data are available at Bioinformatics online.
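The reported filter statistics (29% of predictions removed, 92% of the removed predictions being false positives) determine how the precision of the remaining predictions relates to the precision before filtering. The sketch below works through that arithmetic. The initial precision is not given in the abstract, so the value of 0.5 used in the example is purely hypothetical.

```python
def precision_after_filter(initial_precision, removed_fraction=0.29,
                           removed_fp_rate=0.92):
    """Precision of the predictions that survive a false-positive filter.

    initial_precision -- fraction of true contacts before filtering (assumed input)
    removed_fraction  -- fraction of all predictions the filter removes
    removed_fp_rate   -- fraction of removed predictions that were false positives
    """
    if not 0.0 < removed_fraction < 1.0:
        raise ValueError("removed_fraction must be strictly between 0 and 1")
    removed_tp = removed_fraction * (1.0 - removed_fp_rate)
    removed_fp = removed_fraction * removed_fp_rate
    # The filter cannot remove more true or false positives than exist.
    if removed_tp > initial_precision or removed_fp > 1.0 - initial_precision:
        raise ValueError("inputs are mutually inconsistent")
    remaining_tp = initial_precision - removed_tp
    return remaining_tp / (1.0 - removed_fraction)


if __name__ == "__main__":
    # Hypothetical starting precision of 50%.
    print(round(precision_after_filter(0.5), 3))
```

Under the hypothetical 50% starting precision, the surviving predictions would have a precision of roughly 67%, because only about 2.3% of all predictions are true contacts lost to the filter (0.29 × 0.08) while 29% of the denominator is removed.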
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- J Pelc
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- M Skrzypecki
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
|
24
|
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- Joe G Greener
- Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- David T Jones
- Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
|
25
|
Greener JG, Kandathil SM, Jones DT. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nat Commun 2019; 10:3977. [PMID: 31484923 PMCID: PMC6726615 DOI: 10.1038/s41467-019-11994-0] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 04/01/2019] [Accepted: 08/14/2019] [Indexed: 01/30/2023] Open
Abstract
The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200 core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.
Affiliation(s)
- Joe G Greener
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
| | - Shaun M Kandathil
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
| | - David T Jones
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK.
| |
|
26
|
Hasan M, Islam S, Chakraborty S, Mustafa AH, Azim KF, Joy ZF, Hossain MN, Foysal SH, Hasan MN. Contriving a chimeric polyvalent vaccine to prevent infections caused by herpes simplex virus (type-1 and type-2): an exploratory immunoinformatic approach. J Biomol Struct Dyn 2019; 38:2898-2915. [PMID: 31328668 DOI: 10.1080/07391102.2019.1647286] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/20/2022]
Abstract
Herpes simplex virus type 1 (HSV-1) and 2 (HSV-2) cause a variety of infections including oral-facial infections, genital herpes, herpes keratitis, cutaneous infection and so on. To date, no FDA-approved licensed HSV vaccine is available. Hence, the study was conducted to identify and characterize an effective epitope-based polyvalent vaccine against both types of herpes simplex virus. The selected proteins were retrieved from ViralZone and assessed to design highly antigenic epitopes by binding analyses of the peptides with MHC class-I and class-II molecules, antigenicity screening, transmembrane topology screening, allergenicity and toxicity assessment, population coverage analysis and molecular docking approach. The final vaccine was constructed by the combination of top CTL, HTL and BCL epitopes from each protein along with suitable adjuvant and linkers. Physicochemical and secondary structure analysis, disulfide engineering, molecular dynamic simulation and codon adaptation were further employed to develop a unique multi-epitope peptide vaccine. Docking analysis of the refined vaccine structure with different MHC molecules and human immune TLR-2 receptor demonstrated higher interaction. Complexed structure of the modeled vaccine and TLR-2 showed minimal deformability at molecular level. Moreover, translational potency and microbial expression of the modeled vaccine were analyzed with pET28a(+) vector for E. coli strain K12 and the vaccine constructs had no similarity with the entire human proteome. The study enabled design of a novel chimeric polyvalent vaccine to confer broad range immunity against both HSV serotypes. However, further wet-lab-based research using model animals is highly recommended to experimentally validate our findings. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Mahmudul Hasan
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh; Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
| | - Shiful Islam
- Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
| | - Sourav Chakraborty
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
| | - Abu Hasnat Mustafa
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
| | - Kazi Faizul Azim
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh; Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
| | - Ziaul Faruque Joy
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, Bangladesh; Department of Biomedical Science, University of Sheffield, Sheffield, UK
| | - Md Nazmul Hossain
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, Bangladesh; Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
| | - Shakhawat Hossain Foysal
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
| | - Md Nazmul Hasan
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
| |
|
27
|
Marks C, Deane CM. Increasing the accuracy of protein loop structure prediction with evolutionary constraints. Bioinformatics 2019; 35:2585-2592. [PMID: 30535347 DOI: 10.1093/bioinformatics/bty996] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/29/2018] [Revised: 09/28/2018] [Accepted: 12/07/2018] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Accurate prediction of loop structures remains challenging. This is especially true for long loops where the large conformational space and limited coverage of experimentally determined structures often leads to low accuracy. Co-evolutionary contact predictors, which provide information about the proximity of pairs of residues, have been used to improve whole-protein models generated through de novo techniques. Here we investigate whether these evolutionary constraints can enhance the prediction of long loop structures. RESULTS As a first stage, we assess the accuracy of predicted contacts that involve loop regions. We find that these are less accurate than contacts in general. We also observe that some incorrectly predicted contacts can be identified as they are never satisfied in any of our generated loop conformations. We examined two different strategies for incorporating contacts, and on a test set of long loops (10 residues or more), both approaches improve the accuracy of prediction. For a set of 135 loops, contacts were predicted and hence our methods were applicable in 97 cases. Both strategies result in an increase in the proportion of near-native decoys in the ensemble, leading to more accurate predictions and in some cases improving the root-mean-square deviation of the final model by more than 3 Å. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claire Marks
- Department of Statistics, University of Oxford, Oxford, UK
| | | |
|
28
|
Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, Marks DS. Inferring protein 3D structure from deep mutation scans. Nat Genet 2019; 51:1170-1176. [PMID: 31209393 PMCID: PMC7295002 DOI: 10.1038/s41588-019-0432-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 11/09/2022]
Abstract
We describe an experimental method of three-dimensional (3D) structure determination that exploits the increasing ease of high-throughput mutational scans. Inspired by the success of using natural, evolutionary sequence covariation to compute protein and RNA folds, we explored whether 'laboratory', synthetic sequence variation might also yield 3D structures. We analyzed five large-scale mutational scans and discovered that the pairs of residues with the largest positive epistasis in the experiments are sufficient to determine the 3D fold. We show that the strongest epistatic pairings from genetic screens of three proteins, a ribozyme and a protein interaction reveal 3D contacts within and between macromolecules. Using these experimental epistatic pairs, we compute ab initio folds for a GB1 domain (within 1.8 Å of the crystal structure) and a WW domain (2.1 Å). We propose strategies that reduce the number of mutants needed for contact prediction, suggesting that genomics-based techniques can efficiently predict 3D structure.
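The ranking step described in this abstract (picking the residue pairs with the largest positive epistasis) can be illustrated with a minimal sketch. It assumes an additive model on log-fitness values measured relative to wild type, which is a common convention but is not stated in the abstract; the function names and the input layout are hypothetical and are not taken from the authors' code.

```python
def epistasis(f_double, f_single_a, f_single_b):
    """Additive-model epistasis (assumption): double-mutant log-fitness
    minus the sum of the two single-mutant log-fitness values."""
    return f_double - (f_single_a + f_single_b)


def top_epistatic_pairs(singles, doubles, k):
    """Rank position pairs by their largest positive epistasis.

    singles: {(position, amino_acid): log_fitness}
    doubles: {((pos1, aa1), (pos2, aa2)): log_fitness}
    Returns the k position pairs with the highest maximum epistasis.
    """
    best = {}
    for (mut1, mut2), f12 in doubles.items():
        if mut1 not in singles or mut2 not in singles:
            continue  # epistasis is undefined without both single mutants
        pair = tuple(sorted((mut1[0], mut2[0])))
        eps = epistasis(f12, singles[mut1], singles[mut2])
        best[pair] = max(best.get(pair, float("-inf")), eps)
    return sorted(best, key=best.get, reverse=True)[:k]


# Toy data: the double mutant at positions 3 and 10 is far fitter than
# the two single mutants predict, hinting at a possible 3D contact.
singles = {(3, "A"): -1.0, (10, "G"): -1.5, (4, "L"): -0.5}
doubles = {((3, "A"), (10, "G")): -0.5, ((3, "A"), (4, "L")): -1.5}
print(top_epistatic_pairs(singles, doubles, k=1))
```

In a workflow of this kind, the top-ranked pairs would then serve as distance restraints for a folding program; that step is outside the scope of this sketch.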
Affiliation(s)
- Nathan J Rollins
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Kelly P Brock
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
| | - Frank J Poelwijk
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Michael A Stiffler
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Nicholas P Gauthier
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| |
|
29
|
Azim KF, Hasan M, Hossain MN, Somana SR, Hoque SF, Bappy MNI, Chowdhury AT, Lasker T. Immunoinformatics approaches for designing a novel multi epitope peptide vaccine against human norovirus (Norwalk virus). Infection, Genetics and Evolution 2019; 74:103936. [PMID: 31233780 DOI: 10.1016/j.meegid.2019.103936] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 04/14/2019] [Revised: 06/18/2019] [Accepted: 06/20/2019] [Indexed: 12/19/2022]
Abstract
Norovirus is known as a major cause of several acute gastroenteritis (AGE) outbreaks each year. A study was conducted to develop a unique multi epitope subunit vaccine against human norovirus by adopting a reverse vaccinology approach. The entire viral proteome of Norwalk virus was retrieved and allowed for further in silico study to predict highly antigenic epitopes through antigenicity, transmembrane topology screening, allergenicity assessment, toxicity analysis, population coverage analysis and molecular docking approach. Capsid protein VP1 and protein VP2 were identified as the most antigenic viral proteins, which generated a plethora of antigenic epitopes. Physicochemical properties and secondary structure of the designed vaccine were assessed to ensure its thermostability, hydrophilicity, theoretical pI and structural behavior. Molecular docking analysis of the refined vaccine with different MHCs and human immune TLR8 receptor demonstrated higher binding interaction as well. Complexed structure of the modeled vaccine and TLR8 showed minimal deformability at molecular level. The designed construct was reverse transcribed and adapted for E. coli strain K12 prior to insertion within pET28a(+) vector for its heterologous cloning and expression, and sequence of vaccine constructs showed no similarity with human proteins. However, the study could initiate in vitro and in vivo studies regarding effective vaccine development against human norovirus.
Affiliation(s)
- Kazi Faizul Azim
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh; Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| | - Mahmudul Hasan
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh; Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| | - Md Nazmul Hossain
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh; Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh.
| | - Saneya Risa Somana
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| | - Syeda Farjana Hoque
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| | - Md Nazmul Islam Bappy
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| | - Anjum Taiebah Chowdhury
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| | - Tahera Lasker
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet 3100, Bangladesh
| |
|
30
|
Wu Q, Peng Z, Anishchenko I, Cong Q, Baker D, Yang J. Protein contact prediction using metagenome sequence data and residual neural networks. Bioinformatics 2019; 36:41-48. [PMID: 31173061 PMCID: PMC8792440 DOI: 10.1093/bioinformatics/btz477] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 11/20/2018] [Revised: 05/30/2019] [Accepted: 06/04/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from poorly populated families do not have a sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from the metagenome sequencing projects. RESULTS Based on the improved MSA constructed from the metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By considering the symmetry of the contact map, we reduced the number of matrices to 231, which makes the training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts and other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable to the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, improved feature design in DeepMSA and optimized training by the residual neural networks. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/mappred/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
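The reduction from 441 to 231 matrices can be checked with a short sketch. It assumes a 21-symbol alphabet (20 amino acids plus a gap), which gives 21 × 21 = 441 ordered residue-type pairs; because the covariance of (a at column i, b at column j) equals that of (b at column j, a at column i), only the 21 × 22 / 2 = 231 unordered pairs carry distinct information. This is one plausible reading of the abstract, not the authors' implementation, and the names below are illustrative.

```python
from itertools import combinations_with_replacement

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumption: 20 amino acids plus gap


def pair_covariance(msa, i, j, a, b):
    """Covariance of residue a at column i with residue b at column j.

    msa is a list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    f_ia = sum(seq[i] == a for seq in msa) / n
    f_jb = sum(seq[j] == b for seq in msa) / n
    f_iajb = sum(seq[i] == a and seq[j] == b for seq in msa) / n
    return f_iajb - f_ia * f_jb


# One L x L matrix per ordered residue-type pair versus per unordered pair.
ordered_pairs = [(a, b) for a in ALPHABET for b in ALPHABET]
unordered_pairs = list(combinations_with_replacement(ALPHABET, 2))
print(len(ordered_pairs), len(unordered_pairs))  # 441 231

# The symmetry that makes the remaining 210 matrices redundant.
toy_msa = ["AC-", "AD-", "GC-"]
assert abs(pair_covariance(toy_msa, 0, 1, "A", "C")
           - pair_covariance(toy_msa, 1, 0, "C", "A")) < 1e-12
```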
Affiliation(s)
- Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Zhenling Peng
- To whom correspondence should be addressed.
| | - Ivan Anishchenko
- Department of Biochemistry, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Qian Cong
- Department of Biochemistry, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - David Baker
- Department of Biochemistry, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Jianyi Yang
- To whom correspondence should be addressed.
| |
|
31
|
Hasan M, Azim KF, Begum A, Khan NA, Shammi TS, Imran AS, Chowdhury IM, Urme SRA. Vaccinomics strategy for developing a unique multi-epitope monovalent vaccine against Marburg marburgvirus. Infection, Genetics and Evolution 2019; 70:140-157. [DOI: 10.1016/j.meegid.2019.03.003] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 01/02/2019] [Revised: 02/09/2019] [Accepted: 03/04/2019] [Indexed: 12/23/2022]
|
32
|
Adhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2019; 34:1466-1472. [PMID: 29228185 PMCID: PMC5925776 DOI: 10.1093/bioinformatics/btx781] [Citation(s) in RCA: 105] [Impact Index Per Article: 17.5] [Received: 06/30/2017] [Accepted: 12/07/2017] [Indexed: 12/14/2022] Open
Abstract
Motivation Significant improvements in the prediction of protein residue–residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction. Results In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks—the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on CASP10 dataset, 34% by MetaPSICOV on CASP11 dataset and 46.3% by Raptor-X on CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts into training, the two-level approach to prediction, use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. Availability and implementation The web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for CASP10, 11 and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/. Supplementary information Supplementary data are available at Bioinformatics online.
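The "top L/5 long-range" precision quoted above can be sketched as follows. The sequence-separation cutoff of 24 residues for "long-range" is the usual CASP convention and is an assumption here, since the abstract does not spell it out; the function name and input layout are likewise illustrative and not taken from the DNCON2 code.

```python
def top_l5_long_range_precision(pred, true_contacts, length, min_sep=24):
    """Precision of the L/5 highest-scoring long-range predictions.

    pred: {(i, j): score} with i < j (residue indices)
    true_contacts: set of (i, j) pairs that are contacts in the native structure
    length: sequence length L
    min_sep: minimum |i - j| for a pair to count as long-range (assumed 24)
    """
    long_range = [(score, pair) for pair, score in pred.items()
                  if pair[1] - pair[0] >= min_sep]
    long_range.sort(reverse=True)          # highest score first
    top = long_range[:max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(pair in true_contacts for _, pair in top)
    return hits / len(top)


# Toy example with a 10-residue "protein" and a reduced separation cutoff.
pred = {(0, 5): 0.9, (1, 8): 0.8, (2, 3): 0.99, (0, 9): 0.1}
native = {(0, 5)}
print(top_l5_long_range_precision(pred, native, length=10, min_sep=3))  # 0.5
```

The pair (2, 3) has the highest score but is excluded as short-range, so the top two long-range predictions are (0, 5) and (1, 8), of which one is a true contact.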
Affiliation(s)
- Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
| | - Jie Hou
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
| | - Jianlin Cheng
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA; Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| |
|
33
|
Hasan M, Ghosh PP, Azim KF, Mukta S, Abir RA, Nahar J, Hasan Khan MM. Reverse vaccinology approach to design a novel multi-epitope subunit vaccine against avian influenza A (H7N9) virus. Microb Pathog 2019; 130:19-37. [PMID: 30822457 DOI: 10.1016/j.micpath.2019.02.023] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 11/25/2018] [Revised: 02/20/2019] [Accepted: 02/21/2019] [Indexed: 12/18/2022]
Abstract
H7N9, a novel strain of avian origin influenza was the first recorded incidence where a human was transited by an N9 type influenza virus. Effective vaccination against influenza A (H7N9) is a major concern, since it has emerged as a life threatening viral pathogen. Here, an in silico reverse vaccinology strategy was adopted to design a unique chimeric subunit vaccine against avian influenza A (H7N9). Induction of humoral and cell-mediated immunity is the prime concerned characteristics for a peptide vaccine candidate, hence both T cell and B cell immunity of viral proteins were screened. Antigenicity testing, transmembrane topology screening, allergenicity and toxicity assessment, population coverage analysis and molecular docking approach were adopted to generate the most antigenic epitopes of avian influenza A (H7N9) proteome. Further, a novel subunit vaccine was designed by the combination of highly immunogenic epitopes along with suitable adjuvant and linkers. Physicochemical properties and secondary structure of the designed vaccine were assessed to ensure its thermostability, hydrophilicity, theoretical pI and structural behavior. Homology modeling, refinement and validation of the designed vaccine allowed to construct a three dimensional structure of the predicted vaccine, further employed to molecular docking analysis with different MHC molecules and human immune TLR8 receptor present on lymphocyte cells. Moreover, disulfide engineering was employed to lessen the high mobility region of the designed vaccine in order to extend its stability. Furthermore, we investigated the molecular dynamic simulation of the modeled subunit vaccine and TLR8 complexed molecule to strengthen our prediction. Finally, the suggested vaccine was reverse transcribed and adapted for E. coli strain K12 prior to insertion within pET28a(+) vector for checking translational potency and microbial expression.
Affiliation(s)
- Mahmudul Hasan
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh.
| | - Progga Paromita Ghosh
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
| | - Kazi Faizul Azim
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
| | - Shamsunnahar Mukta
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Plant and Environmental Biotechnology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
| | - Ruhshan Ahmed Abir
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Jannatun Nahar
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Mohammad Mehedi Hasan Khan
- Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh; Department of Biochemistry and Chemistry, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
| |
|
34
|
MacCarthy E, Perry D, Kc DB. Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction. Methods Mol Biol 2019; 1958:15-45. [PMID: 30945212 DOI: 10.1007/978-1-4939-9161-7_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Due to the advancement in various sequencing technologies, the gap between the number of protein sequences and the number of experimental protein structures is ever increasing. Community-wide initiatives like CASP have resulted in considerable efforts in the development of computational methods to accurately model protein structures from sequences. Sequence-based prediction of super-secondary structure has direct application in protein structure prediction, and there have been significant efforts in the prediction of super-secondary structure in the last decade. In this chapter, we first introduce the protein structure prediction problem and highlight some of the important progress in the field of protein structure prediction. Next, we discuss recent methods for the prediction of super-secondary structures. Finally, we discuss applications of super-secondary structure prediction in structure prediction/analysis of proteins. We also discuss prediction of protein structures that are composed of simple super-secondary structure repeats and protein structures that are composed of complex super-secondary structure repeats. Finally, we also discuss the recent trends in the field.
Affiliation(s)
- Elijah MacCarthy
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Derrick Perry
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Dukka B Kc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA.
| |
|
35
|
Wan C. Background on Biology of Ageing and Bioinformatics. Advanced Information and Knowledge Processing 2019:25-43. [DOI: 10.1007/978-3-319-97919-9_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
36
|
Vorberg S, Seemayer S, Söding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput Biol 2018; 14:e1006526. [PMID: 30395601 PMCID: PMC6237422 DOI: 10.1371/journal.pcbi.1006526] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/27/2018] [Revised: 11/15/2018] [Accepted: 09/24/2018] [Indexed: 12/01/2022] Open
Abstract
Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny. 
Knowledge about the three-dimensional structure of proteins is key to understanding their function and role in biological processes and diseases. The experimental structure determination techniques, such as X-ray crystallography or electron cryo-microscopy, are labour intensive, time-consuming and expensive. Therefore, complementary computational methods to predict a protein’s structure have become indispensable. Over the last years, immense progress has been made in predicting protein structures from their amino acid sequence by utilizing highly accurate predictions of spatial contacts between amino acid residues as constraints in folding simulations. However, contact prediction methods require large numbers of homologous protein sequences in order to discriminate between signal and noise. A major obstacle preventing progress on the statistical methodology is our limited understanding of the different components of noise that are known to affect the predictions. We provide two tools, CCMpredPy and CCMgen, that can be used to learn highly accurate statistical models for contact prediction and to simulate protein evolution according to the statistical constraints between positions of residues as specified by these models, respectively. We showcase their usefulness by quantifying the relative contribution of noise arising from entropy and phylogeny on the predicted contacts, which will facilitate the improvement of the statistical methodology.
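The two corrections named in this abstract can be sketched in a few lines. The average product correction follows its standard definition (subtract the product of the row and column means divided by the overall mean). The entropy correction subtracts a term proportional to the product of the square roots of the column entropies, as the abstract describes; the scaling constant `alpha` is left as a free parameter here because the abstract does not say how it is chosen. All function names are illustrative and are not the CCMpredPy API.

```python
import math


def column_entropy(column):
    """Shannon entropy (in nats) of one alignment column given as a string."""
    n = len(column)
    counts = {}
    for residue in column:
        counts[residue] = counts.get(residue, 0) + 1
    return sum(-(k / n) * math.log(k / n) for k in counts.values())


def entropy_correction(scores, entropies, alpha):
    """Subtract alpha * sqrt(H_i) * sqrt(H_j) from each off-diagonal score.

    alpha is a hypothetical free parameter in this sketch.
    """
    size = len(scores)
    return [[scores[i][j] - alpha * math.sqrt(entropies[i] * entropies[j])
             if i != j else 0.0
             for j in range(size)] for i in range(size)]


def average_product_correction(scores):
    """APC: subtract row_mean_i * row_mean_j / overall_mean (diagonal excluded).

    Assumes a symmetric score matrix with a non-zero overall mean.
    """
    size = len(scores)
    row_mean = [sum(scores[i][j] for j in range(size) if j != i) / (size - 1)
                for i in range(size)]
    overall = sum(row_mean) / size
    return [[scores[i][j] - row_mean[i] * row_mean[j] / overall
             if i != j else 0.0
             for j in range(size)] for i in range(size)]


# A score matrix that is pure entropy background is corrected to zero.
H = [column_entropy(col) for col in ("AACC", "ACGT", "AAAC")]
background = [[0.3 * math.sqrt(H[i] * H[j]) if i != j else 0.0
               for j in range(3)] for i in range(3)]
print(entropy_correction(background, H, alpha=0.3))
```

Both corrections remove a background term that depends only on the two columns individually, which is consistent with the abstract's statement that APC partly accounts for the entropy bias.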
Affiliation(s)
- Susann Vorberg
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Stefan Seemayer
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Johannes Söding
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
| |
|
37
|
Co-Evolution of Intrinsically Disordered Proteins with Folded Partners Witnessed by Evolutionary Couplings. Int J Mol Sci 2018; 19:ijms19113315. [PMID: 30366362 PMCID: PMC6274761 DOI: 10.3390/ijms19113315] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/30/2018] [Revised: 10/19/2018] [Accepted: 10/22/2018] [Indexed: 12/22/2022] Open
Abstract
Although improved strategies for the detection and analysis of evolutionary couplings (ECs) between protein residues already enable the prediction of protein structures and interactions, they are mostly restricted to conserved and well-folded proteins. Whereas intrinsically disordered proteins (IDPs) are central to cellular interaction networks, due to the lack of strict structural constraints, they undergo faster evolutionary changes than folded domains. This makes the reliable identification and alignment of IDP homologs difficult, which led to IDPs being omitted in most large-scale residue co-variation analyses. By performing a dedicated analysis of phylogenetically widespread bacterial IDP–partner interactions, here we demonstrate that partner binding imposes constraints on IDP sequences that manifest in detectable interprotein ECs. These ECs were not detected for interactions mediated by short motifs, rather for those with larger IDP–partner interfaces. Most identified coupled residue pairs reside close (<10 Å) to each other on the interface, with a third of them forming multiple direct atomic contacts. EC-carrying interfaces of IDPs are enriched in negatively charged residues, and the EC residues of both IDPs and partners preferentially reside in helices. Our analysis brings hope that IDP–partner interactions difficult to study could soon be successfully dissected through residue co-variation analysis.
|
38
|
Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 2018; 34:3308-3315. [PMID: 29718112 PMCID: PMC6157083 DOI: 10.1093/bioinformatics/bty341] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 10/12/2017] [Revised: 03/06/2018] [Accepted: 04/25/2018] [Indexed: 12/22/2022] Open
Abstract
Motivation In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Results Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. Availability and implementation DeepCov is freely available at https://github.com/psipred/DeepCov. 
Supplementary information Supplementary data are available at Bioinformatics online.
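The pairwise covariance input mentioned in the abstract above can be illustrated with a small sketch. This is not the DeepCov implementation: it assumes an unweighted alignment given as equal-length strings, and the function names `column_frequencies` and `pair_covariance` are hypothetical. For two alignment columns i and j it computes f_ij(a, b) - f_i(a) * f_j(b) for every residue pair (a, b) observed in those columns.

```python
def column_frequencies(msa, i):
    """Return {residue: frequency} for alignment column i (no sequence weighting)."""
    n = len(msa)
    freqs = {}
    for seq in msa:
        freqs[seq[i]] = freqs.get(seq[i], 0.0) + 1.0 / n
    return freqs


def pair_covariance(msa, i, j):
    """Return {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} for columns i and j.

    Hypothetical helper for illustration; real pipelines typically add
    sequence weighting and pseudocounts, which are omitted here.
    """
    n = len(msa)
    fi = column_frequencies(msa, i)
    fj = column_frequencies(msa, j)
    fij = {}
    for seq in msa:
        key = (seq[i], seq[j])
        fij[key] = fij.get(key, 0.0) + 1.0 / n
    return {(a, b): fij.get((a, b), 0.0) - fi[a] * fj[b] for a in fi for b in fj}


# Toy alignment in which the two columns co-vary perfectly.
toy_msa = ["AC", "AC", "GT", "GT"]
cov = pair_covariance(toy_msa, 0, 1)
```

In the toy alignment, pairs that always occur together receive a positive covariance and pairs that never occur together receive a negative one, which is the kind of signal a convolutional network could take as input channels.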
Affiliation(s)
- David T Jones
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
|
39
|
Kandathil SM, Garza-Fabre M, Handl J, Lovell SC. Improved fragment-based protein structure prediction by redesign of search heuristics. Sci Rep 2018; 8:13694. [PMID: 30209258 PMCID: PMC6135816 DOI: 10.1038/s41598-018-31891-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/27/2018] [Accepted: 08/22/2018] [Indexed: 11/09/2022]
Abstract
Difficulty in sampling large and complex conformational spaces remains a key limitation in fragment-based de novo prediction of protein structure. Our previous work has shown that even for small-to-medium-sized proteins, some current methods inadequately sample alternative structures. We have developed two new conformational sampling techniques, one employing a bilevel optimisation framework and the other employing iterated local search. We combine strategies of forced structural perturbation (where some fragment insertions are accepted regardless of their impact on scores) and greedy local optimisation, allowing greater exploration of the available conformational space. Comparisons against the Rosetta Abinitio method indicate that our protocols more frequently generate native-like predictions for many targets, even following the low-resolution phase, using a given set of fragment libraries. By contrasting results across two different fragment sets, we show that our methods are able to better take advantage of high-quality fragments. These improvements can also translate into more reliable identification of near-native structures in a simple clustering-based model selection procedure. We show that when fragment libraries are sufficiently well-constructed, improved breadth of exploration within runs improves prediction accuracy. Our results also suggest that in benchmarking scenarios, a total exclusion of fragments drawn from homologous templates can make performance differences between methods appear less pronounced.
Affiliation(s)
- Shaun M Kandathil
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PL, United Kingdom
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Mario Garza-Fabre
  - Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M13 9PL, United Kingdom
  - Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Km. 5.5 Carretera Cd. Victoria-Soto La Marina, Cd. Victoria, Tamaulipas, 87130, Mexico
- Julia Handl
  - Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M13 9PL, United Kingdom
- Simon C Lovell
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PL, United Kingdom
|
40
|
de Oliveira SHP, Shi J, Deane CM. Comparing co-evolution methods and their application to template-free protein structure prediction. Bioinformatics 2018; 33:373-381. [PMID: 28171606 PMCID: PMC5860252 DOI: 10.1093/bioinformatics/btw618] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/28/2016] [Revised: 09/19/2016] [Accepted: 09/22/2016] [Indexed: 02/01/2023]
Abstract
Motivation Co-evolution methods have been used as contact predictors to identify pairs of residues that share spatial proximity. Such contact predictors have been compared in terms of the precision of their predictions, but there is no study that compares their usefulness to model generation. Results We compared eight different co-evolution methods for a set of ∼3500 proteins and found that metaPSICOV stage 2 produces, on average, the most precise predictions. Precision of all the methods is dependent on SCOP class, with most methods predicting contacts in all-α and membrane proteins poorly. The contact predictions were then used to assist in de novo model generation. We found that it was not the method with the highest average precision, but rather metaPSICOV stage 1 predictions that consistently led to the best models being produced. Our modelling results show a correlation between the proportion of predicted long-range contacts that are satisfied on a model and its quality. We used this proportion to effectively classify models as correct/incorrect; discarding decoys classified as incorrect led to an enrichment in the proportion of good decoys in our final ensemble by a factor of seven. For 17 out of the 18 cases where correct answers were generated, the best models were not discarded by this approach. We were also able to identify eight cases where no correct decoy had been generated. Availability and Implementation Data is available for download from: http://opig.stats.ox.ac.uk/resources. Contact saulo.deoliveira@dtc.ox.ac.uk Supplementary information Supplementary data are available at Bioinformatics online.
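The abstract above uses the proportion of predicted long-range contacts satisfied by a model as a quality signal. A minimal sketch of that proportion follows. The 8 Å distance cutoff and the sequence separation of 24 used here are assumed defaults for illustration and are not taken from the abstract; the function name `satisfied_fraction` is likewise hypothetical.

```python
import math


def satisfied_fraction(coords, contacts, cutoff=8.0, min_separation=24):
    """Fraction of predicted long-range contacts that a model satisfies.

    coords: {residue_index: (x, y, z)} for one atom per residue in the model.
    contacts: iterable of predicted (i, j) residue index pairs.
    cutoff and min_separation are assumed illustrative defaults.
    """
    long_range = [(i, j) for i, j in contacts if abs(i - j) >= min_separation]
    if not long_range:
        return 0.0
    hits = sum(
        1 for i, j in long_range if math.dist(coords[i], coords[j]) <= cutoff
    )
    return hits / len(long_range)


# Toy model: residues 0 and 30 are close, residues 0 and 60 are far apart.
toy_coords = {0: (0.0, 0.0, 0.0), 30: (5.0, 0.0, 0.0), 60: (20.0, 0.0, 0.0)}
toy_contacts = [(0, 30), (0, 60), (0, 5)]  # (0, 5) is short-range and ignored
fraction = satisfied_fraction(toy_coords, toy_contacts)
```

A decoy filter in the spirit of the abstract would then discard models whose fraction falls below some threshold; the abstract does not state the threshold, so none is suggested here.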
Affiliation(s)
- Jiye Shi
  - Department of Informatics, UCB Pharma, Slough SL1 3WE, UK
  - Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China
|
41
|
Solanki V, Tiwari V. Subtractive proteomics to identify novel drug targets and reverse vaccinology for the development of chimeric vaccine against Acinetobacter baumannii. Sci Rep 2018; 8:9044. [PMID: 29899345 PMCID: PMC5997985 DOI: 10.1038/s41598-018-26689-7] [Citation(s) in RCA: 178] [Impact Index Per Article: 25.4] [Received: 03/01/2018] [Accepted: 05/17/2018] [Indexed: 11/24/2022]
Abstract
The emergence of drug-resistant Acinetobacter baumannii is a global health problem associated with high mortality and morbidity. Therefore, it is high time to find suitable therapeutics for this pathogen. In the present study, subtractive proteomics along with reverse vaccinology approaches were used to predict suitable therapeutics against A. baumannii. Using subtractive proteomics, we have identified promiscuous antigenic membrane proteins that contain the virulence factors, resistance factors and essentiality factors for this pathogenic bacterium. Selected promiscuous targeted membrane proteins were used for the design of a chimeric subunit vaccine with the help of reverse vaccinology. The best available tools and servers were used for the identification of MHC class I, II and B cell epitopes. All selected epitopes were further shortlisted computationally to assess their immunogenicity, antigenicity, allergenicity, conservancy and toxicity potentials. Immunogenic predicted promiscuous peptides were used for the development of a chimeric subunit vaccine with immune-modulating adjuvants, linkers, and the PADRE (Pan HLA-DR epitopes) amino acid sequence. The designed vaccine construct V4 also interacts with the MHC and the TLR4/MD2 complex, as confirmed by docking and molecular dynamics simulation studies. Therefore, the designed vaccine construct V4 can be developed to control the host-pathogen interaction or infection caused by A. baumannii.
Affiliation(s)
- Vandana Solanki
  - Department of Biochemistry, Central University of Rajasthan, Bandarsindri, Ajmer, 305817, India
- Vishvanath Tiwari
  - Department of Biochemistry, Central University of Rajasthan, Bandarsindri, Ajmer, 305817, India
|
42
|
de Oliveira SHP, Law EC, Shi J, Deane CM. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction. Bioinformatics 2018; 34:1132-1140. [PMID: 29136098 PMCID: PMC6030820 DOI: 10.1093/bioinformatics/btx722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/17/2017] [Revised: 09/22/2017] [Accepted: 11/04/2017] [Indexed: 01/12/2023]
Abstract
Motivation Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. Results We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Availability and implementation Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. Contact saulo.deoliveira@dtc.ox.ac.uk. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eleanor C Law
  - Department of Statistics, University of Oxford, Oxford, UK
- Jiye Shi
  - Department of Informatics, UCB Pharma, Slough, UK
  - Division of Physical Biology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
|
43
|
Xu C, Bouvier G, Bardiaux B, Nilges M, Malliavin T, Lisser A. Ordering Protein Contact Matrices. Comput Struct Biotechnol J 2018; 16:140-156. [PMID: 29632657 PMCID: PMC5889711 DOI: 10.1016/j.csbj.2018.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2017] [Revised: 02/28/2018] [Accepted: 03/01/2018] [Indexed: 11/29/2022]
Abstract
Numerous biophysical approaches provide information about the spatial proximity of residues in proteins. However, correct assignment of the protein fold from this proximity information is not straightforward if the spatially close protein residues are not assigned to residues in the primary sequence. Here, we propose an algorithm to assign such residue numbers by ordering the columns and lines of the raw protein contact matrix directly obtained from proximity information between unassigned amino acids. The ordering problem is formulated as the search for a trail within a graph connecting protein residues through the nonzero contact values. The algorithm proceeds in two steps: (i) finding the longest trail of the graph using an original dynamic programming algorithm, (ii) clustering the individual ordered matrices using a self-organizing map (SOM) approach. The combination of the dynamic programming and self-organizing map approaches constitutes a quite innovative point of the present work. The algorithm was validated on a set of about 900 proteins, representative of the sizes and proportions of secondary structures observed in the Protein Data Bank. The algorithm proved to be efficient for noise levels up to 40%, obtaining average gaps of about 20% at maximum between ordered and initial matrices. The proposed approach paves the way toward a method of fold prediction from noisy proximity information, as TM scores larger than 0.5 have been obtained for ten randomly chosen proteins, in the case of a noise level of 10%. The method has also been validated on two experimental cases, on which it performed satisfactorily.
Affiliation(s)
- Chuan Xu
  - Laboratoire de Recherche en Informatique, Université Paris-Sud and CNRS UMR8623, France
- Guillaume Bouvier
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
- Benjamin Bardiaux
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
- Michael Nilges
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
- Thérèse Malliavin
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
- Abdel Lisser
  - Laboratoire de Recherche en Informatique, Université Paris-Sud and CNRS UMR8623, France
|
44
|
dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018; 34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022]
Affiliation(s)
- Ricardo N dos Santos
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
  - Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
- Fábio C Gozzo
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
- Leandro Martínez
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
  - Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
|
45
|
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
|
46
|
Adhikari B, Cheng J. CONFOLD2: improved contact-driven ab initio protein structure modeling. BMC Bioinformatics 2018; 19:22. [PMID: 29370750 PMCID: PMC5784681 DOI: 10.1186/s12859-018-2032-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 11/19/2017] [Accepted: 01/17/2018] [Indexed: 12/31/2022]
Abstract
Background Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. Results We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. Conclusion CONFOLD2 quickly generates the top five structural models for a protein sequence when its secondary structure and contact predictions are at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/. Electronic supplementary material The online version of this article (10.1186/s12859-018-2032-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Badri Adhikari
  - Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, 63121, MO, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, MO, USA
|
47
|
Prediction of Structures and Interactions from Genome Information. Advances in Experimental Medicine and Biology 2018; 1105:123-152. [DOI: 10.1007/978-981-13-2200-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
48
|
Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 2017; 86 Suppl 1:136-151. [PMID: 29082551 DOI: 10.1002/prot.25414] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 07/10/2017] [Revised: 10/09/2017] [Accepted: 10/27/2017] [Indexed: 12/26/2022]
Abstract
We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role to enhance the quality of models, particularly for FM targets, by the new pipelines. The inclusion of NeBcon predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best in top five predicted models, which is 37% higher than that by the QUARK simulations without contacts. In particular, there are seven targets that are converted from non-foldable to foldable (TM-score >0.5) due to the use of contact restraints in the simulations. Another additional feature in the current pipelines is the local structure quality prediction by ResQ, which provides a robust residue-level modeling error estimation. Despite the success, significant challenges still remain in ab initio modeling of multi-domain proteins and folding of β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements on domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- S M Mortuza
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Baoji He
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Yanting Wang
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
|
49
|
Systematic Identification of Machine-Learning Models Aimed to Classify Critical Residues for Protein Function from Protein Structure. Molecules 2017; 22:1673. [PMID: 28991206 PMCID: PMC6151554 DOI: 10.3390/molecules22101673] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/14/2017] [Revised: 09/24/2017] [Accepted: 09/24/2017] [Indexed: 12/14/2022]
Abstract
Protein structure and protein function should be related, yet the nature of this relationship remains unsolved. Mapping the critical residues for protein function with protein structure features represents an opportunity to explore this relationship, yet two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, here we introduce an index to quantify the protein-function criticality of a residue based on experimental data, and a strategy that optimizes both the descriptors of protein structure (physicochemical and centrality descriptors) and the machine learning algorithms to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is considered as a binary attribute (i.e., residues are considered critical or not critical). Using this binary annotation for critical residues, eight models rendered accurate and non-overlapping classification of critical residues, confirming the multi-factorial character of the structure-function relationship of proteins.
|
50
|
Buchan DWA, Jones DT. EigenTHREADER: analogous protein fold recognition by efficient contact map threading. Bioinformatics 2017; 33:2684-2690. [PMID: 28419258 PMCID: PMC5860056 DOI: 10.1093/bioinformatics/btx217] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 10/18/2016] [Revised: 01/18/2017] [Accepted: 04/12/2017] [Indexed: 11/13/2022]
Abstract
MOTIVATION Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. RESULTS EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology-based fold recognition methods. AVAILABILITY AND IMPLEMENTATION All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. CONTACT d.t.jones@ucl.ac.uk.
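The eigenvector step described in the abstract above can be sketched for a single eigenvector. This is only an illustration under stated assumptions, not the EigenTHREADER code: it uses plain power iteration (a swapped-in technique chosen to keep the example dependency-free), it recovers only the leading eigenvector rather than several principal ones, and it assumes a symmetric binary contact map with ones on the diagonal. The function name `principal_eigenvector` is hypothetical.

```python
def principal_eigenvector(matrix, iterations=200):
    """Approximate the leading eigenvector of a symmetric contact map.

    Uses power iteration: repeatedly multiply a vector by the matrix and
    renormalize. Assumes ones on the diagonal so the leading eigenvalue
    dominates in magnitude for a connected contact map.
    """
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            return vec  # all-zero map: nothing to resolve
        vec = [x / norm for x in nxt]
    return vec


# Toy contact map for a three-residue chain (each residue touches its neighbour).
toy_map = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
leading = principal_eigenvector(toy_map)
```

Each entry of the resulting vector is a per-residue value, so two proteins' vectors form one-dimensional profiles that a standard dynamic programming alignment could compare, which is the role the abstract assigns to the principal eigenvectors.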
Affiliation(s)
- Daniel W A Buchan
  - Department of Computer Science, University College London, Gower Street, London, UK
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, UK
|