1
Baker K, Hughes N, Bhattacharya S. An interactive visualization tool for educational outreach in protein contact map overlap analysis. FRONTIERS IN BIOINFORMATICS 2024; 4:1358550. [PMID: 38562910 PMCID: PMC10982686 DOI: 10.3389/fbinf.2024.1358550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024]
Abstract
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the lack of accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to solve contact map overlap (CMO) problems for better template selection. Unlike existing tools that cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the problem interactively. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets: PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across contact maps of diverse quality and targets of varying difficulty. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, making it accessible to the general public for promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
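To make the CMO objective concrete: given the contact maps of two proteins and a residue alignment between them, the overlap is the number of contacts in one map that the alignment carries onto contacts in the other. The Python sketch below only scores one fixed alignment; searching for the alignment that maximizes this count is the computationally hard part of the CMO problem and is not shown. The function name and the set-of-index-pairs representation are illustrative assumptions, not GoFold's code.

```python
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # residue index pair (i, j) with i < j


def contact_overlap(
    contacts_a: Set[Contact],
    contacts_b: Set[Contact],
    alignment: Dict[int, int],
) -> int:
    """Count contacts of protein A that the alignment maps onto contacts of B.

    `alignment` maps residue indices of A to residue indices of B; it is
    assumed to be one-to-one and order-preserving. Each contact is listed once.
    """
    # Normalize B's contacts so that (u, v) and (v, u) compare equal.
    b_norm = {(min(u, v), max(u, v)) for u, v in contacts_b}
    shared = 0
    for i, j in contacts_a:
        # A contact can only be preserved if both of its residues are aligned.
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in b_norm:
                shared += 1
    return shared


if __name__ == "__main__":
    a = {(0, 3), (1, 4), (2, 5)}
    b = {(0, 3), (1, 5)}
    print(contact_overlap(a, b, {0: 0, 1: 1, 2: 2, 3: 3, 4: 5}))  # 2
```

In the example, (0, 3) and (1, 4) are carried onto contacts of B, while (2, 5) is lost because residue 5 of A is unaligned.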
Affiliation(s)
- Kevan Baker
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Nathaniel Hughes
- Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
- Sutanu Bhattacharya
- Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
2
Huang B, Kong L, Wang C, Ju F, Zhang Q, Zhu J, Gong T, Zhang H, Yu C, Zheng WM, Bu D. Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:913-925. [PMID: 37001856 PMCID: PMC10928435 DOI: 10.1016/j.gpb.2022.11.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/15/2022] [Revised: 11/23/2022] [Accepted: 11/30/2022] [Indexed: 03/31/2023]
Abstract
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start from assuming a probability distribution of protein structures given a target sequence and then find the most likely structure, while computer scientists formulate protein structure prediction as an optimization problem - finding the structural conformation with the lowest energy or minimizing the difference between predicted structure and native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts for protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, the algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge on protein folding are still highly desired.
Affiliation(s)
- Bin Huang
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lupeng Kong
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Changping Laboratory, Beijing 102206, China
- Chao Wang
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Fusong Ju
- Microsoft Research AI4Science, Beijing 100080, China
- Qi Zhang
- Huawei Noah's Ark Lab, Wuhan 430206, China
- Jianwei Zhu
- Microsoft Research AI4Science, Beijing 100080, China
- Tiansu Gong
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Haicang Zhang
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China
- Chungong Yu
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
- Dongbo Bu
- Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China
3
Jamali Langeroudi A, Sabet MS, Jalali-Javaran M, Zamani K, Lohrasebi T, Malboobi MA. Functional assessment of AtPAP17; encoding a purple acid phosphatase involved in phosphate metabolism in Arabidopsis thaliana. Biotechnol Lett 2023; 45:719-739. [PMID: 37074554 DOI: 10.1007/s10529-023-03375-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 03/05/2023] [Accepted: 04/03/2023] [Indexed: 04/20/2023]
Abstract
PURPOSE Purple acid phosphatases (PAPs) constitute one of the largest classes of non-specific plant acid phosphatases. Most characterized PAPs have been found to play physiological roles in phosphorus metabolism. In this study, we investigated the function of the AtPAP17 gene, which encodes an important purple acid phosphatase in Arabidopsis thaliana. METHODS The full-length cDNA sequence of the AtPAP17 gene under the control of the CaMV-35S promoter was transferred into A. thaliana WT plants. The resulting homozygous AtPAP17-overexpressing plants were compared, using several types of analyses, with the corresponding homozygous atpap17-mutant plants and WT plants under both +P (1.2 mM) and -P (0 mM) conditions. RESULTS Under the +P condition, the highest and lowest Pi contents were observed in AtPAP17-overexpressing plants and atpap17-mutant plants, with a 111% increase and a 38% decrease compared with WT plants, respectively. Furthermore, under the same condition, the APase activity of AtPAP17-overexpressing plants increased by 24% compared with WT. In contrast, atpap17-mutant plants showed a 71% decrease compared with WT plants. The comparison of fresh weight and dry weight in the studied plants showed that the highest and lowest amounts of absorbed water belonged to OE plants (38 and 12 mg plant-1) and Mu plants (22 and 7 mg plant-1) under +P and -P conditions, respectively. CONCLUSION The lack of the AtPAP17 gene in the A. thaliana genome led to a remarkable reduction in root biomass development. Thus, AtPAP17 could have an important role in the developmental and structural programming of the root but not the shoot. Consequently, this function enables plants to absorb more water, which is eventually associated with greater phosphate absorption.
Affiliation(s)
- Arash Jamali Langeroudi
- Department of Agricultural Biotechnology, Faculty of Agriculture, Tarbiat Modares University, P.O. Box 14115-336, Tehran, Iran
- Mohammad Sadegh Sabet
- Department of Plant Genetics and Breeding, Faculty of Agriculture, Tarbiat Modares University, P.O. Box 14115-336, Tehran, Iran
- Mokhtar Jalali-Javaran
- Department of Agricultural Biotechnology, Faculty of Agriculture, Tarbiat Modares University, P.O. Box 14115-336, Tehran, Iran
- Katayoun Zamani
- Department of Genetic Engineering and Biosafety, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education, and Extension Organization, Karaj, Tehran, Iran
- Tahmineh Lohrasebi
- Department of Plant Biotechnology, National Institute of Genetic Engineering and Biotechnology, P.O. Box 14965-161, Tehran, Iran
- Mohammad Ali Malboobi
- Department of Plant Biotechnology, National Institute of Genetic Engineering and Biotechnology, P.O. Box 14965-161, Tehran, Iran
4
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Predicting the three-dimensional structure of a protein from its amino acid sequence has advanced considerably in recent years. This progress has been driven by the improved accuracy of deep learning-based inter-residue contact map predictors, coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate protein structure prediction via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods, highlight recent advances, and discuss some of the current limitations and future prospects of contact-assisted threading for improving the accuracy of low-homology protein modeling.
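A contact map is usually derived from coordinates by thresholding inter-residue distances. A widely used convention (for example in CASP contact assessment) counts residues i and j as in contact when their C-beta atoms (C-alpha for glycine) are closer than 8 Å and they are at least six positions apart in sequence. The Python sketch below applies that convention to one representative coordinate per residue; the function name, default parameters, and input format are assumptions for illustration, and reading coordinates from a structure file is not shown.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_map(
    coords: List[Coord],
    cutoff: float = 8.0,
    min_separation: int = 6,
) -> Set[Tuple[int, int]]:
    """Return pairs (i, j), i < j, closer than `cutoff` angstroms and at
    least `min_separation` residues apart in sequence.

    `coords` holds one representative atom per residue (e.g. C-beta,
    C-alpha for glycine).
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        # Start at i + min_separation to skip sequence-local pairs.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


if __name__ == "__main__":
    # Toy chain: residues 3.8 angstroms apart along x, with residue 7
    # folded back next to the start of the chain.
    chain = [(3.8 * k, 0.0, 0.0) for k in range(8)]
    chain[7] = (1.0, 0.0, 0.0)
    print(sorted(contact_map(chain)))  # [(0, 7), (1, 7)]
```

Returning a set of index pairs keeps the map sparse, which suits threading code that only iterates over contacts.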
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
5
Homology Modeling and Analysis of Vacuolar Aspartyl Protease from a Novel Yeast Expression Host Meyerozyma guilliermondii Strain SO. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2022. [DOI: 10.1007/s13369-022-07153-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
6
Lee SJ, Joo K, Sim S, Lee J, Lee IH, Lee J. CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields. MOLECULES (BASEL, SWITZERLAND) 2022; 27:molecules27123711. [PMID: 35744836 PMCID: PMC9231382 DOI: 10.3390/molecules27123711] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2022] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/16/2022]
Abstract
Sequence–structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence–structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence–structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily or twilight-zone level chosen from the SABmark benchmark set. We found that CRFalign produces a relative improvement in average alignment accuracy on validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where CRFalign improved the average modeling accuracy on these hard targets (TM-CRFalign ≃42.94%) compared with HHalign (TM-HHalign ≃39.05%) and MRFalign (TM-MRFalign ≃36.93%). CRFalign was incorporated into our template search framework called CRFpred and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, where it showed reasonable template search performance.
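The accuracies quoted above (TM-CRFalign, TM-HHalign, TM-MRFalign) are based on TM-score. For a model superposed on the native structure, TM-score averages 1/(1 + (d_i/d0)^2) over the target's L residues, where d_i is the distance between the i-th aligned residue pair and d0(L) = 1.24 (L - 15)^(1/3) - 1.8 Å; the reported score is the maximum of this quantity over superpositions. The Python sketch below evaluates the formula for one fixed superposition only; the 0.5 Å floor on d0 follows common implementations and is an assumption here, as are the function names.

```python
from typing import Sequence


def d0(length: int) -> float:
    """TM-score distance scale in angstroms: 1.24 * (L - 15)^(1/3) - 1.8.

    Floored at 0.5 (assumed, mirroring common implementations); the floor
    also avoids a negative base for very short targets.
    """
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score value for one fixed superposition.

    `distances` are the d_i (angstroms) between aligned residue pairs; the
    reported TM-score is the maximum of this value over all superpositions.
    Normalization is by the target length, not the number of aligned pairs.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    print(tm_score_fixed([0.0] * 100, 100))  # 1.0
    print(tm_score_fixed([0.0] * 50, 100))   # 0.5
```

Because the sum is divided by the target length, a model that aligns only half of a 100-residue target perfectly scores 0.5, so unaligned residues are penalized implicitly.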
Affiliation(s)
- Sung Jong Lee
- Basic Science Institute, Changwon National University, Changwon 51140, Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Juyong Lee
- Department of Chemistry, Kangwon National University, Chuncheon 24341, Korea
- In-Ho Lee
- Korea Research Institute of Standards and Science (KRISS), Daejeon 34113, Korea
- Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
7
Villegas-Morcillo A, Gomez AM, Sanchez V. An analysis of protein language model embeddings for fold prediction. Brief Bioinform 2022; 23:6571527. [PMID: 35443054 DOI: 10.1093/bib/bbac142] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/01/2022] [Revised: 03/21/2022] [Accepted: 03/28/2022] [Indexed: 11/13/2022]
Abstract
The identification of the protein fold class is a challenging problem in structural biology. Recent computational methods for fold prediction leverage deep learning techniques to extract protein fold-representative embeddings, mainly using evolutionary information in the form of multiple sequence alignments (MSAs) as the input source. In contrast, protein language models (LMs) have reshaped the field thanks to their ability to learn efficient protein representations (protein-LM embeddings) from purely sequential information in a self-supervised manner. In this paper, we analyze a framework for protein fold prediction that uses pre-trained protein-LM embeddings as input to several fine-tuning neural network models, which are trained in a supervised manner with fold labels. In particular, we compare the performance of six protein-LM embeddings: the long short-term memory-based UniRep and SeqVec, and the transformer-based ESM-1b, ESM-MSA, ProtBERT and ProtT5; as well as three neural networks: Multi-Layer Perceptron, ResCNN-BGRU (RBG) and Light-Attention (LAT). We separately evaluated the pairwise fold recognition (PFR) and direct fold classification (DFC) tasks on well-known benchmark datasets. The results indicate that the combination of transformer-based embeddings, particularly those obtained at the amino acid level, with the RBG and LAT fine-tuning models performs remarkably well in both tasks. To further increase prediction accuracy, we propose several ensemble strategies for PFR and DFC, which provide a significant performance boost over the current state-of-the-art results. All this suggests that moving from traditional protein representations to protein-LM embeddings is a very promising approach to protein fold-related tasks.
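For pairwise fold recognition, a simple unsupervised baseline is to pool each protein's per-residue LM embeddings into a single vector and rank candidate templates by cosine similarity to the query. This is not the method analyzed in the paper, which trains fine-tuning networks (MLP, RBG, LAT) on top of the embeddings; the mean pooling, the cosine measure, the function names, and the toy two-dimensional vectors are assumptions chosen only to illustrate how protein-LM embeddings can be compared.

```python
import math
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]  # L residues x D embedding dimensions


def mean_pool(residue_embeddings: Matrix) -> List[float]:
    """Average per-residue embeddings into one protein-level vector."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[k] for row in residue_embeddings) / n for k in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_templates(query: Matrix, templates: Dict[str, Matrix]) -> List[str]:
    """Return template names ordered from most to least similar to the query."""
    q = mean_pool(query)
    scores = {name: cosine(q, mean_pool(emb)) for name, emb in templates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real protein-LM embeddings are much wider.
    query = [[1.0, 0.0], [1.0, 0.2]]
    templates = {"fold_a": [[1.0, 0.0]], "fold_b": [[0.0, 1.0]]}
    print(rank_templates(query, templates))  # ['fold_a', 'fold_b']
```

Mean pooling makes proteins of different lengths comparable as fixed-size vectors, at the cost of discarding the residue-level detail that the paper's fine-tuning models exploit.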
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
- Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
- Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
8
Byadi S, Oblak D, Kassmi Y, Sadik K, Hachim ME, Podlipnik Č, Aboulmouhajir A. In silico discovery of novel inhibitors from Northern African natural products database against main protease (Mpro) of SARS-CoV-2. J Biomol Struct Dyn 2022; 41:2900-2910. [PMID: 35168469 DOI: 10.1080/07391102.2022.2040594] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/19/2022]
Abstract
The recent outbreak of COVID-19 (Coronavirus Disease 2019), caused by the novel SARS-CoV-2 virus, has led to public health emergencies worldwide, where time is as important as equipment to save lives. Antimalarial drugs such as hydroxychloroquine and chloroquine derivatives are used in emergencies, but they are not suitable for patients with high blood pressure, diabetes, or heart problems. Since there are no approved drugs for this disease, science is challenged to find vaccines and new drugs. Therefore, as part of our in silico drug design strategy, we identified drug-like compounds that inhibit the main protease (Mpro) of SARS-CoV-2, using receptor-based virtual screening, molecular docking, molecular dynamics, and drug-likeness profiling of the Northern African Natural Products Database (NANPDB). The two resulting hit compounds, 5-chloro-omega-hydroxy-1-O-methylemodin and cystodion E, showed the highest binding energy with SARS-CoV-2 Mpro and strong inhibitory activity compared with the previously published N3 inhibitor. The complexes of these two compounds were validated by molecular dynamics analysis (RMSD, RMSF, Rg, total number of hydrogen bonds, and secondary structure fractions of the protein in the complex), as the best method to evaluate the biological stability of the system. Therefore, these molecules deserve more attention in drug development against COVID-19.
Highlights
- A large database of natural compounds was screened against nCoV-2's Mpro.
- Molecular docking and molecular dynamics were used as powerful methods.
- Two compounds were found to be very attractive inhibitors of nCoV-2 Mpro.
- ADME-Tox profiling indicated that the active compounds are not carcinogenic.
Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Said Byadi
- Extraction, Spectroscopy and Valorization Team, Organic synthesis, Extraction, and Valorization Laboratory, Sciences Faculty of Ain Chock, Hassan II University, Casablanca, Morocco; Molecular Modeling and Spectroscopy Team, Sciences Faculty, Chouaib Doukkali University, El Jadida, Morocco
- Domen Oblak
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
- Karima Sadik
- Molecular Modeling and Spectroscopy Team, Sciences Faculty, Chouaib Doukkali University, El Jadida, Morocco
- Mouhi Eddine Hachim
- Molecular Modeling and Spectroscopy Team, Sciences Faculty, Chouaib Doukkali University, El Jadida, Morocco
- Črtomir Podlipnik
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
- Aziz Aboulmouhajir
- Extraction, Spectroscopy and Valorization Team, Organic synthesis, Extraction, and Valorization Laboratory, Sciences Faculty of Ain Chock, Hassan II University, Casablanca, Morocco; Molecular Modeling and Spectroscopy Team, Sciences Faculty, Chouaib Doukkali University, El Jadida, Morocco
9
Kong L, Ju F, Zheng WM, Zhu J, Sun S, Xu J, Bu D. ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs. J Comput Biol 2022; 29:92-105. [PMID: 35073170 PMCID: PMC8892980 DOI: 10.1089/cmb.2021.0430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and the templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly related templates are available. Here we report a novel deep learning approach ProALIGN that can predict much more accurate sequence-template alignment. Like protein sequences consisting of sequence motifs, protein alignments are also composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific as their characteristic patterns are tightly related to sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces inaccuracies in the handcrafted scoring functions widely used by the current threading approaches. For a query protein and a template, we apply the neural network to directly infer likelihoods of all possible residue pairs in their entirety, which could effectively consider the correlations among multiple residues. We further construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. Tested on three independent data sets with a total of 6688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than the existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader. 
These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.
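The abstract describes representing an alignment as a binary matrix and constructing the maximum-likelihood alignment from predicted pair likelihoods, but it does not spell out the decoding step. One way to do it, assuming the matrix entries are independent Bernoulli variables with probabilities p_ij and that gaps carry no extra cost, is dynamic programming: log P(A) = sum of log(1 - p_ij) over all pairs + sum of log(p_ij / (1 - p_ij)) over aligned pairs, and the first sum does not depend on A, so the best order-preserving alignment maximizes the summed log-odds of its aligned pairs. The function names below are hypothetical, and this is a sketch under those assumptions, not ProALIGN's published code.

```python
import math
from typing import List, Tuple


def max_likelihood_alignment(prob: List[List[float]]) -> List[Tuple[int, int]]:
    """Order-preserving alignment maximizing the summed log-odds of aligned pairs.

    prob[i][j] is an (assumed independent) probability that query residue i
    is aligned to template residue j; gaps are free in this sketch.
    """
    n, m = len(prob), len(prob[0])
    eps = 1e-9  # clamp probabilities away from 0 and 1 before taking logs
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = min(max(prob[i - 1][j - 1], eps), 1.0 - eps)
            diag = score[i - 1][j - 1] + math.log(p / (1.0 - p))
            up, left = score[i - 1][j], score[i][j - 1]
            best = max(diag, up, left)
            score[i][j] = best
            move[i][j] = "diag" if best == diag else ("up" if best == up else "left")
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def to_binary_matrix(pairs: List[Tuple[int, int]], n: int, m: int) -> List[List[int]]:
    """Binary alignment matrix: 1 marks an aligned residue pair."""
    matrix = [[0] * m for _ in range(n)]
    for i, j in pairs:
        matrix[i][j] = 1
    return matrix


if __name__ == "__main__":
    probs = [[0.9, 0.1, 0.1], [0.1, 0.2, 0.8], [0.1, 0.7, 0.1]]
    pairs = max_likelihood_alignment(probs)
    print(pairs)                          # [(0, 0), (1, 2)]
    print(to_binary_matrix(pairs, 3, 3))  # [[1, 0, 0], [0, 0, 1], [0, 0, 0]]
```

In the example, (1, 2) and (2, 1) cannot both appear in an order-preserving alignment, and the decoder keeps (1, 2) because ln(0.8/0.2) > ln(0.7/0.3). Pairs with p < 0.5 have negative log-odds, so they are only aligned when doing so enables a better overall path.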
Affiliation(s)
- Lupeng Kong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Toyota Technological Institute, Chicago, Illinois, USA
- Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wei-mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
- Toyota Technological Institute, Chicago, Illinois, USA; Address correspondence to: Prof. Jinbo Xu, Toyota Technological Institute, Chicago, IL 60637, USA
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Address correspondence to: Dr. Dongbo Bu, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
10
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
11
Ju F, Zhu J, Zhang Q, Wei G, Sun S, Zheng WM, Bu D. Seq-SetNet: directly exploiting multiple sequence alignment for protein secondary structure prediction. Bioinformatics 2022; 38:990-996. [PMID: 34849579 DOI: 10.1093/bioinformatics/btab777] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2021] [Revised: 10/22/2021] [Accepted: 11/04/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Accurate prediction of protein structure relies heavily on exploiting multiple sequence alignments (MSAs) for residue mutations and correlations, as this information specifies protein tertiary structure. Widely used prediction approaches usually transform an MSA into intermediate models, such as a position-specific scoring matrix or a profile hidden Markov model. These intermediate models, however, cannot fully represent the residue mutations and correlations carried by the MSA; hence, an effective way to directly exploit MSAs is highly desirable. RESULTS Here, we report a novel sequence set network (called Seq-SetNet) to directly and effectively exploit MSAs for protein structure prediction. Seq-SetNet uses an 'encoding and aggregation' strategy that consists of two key elements: (i) an encoding module that takes a component homologue in the MSA as input and encodes residue mutations and correlations into context-specific features for each residue; and (ii) an aggregation module that aggregates the features extracted from all component homologues, which are further transformed into structural properties for residues of the query protein. As Seq-SetNet encodes each homologue protein individually, it can consider both insertions and deletions, as well as long-distance correlations among residues, thus representing more information than the intermediate models. Moreover, the encoding module automatically learns effective features and thus avoids manual feature engineering. Using symmetric aggregation functions, Seq-SetNet processes the homologue proteins as a sequence set, making its prediction results invariant to the order of these proteins. On popular benchmark sets, we demonstrated the successful application of Seq-SetNet to predict secondary structure and torsion angles of residues with improved accuracy and efficiency. AVAILABILITY AND IMPLEMENTATION The code and datasets are available through https://github.com/fusong-ju/Seq-SetNet. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
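Seq-SetNet's invariance to homologue order comes from aggregating per-homologue features with symmetric functions. The Python sketch below illustrates only that property: a hand-written toy encoder (identity and gap indicators relative to the query) stands in for the learned encoding module, and per-residue mean and max pooling are assumed as the symmetric aggregators, since the abstract does not name them. All names are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence

Features = List[List[float]]  # one feature vector per query residue


def encode_homologue(query: str, homologue: str) -> Features:
    """Toy stand-in for a learned encoder: [is identical, is gap] per residue."""
    return [[1.0 if h == q else 0.0, 1.0 if h == "-" else 0.0]
            for q, h in zip(query, homologue)]


def set_features(query: str, msa: Sequence[str]) -> Features:
    """Aggregate per-homologue features with symmetric functions (mean and max)."""
    encoded = [encode_homologue(query, h) for h in msa]
    n = len(encoded)
    length, dim = len(encoded[0]), len(encoded[0][0])
    out = []
    for r in range(length):
        columns = [[e[r][k] for e in encoded] for k in range(dim)]
        # math.fsum is correctly rounded, so the mean does not depend on order.
        mean = [math.fsum(c) / n for c in columns]
        peak = [max(c) for c in columns]
        out.append(mean + peak)
    return out


if __name__ == "__main__":
    query = "ACDE"
    msa = ["ACDE", "AC-E", "GCDE"]
    reference = set_features(query, msa)
    # Every ordering of the homologues yields identical features.
    print(all(set_features(query, list(p)) == reference for p in permutations(msa)))  # True
    print(reference[2])  # [0.6666666666666666, 0.3333333333333333, 1.0, 1.0]
```

Using `math.fsum` rather than `sum` matters here: ordinary floating-point summation can differ in the last bit when the homologues are reordered, which would break exact invariance.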
Affiliation(s)
- Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jianwei Zhu
- Microsoft Research Asia, Beijing 100080, China
- Qi Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guozheng Wei
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, Henan, China
- Wei-Mou Zheng
- University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, Henan, China
12
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022]
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, USA
- Ming Li
- University of Waterloo, Canada
13
New highly antigenic linear B cell epitope peptides from PvAMA-1 as potential vaccine candidates. PLoS One 2021; 16:e0258637. [PMID: 34727117 PMCID: PMC8562794 DOI: 10.1371/journal.pone.0258637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Accepted: 10/01/2021] [Indexed: 11/19/2022]
Abstract
Peptide-based vaccines have proven to be an important way to induce long-lived immune responses and are, therefore, a promising strategy in rational vaccine development. As to malaria, among the classic vaccine targets, the apical membrane antigen (AMA-1) was proven to have important B cell epitopes that can induce specific immune responses and, hence, became a key player for a vaccine approach. Peptide selection was carried out using a bioinformatic approach based on Hidden Markov Model profiles of known antigens and propensity scale methods based on hydrophilicity and secondary structure prediction. The antigenicity of the selected B-cell peptides was assessed by multiple serological assays using sera from subjects with acute P. vivax infection. The synthetic peptides were recognized by 45.5%, 48.7% and 32.2% of infected subjects for peptides I, II and III, respectively. Moreover, when synthesized together (tripeptide), the reactivity increased up to 62%, which is comparable to the reactivity found against the whole protein PvAMA-1 (57%). Furthermore, IgG reactivity against the tripeptide after depletion was reduced by 42%, indicating that these epitopes may be responsible for a considerable part of the protein immunogenicity. These results represent an excellent perspective regarding future chimeric vaccine constructions that may come to contemplate several targets with the potential to generate the robust and protective immune response that a vivax malaria vaccine needs to succeed.
14
Villegas-Morcillo A, Gomez AM, Morales-Cordovilla JA, Sanchez V. Protein Fold Recognition From Sequences Using Convolutional and Recurrent Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2848-2854. [PMID: 32750896 DOI: 10.1109/tcbb.2020.3012732] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
The identification of a protein fold type from its amino acid sequence provides important insights into the protein 3D structure. In this paper, we propose a deep learning architecture that can process protein residue-level features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated with the highly variable protein sequence lengths and so extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several template-based and deep learning-based methods from the state-of-the-art. The evaluation results over the well-known LINDAHL and SCOP_TEST sets, along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, especially at the fold level. Supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.3012732, source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/.
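The fold-transfer step described in this abstract (give the query the fold of its most similar template) can be sketched as a nearest-neighbour lookup over fixed-size embeddings. This is a minimal illustration only: it assumes cosine similarity as the comparison measure and uses invented three-dimensional embeddings and hypothetical helper names (`cosine`, `transfer_fold`); it is not the authors' CNN-GRU pipeline or their actual scoring procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transfer_fold(query_emb, templates):
    """Hypothetical helper: return the fold label of the most similar template.

    templates is a list of (embedding, fold_label) pairs.
    """
    _, best_fold = max(templates, key=lambda t: cosine(query_emb, t[0]))
    return best_fold


# Toy embeddings, invented for illustration only.
templates = [
    ([1.0, 0.0, 0.0], "TIM barrel"),
    ([0.0, 1.0, 0.0], "Rossmann fold"),
    ([0.0, 0.0, 1.0], "Ig-like beta sandwich"),
]
print(transfer_fold([0.1, 0.9, 0.2], templates))  # Rossmann fold
```

Because the GRU layers produce an embedding of the same size for every domain, this comparison works regardless of the original sequence lengths.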
15
Villegas-Morcillo A, Sanchez V, Gomez AM. FoldHSphere: deep hyperspherical embeddings for protein fold recognition. BMC Bioinformatics 2021; 22:490. [PMID: 34641786 PMCID: PMC8507389 DOI: 10.1186/s12859-021-04419-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/12/2021] [Accepted: 09/29/2021] [Indexed: 12/01/2022]
Abstract
Background Current state-of-the-art deep learning approaches for protein fold recognition learn protein embeddings that improve prediction performance at the fold level. However, there still exists a performance gap between the fold level and the (relatively easier) family level, suggesting that it might be possible to learn an embedding space that better represents the protein folds. Results In this paper, we propose the FoldHSphere method to learn a better fold embedding space through a two-stage training procedure. We first obtain prototype vectors for each fold class that are maximally separated in hyperspherical space. We then train a neural network by minimizing the angular large margin cosine loss to learn protein embeddings clustered around the corresponding hyperspherical fold prototypes. Our network architectures, ResCNN-GRU and ResCNN-BGRU, process the input protein sequences by applying several residual-convolutional blocks followed by a gated recurrent unit-based recurrent layer. Evaluation results on the LINDAHL dataset indicate that the use of our hyperspherical embeddings effectively bridges the performance gap at the family and fold levels. Furthermore, our FoldHSpherePro ensemble method yields an accuracy of 81.3% at the fold level, outperforming all the state-of-the-art methods. Conclusions Our methodology is efficient in learning discriminative and fold-representative embeddings for the protein domains. The proposed hyperspherical embeddings are effective at identifying the protein fold class by pairwise comparison, even when amino acid sequence similarities are low. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04419-7.
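The angular large margin cosine loss mentioned in this abstract penalizes an embedding unless its cosine to the correct fold prototype beats the other prototypes by a margin. The sketch below computes the per-sample loss in plain Python, assuming unit-normalized embeddings and prototypes so that the inputs are already cosines. The scale `s=30` and margin `m=0.35` are common defaults from the general cosine-margin literature, assumed here for illustration and not taken from FoldHSphere; `lmcl_loss` is a hypothetical name.

```python
import math


def lmcl_loss(cosines, target, s=30.0, m=0.35):
    """Large margin cosine loss for a single sample.

    cosines[j] is the cosine between the sample's (unit-normalized) embedding
    and the (unit-normalized) prototype of fold class j. The margin m is
    subtracted only from the target-class cosine, then all logits are scaled by s.
    """
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp over all classes.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
    # Negative log of the softmax probability of the target class.
    return log_denominator - logits[target]


# An embedding close to prototype 0 incurs a small loss for target 0 ...
print(round(lmcl_loss([0.9, 0.1, 0.0], target=0), 4))
# ... and a large loss if its true fold were class 1.
print(round(lmcl_loss([0.9, 0.1, 0.0], target=1), 4))
```

Subtracting the margin only from the target cosine forces embeddings to sit noticeably closer to their own prototype than to any other, which is what clusters them around the fixed hyperspherical prototypes.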
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain.
- Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain
- Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain
16
Kong L, Ju F, Zhang H, Sun S, Bu D. FALCON2: a web server for high-quality prediction of protein tertiary structures. BMC Bioinformatics 2021; 22:439. [PMID: 34525939 PMCID: PMC8444573 DOI: 10.1186/s12859-021-04353-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 09/01/2021] [Indexed: 11/21/2022]
Abstract
BACKGROUND Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising. RESULTS In the study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches. 
CONCLUSIONS By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.
Affiliation(s)
- Lupeng Kong
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Fusong Ju
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Shiwei Sun
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
17
Shen T, Wu J, Lan H, Zheng L, Pei J, Wang S, Liu W, Huang J. When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction). Proteins 2021; 89:1901-1910. [PMID: 34473376 DOI: 10.1002/prot.26232] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/30/2021] [Revised: 08/16/2021] [Accepted: 08/20/2021] [Indexed: 12/29/2022]
Abstract
In this paper, we report our tFold framework's performance on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). Our tFold framework seamlessly combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to effectively capture inter-residue interactions. In CASP14, our best predictor achieves 41.78% in the averaged top-L precision for long-range contacts for all the 22 free-modeling (FM) targets, and ranked 1st among all the 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.
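The headline metric in this abstract, top-L precision for long-range contacts, is the fraction of the L highest-probability predicted residue pairs (L = sequence length) that are true contacts, counting only pairs far apart in sequence. A minimal sketch, assuming the common CASP convention that long-range means a sequence separation of at least 24 residues; `k` selects top-L/k (k = 1, 2, 5), and `top_long_range_precision` is a hypothetical name.

```python
def top_long_range_precision(probs, true_contacts, length, k=1, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    probs: dict mapping residue pairs (i, j), i < j, to predicted contact probability.
    true_contacts: set of (i, j) pairs, i < j, that are contacts in the native structure.
    length: sequence length L. Pairs with j - i >= min_sep count as long-range.
    """
    long_range = [(p, pair) for pair, p in probs.items() if pair[1] - pair[0] >= min_sep]
    long_range.sort(reverse=True)  # highest probability first
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    # Divides by the number of pairs actually evaluated (may be fewer than L/k).
    return hits / len(top)


probs = {(0, 30): 0.9, (1, 40): 0.8, (2, 10): 0.95, (5, 35): 0.7}
native = {(0, 30), (5, 35)}
print(top_long_range_precision(probs, native, length=50, k=25))  # 0.5
```

Here the pair (2, 10) is ignored despite its high probability because its sequence separation is only 8 residues.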
Affiliation(s)
- Wei Liu
- Tencent AI Lab, Shenzhen, China
18
Arsiccio A, Beavis J, Raut S, Coxon CH. FVIII inhibitors display FV-neutralizing activity in the prothrombin time assay. J Thromb Haemost 2021; 19:1907-1913. [PMID: 33914406 PMCID: PMC8360109 DOI: 10.1111/jth.15355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2020] [Accepted: 04/16/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND The coagulation factors (F)V and VIII are homologous proteins that support hemostasis through their regulation of FX activity. Hemophilia A (HA) patients have reduced FVIII activity and a prolonged bleeding time that is corrected through the administration of exogenous FVIII. Around one-third of severe HA patients develop FVIII neutralizing antibodies, known as "inhibitors," which neutralize FVIII activity and preclude them from further FVIII therapy. OBJECTIVES We hypothesized that, based on the degree of homology between FV and FVIII (~40%), FVIII-neutralizing antibodies could cross react with FV. To test this hypothesis, a panel of recombinant, patient-derived, FVIII-neutralizing antibodies were screened for cross-reactivity against FV. METHODS Factor V and FVIII activity was measured using one-stage clotting assays; structural analysis was carried out using a structural approach. RESULTS We detected FV neutralizing activity with the anti-FVIII A2 domain antibody NB11B2. Because this antibody was derived from an HA inhibitor patient, FV-neutralizing activity was then evaluated in a number of HA inhibitor patient plasma samples; nine alloimmune samples had FV-neutralizing activity whereas no FV neutralizing activity was found in the two autoimmune samples available. We next examined the degree of surface homology between FV and FVIII and found that structural similarity could explain the cross reactivity of the anti-A2 antibody and likely accounts for the cross reactivity we observed in patient samples. CONCLUSIONS Although this novel observation is of interest, further work will be needed to determine whether FV neutralization in HA patient samples contributes to their bleeding diathesis.
Affiliation(s)
- Andrea Arsiccio
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- James Beavis
- Oxford Haemophilia Centre, Churchill Hospital, Oxford, UK
- Sanj Raut
- National Institute for Biological Standards and Control, Potters Bar, UK
- Carmen H. Coxon
- National Institute for Biological Standards and Control, Potters Bar, UK
19
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
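Threading by interaction-map alignment rewards alignments that map contacts of one protein onto contacts of the other. The sketch below only scores a given alignment by counting shared contacts, which is the contact map overlap (CMO) objective for binary contact maps; searching for the alignment that maximizes this count is the computationally hard part and is not shown. Representing contact maps as sets of index pairs and the name `cmo_score` are illustrative assumptions.

```python
def cmo_score(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    contacts_a, contacts_b: sets of residue-index pairs (i, j) with i < j.
    alignment: dict mapping a residue index in A to its aligned residue index in B
    (unaligned residues are simply absent).
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            x, y = alignment[i], alignment[j]
            if (min(x, y), max(x, y)) in contacts_b:
                shared += 1
    return shared


contacts_a = {(0, 3), (1, 4), (2, 5)}
contacts_b = {(0, 4), (1, 5)}
alignment = {0: 0, 1: 1, 3: 4, 4: 5}  # residues 2 and 5 of A are unaligned
print(cmo_score(contacts_a, contacts_b, alignment))  # 2
```

Finer-grained distance or orientation maps replace this simple count with a graded similarity per aligned pair, but the structure of the score stays the same.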
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
20
Wu F, Xu J. Deep template-based protein structure prediction. PLoS Comput Biol 2021; 17:e1008954. [PMID: 33939695 PMCID: PMC8118551 DOI: 10.1371/journal.pcbi.1008954] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/04/2021] [Revised: 05/13/2021] [Accepted: 04/11/2021] [Indexed: 11/19/2022]
Abstract
MOTIVATION Protein structure prediction has been greatly improved by deep learning, but most efforts have been devoted to template-free modeling; very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. RESULTS This paper presents a new method NDThreader (New Deep-learning Threader) to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.
Affiliation(s)
- Fandi Wu
- Toyota Technological Institute at Chicago, Chicago, IL, United States of America
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, United States of America
21
The Protective A673T Mutation of Amyloid Precursor Protein (APP) in Alzheimer's Disease. Mol Neurobiol 2021; 58:4038-4050. [PMID: 33914267 DOI: 10.1007/s12035-021-02385-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/24/2020] [Accepted: 04/05/2021] [Indexed: 10/21/2022]
Abstract
Alzheimer's disease is a progressive neurodegenerative disorder characterized by extracellular amyloid beta peptides and neurofibrillary tangles consisting of intracellular hyperphosphorylated Tau in the hippocampus and cerebral cortex. Most of the mutations in key genes that code for amyloid precursor protein can lead to significant accumulation of these peptides in the brain and cause Alzheimer's disease. Moreover, some point mutations in amyloid precursor protein can cause familial Alzheimer's disease, such as the Swedish mutation (KM670/671NL) and the A673V mutation. However, recent studies have found that the A673T mutation in the amyloid precursor protein gene can protect against Alzheimer's disease, even though it is located next to the Swedish mutation (KM670/671NL) and at the same site as the A673V mutation, both of which are pathogenic. This makes the protective A673T mutation of particular interest. Here, we summarize the most recent insights into the A673T mutation, focus on its roles in protective mechanisms against Alzheimer's disease, and discuss its involvement in future treatment.
22
Peñaloza HF, Olonisakin TF, Bain WG, Qu Y, van der Geest R, Zupetic J, Hulver M, Xiong Z, Newstead MW, Zou C, Alder JK, Ybe JA, Standiford TJ, Lee JS. Thrombospondin-1 Restricts Interleukin-36γ-Mediated Neutrophilic Inflammation during Pseudomonas aeruginosa Pulmonary Infection. mBio 2021; 12:e03336-20. [PMID: 33824208 PMCID: PMC8092289 DOI: 10.1128/mbio.03336-20] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/24/2020] [Accepted: 02/25/2021] [Indexed: 01/05/2023]
Abstract
Interleukin-36γ (IL-36γ), a member of the IL-1 cytokine superfamily, amplifies lung inflammation and impairs host defense during acute pulmonary Pseudomonas aeruginosa infection. To be fully active, IL-36γ is cleaved at its N-terminal region by proteases such as neutrophil elastase (NE) and cathepsin S (CatS). However, it remains unclear whether limiting extracellular proteolysis restrains the inflammatory cascade triggered by IL-36γ during P. aeruginosa infection. Thrombospondin-1 (TSP-1) is a matricellular protein with inhibitory activity against NE and the pathogen-secreted Pseudomonas elastase LasB, both proteases implicated in amplifying inflammation. We hypothesized that TSP-1 tempers the inflammatory response during lung P. aeruginosa infection by inhibiting the proteolytic environment required for IL-36γ activation. Compared to wild-type (WT) mice, TSP-1-deficient (Thbs1-/-) mice exhibited a hyperinflammatory response in the lungs during P. aeruginosa infection, with increased cytokine production and an unrestrained extracellular proteolytic environment characterized by higher free NE and LasB, but not CatS activity. LasB cleaved IL-36γ proximally to M19 at a cleavage site distinct from those generated by NE and CatS, which cleave IL-36γ proximally to Y16 and S18, respectively. N-terminal truncation experiments in silico predicted that the M19 and the S18 isoforms bind the IL-36R complex almost identically. IL-36γ neutralization ameliorated the hyperinflammatory response and improved lung immunity in Thbs1-/- mice during P. aeruginosa infection. Moreover, administration of cleaved IL-36γ induced cytokine production and neutrophil recruitment and activation that was accentuated in Thbs1-/- mice lungs. 
Collectively, our data show that TSP-1 regulates lung neutrophilic inflammation and facilitates host defense by restraining the extracellular proteolytic environment required for IL-36γ activation.
IMPORTANCE Pseudomonas aeruginosa pulmonary infection can lead to exaggerated neutrophilic inflammation and tissue destruction, yet host factors that regulate the neutrophilic response are not fully known. IL-36γ is a proinflammatory cytokine that dramatically increases in bioactivity following N-terminal processing by proteases. Here, we demonstrate that thrombospondin-1, a host matricellular protein, limits N-terminal processing of IL-36γ by neutrophil elastase and the Pseudomonas aeruginosa-secreted protease LasB. Thrombospondin-1-deficient mice (Thbs1-/-) exhibit a hyperinflammatory response following infection. Whereas IL-36γ neutralization reduces inflammatory cytokine production, limits neutrophil activation, and improves host defense in Thbs1-/- mice, cleaved IL-36γ administration amplifies neutrophilic inflammation in Thbs1-/- mice. Our findings indicate that thrombospondin-1 guards against feed-forward neutrophilic inflammation mediated by IL-36γ in the lung by restraining the extracellular proteolytic environment.
Affiliation(s)
- Hernán F Peñaloza
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Tolani F Olonisakin
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- William G Bain
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Yanyan Qu
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Rick van der Geest
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Jill Zupetic
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Mei Hulver
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Zeyu Xiong
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michael W Newstead
- Pulmonary and Critical Care Medicine, Department of Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Chunbin Zou
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Jonathan K Alder
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Joel A Ybe
- Department of Environmental and Occupational Health, School of Public Health, Indiana University, Bloomington, Indiana, USA
- Theodore J Standiford
- Pulmonary and Critical Care Medicine, Department of Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Janet S Lee
- Acute Lung Injury Center of Excellence, Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
23
Zhang H, Shen Y. Template-based prediction of protein structure with deep learning. BMC Genomics 2020; 21:878. [PMID: 33372607 PMCID: PMC7771081 DOI: 10.1186/s12864-020-07249-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2020] [Accepted: 11/18/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Accurate prediction of protein structure is fundamentally important to understand biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for the proteins with only distant homologs available. RESULTS We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning query sequence with template as the classical pixel classification problem in computer vision and naturally applies deep residual neural network in prediction. ThreaderAI first employs deep learning to predict residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds template-query alignment by applying a dynamic programming algorithm on the probability matrix. We evaluated our methods both in generating accurate template-query alignment and protein threading. Experimental results show that ThreaderAI outperforms currently popular template-based modelling methods HHpred, CNFpred, and the latest contact-assisted method CEthreader, especially on the proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the similarity of fold level from SCOPe data. And on CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively. 
CONCLUSIONS These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
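The second ThreaderAI step, building a template-query alignment by dynamic programming on a predicted residue-residue aligning probability matrix, can be illustrated with a Needleman-Wunsch-style recurrence that maximizes the summed aligning probability. This is a simplified sketch under stated assumptions: a single linear gap score, no affine gaps or probability thresholding, and a hypothetical function name (`align_from_probs`); the abstract does not give ThreaderAI's actual scoring details.

```python
def align_from_probs(prob, gap=0.0):
    """Global alignment that maximizes total aligning probability.

    prob[i][j]: predicted probability that query residue i aligns to template residue j.
    gap: score added for each gap position (0 or negative).
    Returns (best_score, list of aligned (query_index, template_index) pairs).
    """
    n, m = len(prob), len(prob[0])
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + prob[i - 1][j - 1],  # align i-1 with j-1
                              score[i - 1][j] + gap,                     # gap in template
                              score[i][j - 1] + gap)                     # gap in query
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + prob[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


score, pairs = align_from_probs([[0.9, 0.1, 0.0], [0.1, 0.2, 0.8]])
print(round(score, 3), pairs)  # 1.7 [(0, 0), (1, 2)]
```

Treating the probability matrix like an image and then running a classical recurrence over it keeps the two concerns separate: the network only has to score residue pairs, and the dynamic program guarantees a consistent, order-preserving alignment.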
Affiliation(s)
- Haicang Zhang
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Department of Biomedical Informatics, Columbia University, New York, NY, USA.
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA.
- Program in Mathematical Genomics, Columbia University, New York, NY, USA.
24
Mirzaei S, Razmara J, Lotfi S. GADP-align: A genetic algorithm and dynamic programming-based method for structural alignment of proteins. BIOIMPACTS 2020; 11:271-279. [PMID: 34631489 PMCID: PMC8494253 DOI: 10.34172/bi.2021.37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2020] [Revised: 06/10/2020] [Accepted: 06/16/2020] [Indexed: 11/16/2022]
Abstract
Introduction: Similarity analysis of protein structure is considered as a fundamental step to give insight into the relationships between proteins. The primary step in structural alignment is looking for the optimal correspondence between residues of two structures to optimize the scoring function. An exhaustive search for finding such a correspondence between two structures is intractable.
Methods: In this paper, a hybrid method is proposed, namely GADP-align, for pairwise protein structure alignment. The proposed method looks for an optimal alignment using a hybrid method based on a genetic algorithm and an iterative dynamic programming technique. To this end, the method first creates an initial map of correspondence between secondary structure elements (SSEs) of two proteins. Then, a genetic algorithm combined with an iterative dynamic programming algorithm is employed to optimize the alignment.
Results: The GADP-align algorithm was employed to align 10 ‘difficult to align’ protein pairs in order to evaluate its performance. The experimental study shows that the proposed hybrid method produces highly accurate alignments compared with methods that use only the dynamic programming technique. Furthermore, the proposed method avoids the local-optimum traps caused by an unsuitable initial guess of the corresponding residues.
Conclusion: The findings of this paper demonstrate that employing the genetic algorithm along with the dynamic programming technique yields highly accurate alignments between a protein pair by exploring the global alignment and avoiding trapping in local alignments.
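The dynamic-programming half of a hybrid such as GADP-align can be illustrated with a plain global alignment over a residue-pair score matrix. This is a sketch under stated assumptions, not the authors' implementation. It assumes a precomputed, hypothetical score matrix (for example, one derived from superposed coordinates) and a constant gap penalty. It omits both the SSE-based initial map and the genetic-algorithm search that GADP-align uses to escape poor initial correspondences.

```python
from typing import List, Tuple


def dp_align(score: List[List[float]], gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global (Needleman-Wunsch style) alignment over a residue-pair score matrix.

    score[i][j] is a hypothetical similarity between residue i of protein A and
    residue j of protein B; gap is a constant (linear) gap penalty.
    Returns the best total score and the aligned residue pairs (i, j).
    """
    n = len(score)
    m = len(score[0]) if n else 0
    # F[i][j]: best score for aligning the first i residues of A with the first j of B
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score[i - 1][j - 1],  # pair residue i-1 with j-1
                F[i - 1][j] + gap,                      # residue i-1 of A left unaligned
                F[i][j - 1] + gap,                      # residue j-1 of B left unaligned
            )
    # Trace back from the bottom-right cell to recover the aligned pairs
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


total, pairs = dp_align([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]])
print(total, pairs)  # 6.0 [(0, 0), (1, 1), (2, 2)]
```

In a hybrid scheme, a search procedure would propose or perturb the score matrix (or the initial correspondence it is built from), and a DP pass like this one would turn each proposal into a concrete alignment.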
Affiliation(s)
- Soraya Mirzaei
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
- Jafar Razmara
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
- Shahriar Lotfi
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
25
Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins 2019; 87:1069-1081. [PMID: 31471916 DOI: 10.1002/prot.25810] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 04/29/2019] [Revised: 07/24/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
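The L/5, L/2, and L figures above are long-range contact precisions. The sketch below shows how such a number can be computed. It assumes the usual CASP conventions: a pair is long-range when its sequence separation is at least 24, and true contacts are supplied as a set (conventionally, pairs whose Cβ atoms lie within 8 Å). The input format is an assumption. When fewer long-range predictions exist than L/k, the sketch divides by the number actually available; dividing by L/k instead would be an equally defensible choice.

```python
from typing import Iterable, Set, Tuple


def long_range_precision(
    pred: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Tuple[int, int]],
    length: int,
    k: int = 5,
    min_sep: int = 24,
) -> float:
    """Precision of the top length/k long-range predicted contacts.

    pred: (i, j, probability) triples for residue pairs of one protein.
    true_contacts: native contacts stored as (i, j) with i < j.
    A pair is long-range when |i - j| >= min_sep (24 under the usual CASP convention).
    """
    # Keep long-range pairs only, normalising each pair so that i < j
    long_range = [(min(i, j), max(i, j), p) for i, j, p in pred if abs(i - j) >= min_sep]
    long_range.sort(key=lambda t: t[2], reverse=True)  # most confident first
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    # Divide by the predictions actually evaluated (may be fewer than length // k)
    return hits / len(top)
```

Calling it with k=5, k=2, and k=1 on the same predictions gives the L/5, L/2, and L variants reported in the abstract.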
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
26
Abstract
Motivation: Template-based modeling, including homology modeling and protein threading, is a popular method for protein 3D structure prediction. However, alignment generation and template selection for protein sequences without close templates remain very challenging.
Results: We present a new method called DeepThreader to improve protein threading, including both alignment generation and template selection, by making use of deep learning (DL) and residue co-variation information. Our method first employs DL to predict inter-residue distance distribution from residue co-variation and sequential information (e.g. sequence profile and predicted secondary structure), and then builds sequence-template alignment by integrating predicted distance information and sequential features through an ADMM algorithm. Experimental results suggest that predicted inter-residue distance is helpful to both protein alignment and template selection, especially for protein sequences without very close templates, and that our method outperforms the currently popular homology modeling method HHpred and threading method CNFpred by a large margin and greatly outperforms the latest contact-assisted protein threading method EigenTHREADER.
Availability and implementation: http://raptorx.uchicago.edu/
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianwei Zhu
- Toyota Technological Institute, Chicago, IL, USA
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Sheng Wang
- Toyota Technological Institute, Chicago, IL, USA
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
- Toyota Technological Institute, Chicago, IL, USA
27
Holt MC, Ho CS, Morano MI, Barrett SD, Stein AJ. Improved homology modeling of the human & rat EP4 prostanoid receptors. BMC Mol Cell Biol 2019; 20:37. [PMID: 31455205 PMCID: PMC6712885 DOI: 10.1186/s12860-019-0212-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2019] [Accepted: 07/11/2019] [Indexed: 12/02/2022]
Abstract
Background: The EP4 prostanoid receptor is one of four GPCRs that mediate the diverse actions of prostaglandin E2 (PGE2). Novel selective EP4 receptor agonists would help further elucidate receptor sub-type function and promote the development of therapeutics for bone healing, heart failure, and other receptor-associated conditions. The rat EP4 (rEP4) receptor has been used as a surrogate for the human EP4 (hEP4) receptor in multiple SAR studies. To better understand the validity of this traditional approach, homology models were generated by threading for both receptors using the RaptorX server. These models were fit to an implicit membrane using the PPM server and OPM database, with refinement of the intra- and extracellular loops by Prime (Schrödinger). To understand the interaction between the receptors and known agonists, induced-fit docking experiments were performed using Glide and Prime (Schrödinger) with both endogenous agonists and receptor sub-type-selective small-molecule agonists. The docking scores and observed interactions were compared with radioligand displacement experiments and receptor (rat and human) activation assays monitoring cAMP.
Results: Rank-ordering of in silico compound docking scores aligned well with in vitro activity assay EC50 and radioligand binding Ki. We observed variations between the rat and human EP4 binding pockets that have implications for future small-molecule receptor-modulator design and SAR, specifically an S103G mutation within the rEP4 receptor. Additionally, these models helped identify key interactions between the EP4 receptor and ligands, including PGE2 and several known sub-type-selective agonists, while serving as a marked improvement over the previously reported models.
Conclusions: This work has generated a set of novel homology models of the rEP4 and hEP4 receptors. The homology models improve upon the previously reported model, largely due to improved solvation. The hEP4 docking scores correlate best with the cAMP activation data, where both data sets rank-order Rivenprost > CAY10684 > PGE1 ≈ PGE2 > 11-deoxy-PGE1 ≈ 11-deoxy-PGE2 > 8-aza-11-deoxy-PGE1. This rank-ordering closely matches that of the rEP4 receptor. Species-specific differences were noted for the weak agonists Sulprostone and Misoprostol, which appear to dock more readily within the human receptor than within the rat receptor.
Affiliation(s)
- Melissa C Holt
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
- Chi S Ho
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
- M Inés Morano
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
- Adam J Stein
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
28
Abstract
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins lacking similar structures in the Protein Data Bank, even on a personal computer.
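Because this approach predicts a distribution over inter-residue distances rather than a binary contact, a contact probability can be recovered by summing the probability mass of the bins that lie below the contact threshold. The sketch below assumes that 8 Å is one of the bin edges. The bin layout in the example is hypothetical and is not the discretisation RaptorX actually uses.

```python
from typing import Sequence


def contact_probability(
    probs: Sequence[float],
    bin_upper_edges: Sequence[float],
    threshold: float = 8.0,
) -> float:
    """Probability that a residue pair is in contact, from a binned distance distribution.

    probs[b] is the predicted probability that the pair's distance falls in bin b, and
    bin_upper_edges[b] is that bin's upper bound in Å (the last bin may be open-ended).
    Bins whose upper edge is <= threshold are counted; threshold should be a bin edge.
    """
    if len(probs) != len(bin_upper_edges):
        raise ValueError("probs and bin_upper_edges must have the same length")
    return sum(p for p, upper in zip(probs, bin_upper_edges) if upper <= threshold)


# Hypothetical 5-bin distribution; the last bin collects everything beyond 10 Å
edges = [4.0, 6.0, 8.0, 10.0, float("inf")]
probs = [0.1, 0.2, 0.3, 0.25, 0.15]
print(round(contact_probability(probs, edges), 3))  # 0.6
```

The same binned output carries more information than a single contact probability. For example, the mass above the threshold can be used to penalise a pair that a model places too close together.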
29
Durr-e-Shahwar S, Atia-tul-Wahab, Choudhary MI, Jabeen A. Cloning, purification, structural, and functional characterization of methicillin-resistant Staphylococcus aureus (MRSA252) RsbV protein. Int J Biol Macromol 2019; 134:962-966. [DOI: 10.1016/j.ijbiomac.2019.05.034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/01/2019] [Revised: 05/04/2019] [Accepted: 05/05/2019] [Indexed: 02/04/2023]
30
Vizcaíno-Castillo A, Osorio-Méndez JF, Rubio-Ortiz M, Manning-Cela RG, Hernández R, Cevallos AM. Trypanosoma cruzi actins: Expression analysis of actin 2. Biochem Biophys Res Commun 2019; 513:347-353. [DOI: 10.1016/j.bbrc.2019.04.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/25/2019] [Accepted: 04/01/2019] [Indexed: 10/27/2022]
31
Bhattacharya S, Bhattacharya D. Does inclusion of residue-residue contact information boost protein threading? Proteins 2019; 87:596-606. [PMID: 30882932 DOI: 10.1002/prot.25684] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/22/2018] [Revised: 02/20/2019] [Accepted: 03/13/2019] [Indexed: 12/26/2022]
Abstract
Template-based modeling is considered one of the most successful approaches for protein structure prediction. However, threading, the template-based modeling technique of reliably and accurately selecting optimal templates with folds similar to the target from a library of known protein structures and correctly aligning the target sequence to those templates, remains challenging, particularly for non- or distantly homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of including residue-residue contact information on the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance than a baseline threading algorithm that does not use contact information when everything else remains the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading that can ultimately help improve the accuracy of protein structure prediction.
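One simple way to use contacts as an extra threading score term is to reward an alignment for every predicted query contact whose aligned template residues are also in contact. The sketch below is an illustrative term of that kind, not the scoring function from the paper. The probability weighting, the dictionary alignment format, and the linear weight in `combined_score` are all assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def contact_score(
    alignment: Dict[int, int],
    query_contacts: Iterable[Tuple[int, int, float]],
    template_contacts: Set[Tuple[int, int]],
) -> float:
    """Sum of predicted-contact probabilities preserved by a query-template alignment.

    alignment maps a query residue index to its aligned template residue index
    (residues in gaps are simply absent). A predicted query contact (i, j, p)
    contributes p when both residues are aligned and their template counterparts
    are in contact. template_contacts stores pairs with the smaller index first.
    """
    total = 0.0
    for i, j, p in query_contacts:
        if i in alignment and j in alignment:
            a, b = sorted((alignment[i], alignment[j]))
            if (a, b) in template_contacts:
                total += p
    return total


def combined_score(base: float, contact_term: float, weight: float = 1.0) -> float:
    """Hypothetical linear mix of a conventional threading score and the contact term."""
    return base + weight * contact_term
```

Ranking candidate templates by `combined_score` rather than `base` alone is one way to read "an additional scoring term for threading template selection". In practice the weight would have to be tuned on a benchmark.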
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
32
Palomo-Ligas L, Gutiérrez-Gutiérrez F, Ochoa-Maganda VY, Cortés-Zárate R, Charles-Niño CL, Castillo-Romero A. Identification of a novel potassium channel (GiK) as a potential drug target in Giardia lamblia: Computational descriptions of binding sites. PeerJ 2019; 7:e6430. [PMID: 30834181 PMCID: PMC6397635 DOI: 10.7717/peerj.6430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/13/2018] [Accepted: 01/10/2019] [Indexed: 12/12/2022]
Abstract
Background: The protozoan Giardia lamblia is the causal agent of giardiasis, one of the main diarrheal infections worldwide. Drug resistance to common antigiardial agents and the incidence of treatment failures have increased in recent years. Therefore, the search for new molecular targets for drugs against Giardia infection is essential. In protozoa, ion channels have roles in the life cycle, growth, and stress response, which makes them promising targets for drug design. Ligand-protein docking has demonstrated great potential in the discovery of new targets and in structure-based drug design.
Methods: In this work, we identify and characterize a new potassium channel, GiK, in the genome of Giardia lamblia. Characterization was performed in silico. Because its crystallographic structure remains unresolved, homology modeling was used to construct a three-dimensional model of the pore domain of GiK. A docking-based virtual screening approach was employed to determine whether GiK is a good target for potassium channel blockers.
Results: The GiK sequence showed 24–50% identity and 50–90% positivity with 21 different types of potassium channels. The quality assessment and validation parameters indicated the reliability of the modeled structure of GiK. We identified 110 potassium channel blockers exhibiting high affinity toward GiK. A total of 39 of these drugs bind in three specific regions.
Discussion: The GiK pore signature sequence is related to that of the small-conductance calcium-activated potassium channels (SKCa). The predicted binding of 110 potassium blockers to GiK makes this protein an attractive target for biological testing to evaluate its role in the life cycle of Giardia lamblia, and a potential candidate for the design of novel antigiardial drugs.
Affiliation(s)
- Lissethe Palomo-Ligas
- Departamento de Fisiología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Filiberto Gutiérrez-Gutiérrez
- Departamento de Química, Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Verónica Yadira Ochoa-Maganda
- Departamento de Fisiología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Rafael Cortés-Zárate
- Departamento de Microbiología y Patología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Claudia Lisette Charles-Niño
- Departamento de Microbiología y Patología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Araceli Castillo-Romero
- Departamento de Microbiología y Patología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
33
Petegrosso R, Li Z, Srour MA, Saad Y, Zhang W, Kuang R. Scalable remote homology detection and fold recognition in massive protein networks. Proteins 2019; 87:478-491. [PMID: 30714638 DOI: 10.1002/prot.25669] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/06/2018] [Revised: 12/19/2018] [Accepted: 01/31/2019] [Indexed: 11/10/2022]
Abstract
The global connectivity of very large protein similarity networks contains traces of evolution that can be used to detect remote evolutionary relationships or structural similarities between proteins. A key limitation in investigating how well a protein network captures this evolutionary information is the intensive computation of pairwise sequence similarities needed to construct very large protein networks. In this article, we introduce label propagation on low-rank kernel approximation (LP-LOKA) for searching massively large protein networks. LP-LOKA propagates initial protein similarities in a low-rank graph by Nyström approximation without computing all pairwise similarities. With scalable parallel implementations based on distributed memory, using the message-passing interface (MPI) and Apache Hadoop/Spark in the cloud, LP-LOKA can search protein networks with one million proteins or more. In experiments on Swiss-Prot/ADDA/CASP data, LP-LOKA significantly improved protein ranking over the widely used HMM-HMM or profile-sequence alignment methods by utilizing large protein networks. The larger the protein similarity network, the better the performance, especially on relatively small protein superfamilies and folds. The results suggest that computing massively large protein networks is necessary to meet the growing need to annotate proteins from newly sequenced species, and that LP-LOKA is both scalable and accurate for searching massively large protein networks.
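The propagation step in LP-LOKA can be illustrated with standard label propagation on a small dense graph. The sketch iterates f <- alpha * S f + (1 - alpha) * y, where S is the symmetrically normalized similarity matrix. It deliberately omits the Nyström low-rank approximation and the distributed implementation that let LP-LOKA scale to a million proteins. The graph W, seed vector y, alpha, and iteration count are illustrative assumptions.

```python
import math
from typing import List, Sequence


def label_propagation(
    W: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float = 0.9,
    iters: int = 100,
) -> List[float]:
    """Propagate initial scores y over a symmetric, non-negative similarity graph W.

    Iterates f <- alpha * S f + (1 - alpha) * y, where S = D^(-1/2) W D^(-1/2)
    and D is the diagonal matrix of row sums of W. An isolated node (zero row sum)
    ends with score (1 - alpha) * y[i].
    """
    n = len(W)
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in W]
    S = [[inv_sqrt_deg[i] * W[i][j] * inv_sqrt_deg[j] for j in range(n)] for i in range(n)]
    f = [float(v) for v in y]
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Path graph 0-1-2-3 plus an isolated node 4, seeded at node 0 (the "query")
W = [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
scores = label_propagation(W, [1, 0, 0, 0, 0])
```

When y is seeded with 1 for a query protein and 0 elsewhere, the resulting scores rank the other proteins by how strongly they connect to the query through the network. Here they decrease along the path, and node 4 stays at 0.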
Affiliation(s)
- Raphael Petegrosso
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota
- Zhuliu Li
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota
- Molly A Srour
- McCormick School of Engineering, Northwestern University, Evanston, Illinois
- Yousef Saad
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota
- Wei Zhang
- Department of Computer Science, University of Central Florida, Orlando, Florida
- Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota
34
Identification of the novel role of butyrate as AhR ligand in human intestinal epithelial cells. Sci Rep 2019; 9:643. [PMID: 30679727 PMCID: PMC6345974 DOI: 10.1038/s41598-018-37019-2] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Received: 06/26/2018] [Accepted: 11/28/2018] [Indexed: 12/18/2022]
Abstract
The ligand-activated transcription factor aryl hydrocarbon receptor (AhR) has emerged as a critical regulator of immune and metabolic processes in the gastrointestinal tract. In the gut, a main source of AhR ligands is commensal bacteria. However, many of the reported microbiota-derived ligands have been restricted to indolyl metabolites. Here, by screening commensal bacteria supernatants on an AhR reporter system expressed in a human intestinal epithelial cell line (IEC), we found that the short-chain fatty acid (SCFA) butyrate induced AhR activity and the transcription of AhR-dependent genes in IECs. We showed that AhR ligand antagonists reduced the effects of butyrate on IECs, suggesting that butyrate could act as a ligand of AhR; this was supported by the nuclear translocation of AhR induced by butyrate and by in silico structural modelling. In conclusion, our findings suggest that (i) butyrate activates the AhR pathway and AhR-dependent genes in human intestinal epithelial cell lines, and (ii) butyrate is a potential ligand for AhR, which is an original mechanism of gene regulation by SCFAs.
35
Riber L, Koch BM, Kruse LR, Germain E, Løbner-Olesen A. HipA-Mediated Phosphorylation of SeqA Does not Affect Replication Initiation in Escherichia coli. Front Microbiol 2018; 9:2637. [PMID: 30450091 PMCID: PMC6225831 DOI: 10.3389/fmicb.2018.02637] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/18/2018] [Accepted: 10/16/2018] [Indexed: 11/20/2022]
Abstract
The SeqA protein of Escherichia coli is required to prevent immediate re-initiation of chromosome replication from oriC. The SeqA protein is phosphorylated at the serine-36 (Ser36) residue by the HipA kinase. The role of phosphorylation was addressed by mutating the Ser36 residue to alanine, which cannot be phosphorylated, and to aspartic acid, which mimics a phosphorylated serine residue. Both mutant strains were similar to wild-type with respect to origin concentration and initiation synchrony. The minimal time between successive initiations was also unchanged. We therefore suggest that SeqA phosphorylation at the Ser36 residue is silent, at least with respect to SeqA's role in replication initiation.
Affiliation(s)
- Leise Riber
- Section for Functional Genomics, Department of Biology, Center for Bacterial Stress Response and Persistence, University of Copenhagen, Copenhagen, Denmark
- Birgit M. Koch
- Section for Functional Genomics, Department of Biology, Center for Bacterial Stress Response and Persistence, University of Copenhagen, Copenhagen, Denmark
- Line Riis Kruse
- Section for Functional Genomics, Department of Biology, Center for Bacterial Stress Response and Persistence, University of Copenhagen, Copenhagen, Denmark
- Elsa Germain
- Laboratoire de Chimie Bactérienne, Université Aix-Marseille, CNRS, Marseille, France
- Anders Løbner-Olesen
- Section for Functional Genomics, Department of Biology, Center for Bacterial Stress Response and Persistence, University of Copenhagen, Copenhagen, Denmark
- Correspondence: Anders Løbner-Olesen
36
Skotnicová P, Sobotka R, Shepherd M, Hájek J, Hrouzek P, Tichý M. The cyanobacterial protoporphyrinogen oxidase HemJ is a new b-type heme protein functionally coupled with coproporphyrinogen III oxidase. J Biol Chem 2018; 293:12394-12404. [PMID: 29925590 DOI: 10.1074/jbc.ra118.003441] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/12/2018] [Revised: 06/14/2018] [Indexed: 12/27/2022]
Abstract
Protoporphyrinogen IX oxidase (PPO), the last enzyme that is common to both chlorophyll and heme biosynthesis pathways, catalyzes the oxidation of protoporphyrinogen IX to protoporphyrin IX. PPO has several isoforms, including the oxygen-dependent HemY and an oxygen-independent enzyme, HemG. However, most cyanobacteria encode HemJ, the least characterized PPO form. We have characterized HemJ from the cyanobacterium Synechocystis sp. PCC 6803 (Synechocystis 6803) as a bona fide PPO; HemJ down-regulation resulted in accumulation of tetrapyrrole precursors and in the depletion of chlorophyll precursors. The expression of FLAG-tagged Synechocystis 6803 HemJ protein (HemJ.f) and affinity isolation of HemJ.f under native conditions revealed that it binds heme b. The most stable HemJ.f form was a dimer, and higher oligomeric forms were also observed. Using both oxygen and artificial electron acceptors, we detected no enzymatic activity with the purified HemJ.f, consistent with the hypothesis that the enzymatic mechanism for HemJ is distinct from those of other PPO isoforms. The heme absorption spectra and distant HemJ homology to several membrane oxidases indicated that the heme in HemJ is redox-active and involved in electron transfer. HemJ was conditionally complemented by another PPO, HemG from Escherichia coli. If grown photoautotrophically, the complemented strain accumulated the tripropionic tetrapyrrole harderoporphyrin, suggesting a defect in the enzymatic conversion of coproporphyrinogen III to protoporphyrinogen IX, catalyzed by coproporphyrinogen III oxidase (CPO). This observation supports the hypothesis that HemJ is functionally coupled with CPO and that this coupling is disrupted after replacement of HemJ by HemG.
Affiliation(s)
- Petra Skotnicová
- Czech Academy of Sciences, Institute of Microbiology, Centre Algatech, 379 81 Třeboň, Czech Republic
- Faculty of Science, University of South Bohemia, 370 05 České Budějovice, Czech Republic
- Roman Sobotka
- Czech Academy of Sciences, Institute of Microbiology, Centre Algatech, 379 81 Třeboň, Czech Republic
- Faculty of Science, University of South Bohemia, 370 05 České Budějovice, Czech Republic
- Mark Shepherd
- School of Biosciences, RAPID Group, University of Kent, Canterbury CT2 7NZ, United Kingdom
- Jan Hájek
- Czech Academy of Sciences, Institute of Microbiology, Centre Algatech, 379 81 Třeboň, Czech Republic
- Faculty of Science, University of South Bohemia, 370 05 České Budějovice, Czech Republic
- Pavel Hrouzek
- Czech Academy of Sciences, Institute of Microbiology, Centre Algatech, 379 81 Třeboň, Czech Republic
- Faculty of Science, University of South Bohemia, 370 05 České Budějovice, Czech Republic
- Martin Tichý
- Czech Academy of Sciences, Institute of Microbiology, Centre Algatech, 379 81 Třeboň, Czech Republic
- Faculty of Science, University of South Bohemia, 370 05 České Budějovice, Czech Republic
37
Morales-Cordovilla JA, Sanchez V, Ratajczak M. Protein alignment based on higher order conditional random fields for template-based modeling. PLoS One 2018; 13:e0197912. [PMID: 29856860 PMCID: PMC5983487 DOI: 10.1371/journal.pone.0197912] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/20/2017] [Accepted: 05/10/2018] [Indexed: 11/19/2022]
Abstract
The query-template alignment of proteins is one of the most critical steps of template-based modeling methods used to predict the 3D structure of a query protein. This alignment can be interpreted as a temporal classification or structured prediction task, and first-order Conditional Random Fields have been proposed for protein alignment and proven to be rather successful. Some other popular structured prediction problems, such as speech or image classification, have benefited from higher-order Conditional Random Fields because of the well-known higher-order correlations between their labels and features. In this paper, we propose and describe the use of higher-order Conditional Random Fields for query-template protein alignment. Experiments carried out on different public datasets validate our proposal, especially on distantly related protein pairs, which are the most difficult to align.
Affiliation(s)
- Victoria Sanchez
- Dept. of Teoría de la Señal Telemática y Comunicaciones, Universidad de Granada, Granada, Spain
- Martin Ratajczak
- Graz University of Technology, Signal Processing and Speech Communication Laboratory, Graz, Austria
38
Abdel Azim A, Rittmann SKMR, Fino D, Bochmann G. The physiological effect of heavy metals and volatile fatty acids on Methanococcus maripaludis S2. BIOTECHNOLOGY FOR BIOFUELS 2018; 11:301. [PMID: 30410576 PMCID: PMC6214177 DOI: 10.1186/s13068-018-1302-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/12/2018] [Accepted: 10/25/2018] [Indexed: 05/16/2023]
Abstract
Background: Methanogenic archaea are important to the global C-cycle and to biological methane (CH4) production through anaerobic digestion and pure culture. Here, the individual and combined effects of copper (Cu), zinc (Zn), acetate, and propionate on the metabolism of the autotrophic, hydrogenotrophic methanogen Methanococcus maripaludis S2 were investigated. Cu, Zn, acetate, and propionate may interfere directly and indirectly with acetyl-CoA synthesis and biological CH4 production. Thus, these compounds can compromise or improve the performance of M. maripaludis, an organism that can be applied as a biocatalyst in the carbon dioxide (CO2)-based biological CH4 production (CO2-BMP) process, or of methanogenic organisms applied in anaerobic digestion.
Results: Here, we show that a Cu concentration of 1.9 µmol L⁻¹ reduced growth of M. maripaludis, whereas 4.4 and 6.3 µmol L⁻¹ of Cu retarded biomass production even further. However, 1.0 mmol L⁻¹ of Zn enhanced growth, but at Zn concentrations > 2.4 mmol L⁻¹ no growth could be observed. When both Cu and Zn were supplemented to the medium, growth and CH4 production could be observed even at the highest tested concentration of Cu (6.3 µmol L⁻¹). Hence, it seems that the addition of 1 mmol L⁻¹ of Zn enhanced the ability of M. maripaludis to counteract the toxic effect of Cu. The physiological effect of rising concentrations of acetate (12.2, 60.9, 121.9 mmol L⁻¹) and/or propionate (10.3, 52.0, 104.1 mmol L⁻¹) was also investigated. When 10.3 mmol L⁻¹ propionate was provided in the growth medium instead of acetate, M. maripaludis could grow without reduction of the specific growth rate (µ) or the specific CH4 productivity (qCH4). Combinations of inorganic and/or organic compounds increased µ and qCH4 for Zn/Cu and Zn/acetate beyond the values observed when Zn, Cu, or acetate was supplied individually.
Conclusions: Our study sheds light on the physiological effect of volatile fatty acids (VFAs) and heavy metals on M. maripaludis. Unlike µ and qCH4, MER was not influenced by the presence of these compounds. This indicated that each of these compounds directly interacted with the C-fixation machinery of M. maripaludis. Until now, the uptake of VFAs other than acetate was not considered to enhance growth and CH4 production of methanogens. The finding of propionate uptake by M. maripaludis is important for the interpretation of VFA cycling in anaerobic microenvironments. Because methanogens are important in natural and artificial anaerobic environments, our results help enhance understanding of their physiological and biotechnological importance with respect to anaerobic digestion, anaerobic wastewater treatment, and CO2-BMP. Finally, we propose a possible mechanism for acetate uptake into M. maripaludis supported by in silico analyses.
Affiliation(s)
- Annalisa Abdel Azim
- Institute for Environmental Biotechnology, IFA Department Tulln, University of Natural Resources and Life Sciences, Vienna, Austria
- Archaea Physiology & Biotechnology Group, Archaea Biology and Ecogenomics Division, Department of Ecogenomics and Systems Biology, Universität Wien, Althanstraße 14, 1090 Vienna, Austria
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Turin, Italy
- Center for Sustainable Future Technologies, Istituto Italiano di Tecnologia, Turin, Italy
- Simon K.-M. R. Rittmann
- Archaea Physiology & Biotechnology Group, Archaea Biology and Ecogenomics Division, Department of Ecogenomics and Systems Biology, Universität Wien, Althanstraße 14, 1090 Vienna, Austria
- Debora Fino
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Turin, Italy
- Günther Bochmann
- Institute for Environmental Biotechnology, IFA Department Tulln, University of Natural Resources and Life Sciences, Vienna, Austria

39
Zhu J, Zhang H, Li SC, Wang C, Kong L, Sun S, Zheng WM, Bu D. Improving protein fold recognition by extracting fold-specific features from predicted residue–residue contacts. Bioinformatics 2017; 33:3749-3757. [DOI: 10.1093/bioinformatics/btx514] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 01/23/2017] [Accepted: 08/09/2017] [Indexed: 01/05/2023]
Affiliation(s)
- Jianwei Zhu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Chao Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Lupeng Kong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

40
Middleton SA, Illuminati J, Kim J. Complete fold annotation of the human proteome using a novel structural feature space. Sci Rep 2017; 7:46321. [PMID: 28406174 PMCID: PMC5390313 DOI: 10.1038/srep46321] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/04/2017] [Accepted: 03/14/2017] [Indexed: 11/11/2022]
Abstract
Recognition of protein structural fold is the starting point for many structure prediction tools and for protein function inference. Fold prediction is computationally demanding, and recognizing novel folds is difficult, so the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and can identify candidate novel fold families.
Affiliation(s)
- Sarah A Middleton
- Genomics and Computational Biology Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Joseph Illuminati
- Department of Computer Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Junhyong Kim
- Genomics and Computational Biology Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA

41
Vaitinadapoule A, Etchebest C. Molecular Modeling of Transporters: From Low Resolution Cryo-Electron Microscopy Map to Conformational Exploration. The Example of TSPO. Methods Mol Biol 2017; 1635:383-416. [PMID: 28755381 DOI: 10.1007/978-1-4939-7151-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
This chapter describes a protocol for building a three-dimensional (3D) model of a protein and exploring its conformational landscape. It combines predictions from up-to-date bioinformatics methods with low-resolution experimental data. It also proposes a rapid examination of the protein's dynamics using molecular dynamics simulations with a coarse-grained force field. Tools for analyzing these trajectories are suggested, as are tools for constructing all-atom models. Thus, starting from a protein sequence and using free software, the user can obtain important conformational information that may improve knowledge of the protein's function.
Affiliation(s)
- Aurore Vaitinadapoule
- Unité INSERM UMRS1134, Laboratory of Excellence, Institut National de la Transfusion Sanguine, Université Paris-Diderot, Sorbonne Paris Cité, Université de la Réunion, 6 rue Alexandre Cabanel, 75015, Paris Cedex 15, France
- Catherine Etchebest
- Unité INSERM UMRS1134, Laboratory of Excellence, Institut National de la Transfusion Sanguine, Université Paris-Diderot, Sorbonne Paris Cité, Université de la Réunion, 6 rue Alexandre Cabanel, 75015, Paris Cedex 15, France.

42
Cui X, Lu Z, Wang S, Jing-Yan Wang J, Gao X. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics 2016; 32:i332-i340. [PMID: 27307635 PMCID: PMC4908355 DOI: 10.1093/bioinformatics/btw271] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 11/16/2022]
Abstract
MOTIVATION Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite advances in recent decades in sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space have demonstrated the importance of incorporating network information from the structure space. Yet current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency when combining different sources of information. METHOD We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from a mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns the sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. RESULTS We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. AVAILABILITY AND IMPLEMENTATION Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx CONTACT: xin.gao@kaust.edu.sa SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xuefeng Cui
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Zhiwu Lu
- Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China
- Sheng Wang
- Toyota Technological Institute at Chicago, 6045 Kenwood Avenue, Chicago, IL 60637, USA
- Department of Human Genetics, University of Chicago, E. 58th St, Chicago, IL 60637, USA
- Jim Jing-Yan Wang
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia

43
Wang S, Li W, Liu S, Xu J. RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res 2016; 44:W430-5. [PMID: 27112573 PMCID: PMC4987890 DOI: 10.1093/nar/gkw306] [Citation(s) in RCA: 331] [Impact Index Per Article: 41.4] [Received: 02/19/2016] [Accepted: 04/12/2016] [Indexed: 11/14/2022]
Abstract
RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server that predicts structural properties of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with a very sparse sequence profile (i.e., one that carries little evolutionary information). The server employs a powerful in-house deep learning model, DeepCNF (Deep Convolutional Neural Fields), to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF models not only the complex sequence–structure relationship, through a deep hierarchical architecture, but also the interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and other benchmarks, this server obtains ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Wei Li
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Shiwang Liu
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA

44
Lhota J, Xie L. Protein-fold recognition using an improved single-source K diverse shortest paths algorithm. Proteins 2016; 84:467-72. [PMID: 26800480 DOI: 10.1002/prot.24993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/28/2015] [Revised: 01/10/2016] [Accepted: 01/12/2016] [Indexed: 11/11/2022]
Abstract
Protein structure prediction, when construed as a fold recognition problem, is one of the most important applications of similarity search in bioinformatics. A new protein-fold recognition method is reported that combines a single-source K diverse shortest path (SSKDSP) algorithm with the Enrichment of Network Topological Similarity (ENTS) algorithm to search a graph feature space generated using sequence similarity and structural similarity metrics. A modified, more efficient SSKDSP algorithm is developed to improve the performance of graph searching. The new implementation of the SSKDSP algorithm empirically requires 82% less memory and 61% less time than the current implementation, allowing for the analysis of larger, denser graphs. Furthermore, the statistical significance of the fold ranking generated by SSKDSP is assessed using ENTS. The reported ENTS-SSKDSP algorithm outperforms the original ENTS, which uses random walk with restart for the graph search, as well as the state-of-the-art protein structure prediction algorithms HHSearch and Sparks-X, as evaluated on a benchmark of 600 query proteins. The reported methods may easily be extended to other similarity search problems in bioinformatics and chemoinformatics. The SSKDSP software is available at http://compsci.hunter.cuny.edu/~leixie/sskdsp.html.
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, the Graduate Center, the City University of New York, New York

45
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci Rep 2016; 6:18962. [PMID: 26752681 PMCID: PMC4707437 DOI: 10.1038/srep18962] [Citation(s) in RCA: 255] [Impact Index Per Article: 31.9] [Received: 06/28/2015] [Accepted: 11/26/2015] [Indexed: 12/29/2022]
Abstract
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as the input feature, the best current predictors obtain ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a deep learning extension of Conditional Neural Fields (CNF), which integrate Conditional Random Fields (CRF) with shallow neural networks. DeepCNF models not only the complex sequence-structure relationship, through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, which makes it much more powerful than CNF. Experimental results show that DeepCNF obtains ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
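Q3 and Q8, the headline numbers in this abstract, are per-residue accuracies: the fraction of residues whose secondary structure state (3 states or 8 states) is predicted correctly. A minimal sketch, assuming predicted and observed states are given as equal-length strings of one-letter labels (for example H, E, C for three states); benchmarks may instead average the value per protein, which can give different figures.

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Per-residue accuracy of a secondary structure prediction.

    With 3-state labels (e.g. H, E, C) this is Q3; the same formula
    applied to 8-state labels gives Q8.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    # 5 of 6 residues match, so Q3 = 5/6.
    print(q_accuracy("HHHECC", "HHCECC"))
```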

46
Tong J, Pei J, Grishin NV. SFESA: a web server for pairwise alignment refinement by secondary structure shifts. BMC Bioinformatics 2015; 16:282. [PMID: 26335387 PMCID: PMC4558796 DOI: 10.1186/s12859-015-0711-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/20/2015] [Accepted: 08/19/2015] [Indexed: 12/01/2022]
Abstract
Background Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. Results We developed the SFESA web server to refine pairwise protein sequence alignments. Unlike the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server searches a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used to evaluate alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. Conclusions SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
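The block-shift refinement summarized in this abstract can be illustrated with a toy search. This is a hypothetical sketch, not SFESA's implementation: an aligned block is a list of (template index, query index) pairs, each candidate variant shifts the query side by up to `max_shift` residues, and a caller-supplied `score` function stands in for SFESA's combined profile-profile and contact-energy score. The adjustments to neighbouring gaps that a real refinement must make are ignored here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (template residue index, query residue index)


def best_block_shift(
    block: Sequence[Pair],
    score: Callable[[List[Pair]], float],
    max_shift: int = 2,
    query_len: Optional[int] = None,
) -> Tuple[int, List[Pair]]:
    """Return (shift, shifted_block) for the highest-scoring local shift of one block.

    The unshifted block (shift 0) is kept unless another variant scores
    strictly higher. Shifts that push a query index outside [0, query_len)
    are skipped.
    """
    best_shift, best_block = 0, list(block)
    best_score = score(best_block)
    for shift in range(-max_shift, max_shift + 1):
        if shift == 0:
            continue
        candidate = [(t, q + shift) for t, q in block]
        out_of_range = any(
            q < 0 or (query_len is not None and q >= query_len) for _, q in candidate
        )
        if out_of_range:
            continue
        value = score(candidate)
        if value > best_score:
            best_shift, best_block, best_score = shift, candidate, value
    return best_shift, best_block


if __name__ == "__main__":
    template, query = "ACDEFG", "XACDEFG"

    # Toy stand-in score: number of identical residues in the block.
    def identity(pairs: List[Pair]) -> float:
        return float(sum(template[t] == query[q] for t, q in pairs))

    block = [(i, i) for i in range(len(template))]
    shift, refined = best_block_shift(block, identity, query_len=len(query))
    print(shift)  # 1
```

Keeping shift 0 on ties means the original alignment is only changed when a variant is strictly better under the chosen score.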
Affiliation(s)
- Jing Tong
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
- Nick V Grishin
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.

47
AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model. BIOMED RESEARCH INTERNATIONAL 2015; 2015:678764. [PMID: 26339631 PMCID: PMC4538422 DOI: 10.1155/2015/678764] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/27/2014] [Accepted: 03/11/2015] [Indexed: 12/14/2022]
Abstract
Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possible protein conformations. De novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related to each other, it is challenging to exploit this dependency for prediction. Method. We present AcconPred, a method for predicting solvent accessibility and contact number simultaneously, based on a shared-weight multitask learning framework under the CNF (conditional neural fields) model. Multitask learning on a collection of related tasks provides more accurate predictions than a framework trained on a single task only. The CNF method not only models the complex relationship between the input features and the predicted labels but also exploits the interdependency among adjacent labels. Results. Trained on a dataset of 5729 monomeric soluble globular proteins, AcconPred reached 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on 105 CASP11 domains for solvent accessibility, AcconPred reached 0.64 accuracy, outperforming existing methods.

48
DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields. Int J Mol Sci 2015; 16:17315-30. [PMID: 26230689 PMCID: PMC4581195 DOI: 10.3390/ijms160817315] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 05/28/2015] [Revised: 07/15/2015] [Accepted: 07/16/2015] [Indexed: 12/14/2022]
Abstract
Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite for further analysis of the functions and mechanisms of these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), that improves the accuracy of order/disorder prediction by exploiting long-range sequential information and the interdependency between adjacent order/disorder labels, and by assigning a different weight to each label during training and prediction to address label imbalance. Evaluated on the CASP9 and CASP10 targets, our method obtains AUC values of 0.855 and 0.898, respectively, which are higher than those of state-of-the-art single ab initio predictors.
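The AUC values reported for disorder prediction summarize how well per-residue disorder scores rank disordered residues above ordered ones. A minimal sketch using the pairwise (Mann-Whitney) interpretation of the area under the ROC curve, assuming label 1 marks a disordered residue and label 0 an ordered one; the quadratic loop is for clarity only, and scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0);
    ties count as one half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 3 of the 4 (disordered, ordered) pairs are ranked correctly.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```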

49
Kozma D, Tusnády GE. TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool. BMC Bioinformatics 2015; 16:201. [PMID: 26123059 PMCID: PMC4486421 DOI: 10.1186/s12859-015-0638-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/25/2015] [Accepted: 06/06/2015] [Indexed: 12/26/2022]
Abstract
Background Transmembrane proteins (TMPs) are key components of signal transduction, cell-cell adhesion, and energy and material transport into and out of cells. A deep understanding of these processes requires structure determination of transmembrane proteins. However, owing to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms, posing a bioinformatics challenge: how to derive structural information from a sequence. Results Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec, which is based on statistical potentials, was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77% of the cases, an accuracy that exceeds that of state-of-the-art methods. In addition, a key feature of the TMFoldRec algorithm is its ability to estimate the reliability of a prediction and to decide, with an accuracy of 70%, whether the obtained lowest-energy structure is the native one. Conclusion These results imply that the membrane-embedded parts of TMPs, rather than the soluble parts, dictate the TM structures. Moreover, the reliability scores make our algorithm applicable to proteome-wide analyses. Availability The program is available upon request for academic use. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0638-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Dániel Kozma
- "Momentum" Membrane Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary.
- Gábor E Tusnády
- "Momentum" Membrane Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H-1518 Budapest, Hungary.

50
Bawono P, van der Velde A, Abeln S, Heringa J. Quantifying the displacement of mismatches in multiple sequence alignment benchmarks. PLoS One 2015; 10:e0127431. [PMID: 25993129 PMCID: PMC4438059 DOI: 10.1371/journal.pone.0127431] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/20/2014] [Accepted: 04/14/2015] [Indexed: 11/18/2022]
Abstract
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of an alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference alignment and the corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and a reference alignment as equally bad; they do not take into account how far apart, in the query alignment, two amino acids lie that should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the difference in alignment quality between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request.
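The contrast between the SP score and a displacement-aware score can be made concrete for a single pair of sequences. A minimal sketch under stated assumptions: alignments are pairs of equal-length gapped strings over the same two sequences, `sp_score` is the pairwise form of the sum-of-pairs score (an MSA-level SP score sums such counts over all sequence pairs), and the hypothetical `mismatch_displacements` only illustrates the idea of measuring how far a mismatched residue landed. It is not the published SPdist formula, whose exact definition and normalization are given by the authors.

```python
from typing import Dict, List, Set, Tuple

Alignment = Tuple[str, str]  # two gapped rows of equal length; '-' marks a gap


def aligned_pairs(alignment: Alignment) -> Set[Tuple[int, int]]:
    """Residue-index pairs (i, j) that share a column in a pairwise alignment."""
    row_a, row_b = alignment
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Tuple[int, int]] = set()
    i = j = 0  # next residue index in sequence A and sequence B
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def sp_score(reference: Alignment, query: Alignment) -> float:
    """Fraction of reference residue pairs that the query alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(query)) / len(ref_pairs)


def mismatch_displacements(reference: Alignment, query: Alignment) -> List[int]:
    """|j - j'| for each reference pair (i, j) the query misses, where j' is the
    residue of B aligned to residue i of A in the query.

    Residues of A that are gapped in the query are skipped (a simplifying
    assumption of this sketch).
    """
    # In a pairwise alignment each residue i of A occupies at most one column,
    # so the aligned pairs form a mapping i -> j'.
    query_map: Dict[int, int] = dict(aligned_pairs(query))
    displacements: List[int] = []
    for i, j in sorted(aligned_pairs(reference)):
        j_query = query_map.get(i)
        if j_query is not None and j_query != j:
            displacements.append(abs(j - j_query))
    return displacements


if __name__ == "__main__":
    ref = ("ACDE", "ACDE")
    shifted = ("ACDE-", "-ACDE")  # every residue off by one position
    print(sp_score(ref, shifted))  # 0.0
    print(mismatch_displacements(ref, shifted))  # [1, 1, 1]
```

Here the SP score rates the one-residue shift as completely wrong, while the displacements show that each mismatch is only one position away, which is the distinction the abstract argues for.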
Affiliation(s)
- Punto Bawono
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Arjan van der Velde
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands