1
Ghafarollahi A, Buehler MJ. ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery 2024; 3:1389-1409. [PMID: 38993729 PMCID: PMC11235180 DOI: 10.1039/d4dd00013g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 05/13/2024] [Indexed: 07/13/2024]
Abstract
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data - natural vibrational frequencies - via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity in autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes great potentials of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
Affiliation(s)
- Alireza Ghafarollahi
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology 77 Massachusetts Ave. Cambridge MA 02139 USA
- Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology 77 Massachusetts Ave. Cambridge MA 02139 USA
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology 77 Massachusetts Ave. Cambridge MA 02139 USA

2
Kamble A, Singh R, Singh H. Structural and Functional Characterization of Obesumbacterium proteus Phytase: A Comprehensive In-Silico Study. Mol Biotechnol 2024:10.1007/s12033-024-01069-x. [PMID: 38393631 DOI: 10.1007/s12033-024-01069-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 01/09/2024] [Indexed: 02/25/2024]
Abstract
Phytate, also known as myoinositol hexakisphosphate, exhibits anti-nutritional properties and possesses a negative environmental impact. Phytase enzymes break down phytate, showing potential in various industries, necessitating thorough biochemical and computational characterizations. The present study focuses on Obesumbacterium proteus phytase (OPP), indicating its similarities with known phytases and its potential through computational analyses. Structure, functional, and docking results shed light on OPP's features, structural stability, strong and stable interaction, and dynamic conformation, with flexible sidechains that could adapt to different temperatures or specific functions. Root Mean Square fluctuation (RMSF) highlighted fluctuating regions in OPP, indicating potential sites for stability enhancement through mutagenesis. The systematic approach developed here could aid in enhancing enzyme properties via a rational engineering approach. Computational analysis expedites enzyme discovery and engineering, complementing the traditional biochemical methods to accelerate the quest for superior enzymes for industrial applications.
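The RMSF analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes a trajectory given as a list of frames that have already been superposed onto a common reference, with one (x, y, z) coordinate per residue; the function name `rmsf` and the input layout are illustrative assumptions, not taken from the study.

```python
import math

def rmsf(trajectory):
    """Per-residue root mean square fluctuation.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue. Frames are assumed to be pre-aligned (hypothetical input).
    """
    n_frames = len(trajectory)
    n_residues = len(trajectory[0])
    values = []
    for i in range(n_residues):
        # Time-averaged position of residue i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values

# Toy trajectory: residue 0 never moves, residue 1 oscillates along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy))
```

Residues with high values in such a profile are the fluctuating regions that the abstract identifies as candidate sites for stabilizing mutations.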
Affiliation(s)
- Asmita Kamble
- Department of Biological Sciences, Sunandan Divatia School of Science, NMIMS Deemed to be University, Vile Parle (W), Mumbai, Maharashtra, India
- Rajkumar Singh
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Station 19, Lausanne, Switzerland
- Division of Physiological Chemistry II, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 17177, Stockholm, Sweden
- Harinder Singh
- Department of Biological Sciences, Sunandan Divatia School of Science, NMIMS Deemed to be University, Vile Parle (W), Mumbai, Maharashtra, India.

3
Ni B, Kaplan DL, Buehler MJ. ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a language diffusion model. Science Advances 2024; 10:eadl4000. [PMID: 38324676 PMCID: PMC10849601 DOI: 10.1126/sciadv.adl4000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 01/08/2024] [Indexed: 02/09/2024]
Abstract
Through evolution, nature has presented a set of remarkable protein materials, including elastins, silks, keratins and collagens with superior mechanical performances that play crucial roles in mechanobiology. However, going beyond natural designs to discover proteins that meet specified mechanical properties remains challenging. Here, we report a generative model that predicts protein designs to meet complex nonlinear mechanical property-design objectives. Our model leverages deep knowledge on protein sequences from a pretrained protein language model and maps mechanical unfolding responses to create proteins. Via full-atom molecular simulations for direct validation, we demonstrate that the designed proteins are de novo, and fulfill the targeted mechanical properties, including unfolding energy and mechanical strength, as well as the detailed unfolding force-separation curves. Our model offers rapid pathways to explore the enormous mechanobiological protein sequence space unconstrained by biological synthesis, using mechanical features as the target to enable the discovery of protein materials with superior mechanical properties.
Affiliation(s)
- Bo Ni
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- David L. Kaplan
- Department of Biomedical Engineering, Tufts University, Medford, MA 02155, USA
- Markus J. Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA

4
Broz M, Jukič M, Bren U. Naive Prediction of Protein Backbone Phi and Psi Dihedral Angles Using Deep Learning. Molecules 2023; 28:7046. [PMID: 37894526 PMCID: PMC10609058 DOI: 10.3390/molecules28207046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
Protein structure prediction represents a significant challenge in the field of bioinformatics, with the prediction of protein structures using backbone dihedral angles recently achieving significant progress due to the rise of deep neural network research. However, there is a trend in protein structure prediction research to employ increasingly complex neural networks and contributions from multiple models. This study, on the other hand, explores how a single model transparently behaves using sequence data only and what can be expected from the predicted angles. To this end, the current paper presents data acquisition, deep learning model definition, and training toward the final protein backbone angle prediction. The method applies a simple fully connected neural network (FCNN) model that takes only the primary structure of the protein with a sliding window of size 21 as input to predict protein backbone ϕ and ψ dihedral angles. Despite its simplicity, the model shows surprising accuracy for the ϕ angle prediction and somewhat lower accuracy for the ψ angle prediction. Moreover, this study demonstrates that protein secondary structure prediction is also possible with simple neural networks that take in only the protein amino-acid residue sequence, but more complex models are required for higher accuracies.
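The sliding-window input described in this abstract can be sketched as follows. The abstract only states that the network sees the primary structure through a window of size 21; the one-hot encoding, the extra padding slot, and the names `one_hot` and `encode_windows` are assumptions made here for illustration and may differ from the authors' preprocessing.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
PAD = "-"                              # hypothetical padding symbol

def one_hot(residue):
    # 21 slots: 20 residues plus one slot for padding or unknown symbols.
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec

def encode_windows(sequence, window=21):
    """Return one flat feature vector per residue, centred on that residue."""
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    features = []
    for i in range(len(sequence)):
        chunk = padded[i:i + window]  # residue i sits at chunk[half]
        features.append([bit for res in chunk for bit in one_hot(res)])
    return features

feats = encode_windows("ACDEFGHIK")
print(len(feats), len(feats[0]))  # one vector of 21 * 21 values per residue
```

Each vector would then be fed to a fully connected network that outputs the two angles for the central residue; predicting the sine and cosine of each angle is a common way to handle periodicity, though the abstract does not say which target encoding was used.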
Affiliation(s)
- Matic Broz
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Marko Jukič
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
- Institute of Environmental Protection and Sensors, Beloruska ulica 7, SI-2000 Maribor, Slovenia
- Urban Bren
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
- Institute of Environmental Protection and Sensors, Beloruska ulica 7, SI-2000 Maribor, Slovenia

5
Ni B, Kaplan DL, Buehler MJ. Generative design of de novo proteins based on secondary structure constraints using an attention-based diffusion model. Chem 2023; 9:1828-1849. [PMID: 37614363 PMCID: PMC10443900 DOI: 10.1016/j.chempr.2023.03.020] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Indexed: 08/25/2023]
Abstract
We report two generative deep learning models that predict amino acid sequences and 3D protein structures based on secondary structure design objectives via either overall content or per-residue structure. Both models are robust regarding imperfect inputs and offer de novo design capacity as they can discover new protein sequences not yet discovered from natural mechanisms or systems. The residue-level secondary structure design model generally yields higher accuracy and more diverse sequences. These findings suggest unexplored opportunities for protein designs and functional outcomes within the vast amino acid sequences beyond known proteins. Our models, based on an attention-based diffusion model and trained on a dataset extracted from experimentally known 3D protein structures, offer numerous downstream applications in conditional generative design of various biological or engineering systems. Future work may include additional conditioning, and an exploration of other functional properties of the generated proteins for various properties beyond structural objectives.
Affiliation(s)
- Bo Ni
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- David L. Kaplan
- Department of Biomedical Engineering, Tufts University, Medford, MA 02155, USA
- Markus J. Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Lead contact

6
Yang R, Liu J, Zhang L. ECAmyloid: An amyloid predictor based on ensemble learning and comprehensive sequence-derived features. Comput Biol Chem 2023; 104:107853. [PMID: 36990028 DOI: 10.1016/j.compbiolchem.2023.107853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 03/17/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023]
Abstract
Amyloid fibrils formed by the mis-aggregation of amyloid proteins can lead to neuronal degenerations in the Alzheimer's disease. Predicting amyloid proteins not only contributes to understanding physicochemical properties and formation mechanism of amyloid proteins, but also has significant implications in the amyloid disease treatment and the development of a new purpose for amyloid materials. In this study, an ensemble learning model with sequence-derived features, ECAmyloid, is proposed to identify amyloids. The sequence-derived features including Pseudo Position Specificity Score Matrix (Pse-PSSM), Split Amino Acid Composition (SAAC), Solvent Accessibility (SA), and Secondary Structure Information (SSI) are employed to incorporate sequence composition, evolutionary and structural information. The individual learners of the ensemble learning model are selected by an increment classifier selection strategy. The final prediction results are determined by voting of prediction results of multiple individual learners. In view of the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate positive samples. To eliminate irrelevant features and redundant features, correlation-based feature subset (CFS) selection combined with a heuristic search strategy is performed to obtain the optimal feature subset. Experimental results indicate that the ensemble classifier achieves an accuracy of 98.29%, a sensitivity of 0.992, a specificity of 0.974 on the training dataset using the 10-fold cross validation, far higher than the results obtained by its individual learners. Compared with the original feature set, the accuracy, sensitivity, specificity, MCC, F1-score, G-Mean of the ensemble method trained by the optimal feature subset are improved by 1.05%, 0.012, 0.01, 0.021, 0.011 and 0.011, respectively. 
Moreover, the comparison results with existing methods on two same independent test datasets demonstrate that the proposed method is an effective and promising predictor for large-scale determination of amyloid proteins. The data and code used to develop ECAmyloid has been shared to Github, and can be freely downloaded at https://github.com/KOALA-L/ECAmyloid.git.
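The voting step and the reported metrics (accuracy, sensitivity, specificity, MCC) can be illustrated with a small self-contained sketch. It assumes simple unweighted majority voting over binary labels; the abstract does not specify the voting rule, and the function names and toy data below are hypothetical rather than taken from the ECAmyloid repository.

```python
import math

def majority_vote(predictions):
    """predictions: one list of 0/1 labels per individual learner.
    A sample is labelled 1 only when strictly more than half the learners say 1."""
    n_learners = len(predictions)
    return [1 if 2 * sum(votes) > n_learners else 0 for votes in zip(*predictions)]

def binary_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Three hypothetical learners voting on four samples.
learners = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
voted = majority_vote(learners)
scores = binary_metrics([1, 0, 1, 0], voted)
print(voted, scores)
```

On imbalanced data such as the benchmark described here, MCC is usually more informative than accuracy alone, which is presumably why the abstract reports both.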
Affiliation(s)
- Runtao Yang
- School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Jiaming Liu
- School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Lina Zhang
- School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.

7
S. G, E.R. V. Protein secondary structure prediction using Cascaded Feature Learning Model. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/08/2023]

8
Rashid S, Sundaram S, Kwoh CK. Empirical Study of Protein Feature Representation on Deep Belief Networks Trained With Small Data for Secondary Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:955-966. [PMID: 35439138 DOI: 10.1109/tcbb.2022.3168676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Protein secondary structure (SS) prediction is a classic problem of computational biology and is widely used in structural characterization and to infer homology. While most SS predictors have been trained on thousands of sequences, a previous approach developed a compact model of training proteins that used a C-Alpha, C-Beta Side Chain (CABS)-algorithm derived energy based feature representation. Here, the previous approach is extended to Deep Belief Networks (DBN). Deep learning methods are notorious for requiring large datasets and there is a wide consensus that training deep models from scratch on small datasets works poorly. By contrast, we demonstrate a simple DBN architecture containing a single hidden layer, trained only on the CB513 dataset. Testing on an independent set of G Switch proteins improved the Q3 score of the previous compact model by almost 3%. The findings are further confirmed by comparison to several deep learning models which are trained on thousands of proteins. Finally, the DBN performance is also compared with Position Specific Scoring Matrix (PSSM)-profile based feature representation. The importance of (i) structural information in protein feature representation and (ii) complementary small dataset learning approaches for detection of structural fold switching are demonstrated.
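For readers unfamiliar with the metric, the Q3 score is the percentage of residues whose three-state secondary structure label (helix, strand, coil) is predicted correctly. A minimal sketch follows, assuming both inputs are equal-length strings over the letters H, E and C; the function name `q3_score` is illustrative.

```python
def q3_score(true_ss, pred_ss):
    """Percentage of residues with a correctly predicted three-state label."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have the same length")
    if not true_ss:
        raise ValueError("sequences must not be empty")
    correct = sum(1 for t, p in zip(true_ss, pred_ss) if t == p)
    return 100.0 * correct / len(true_ss)

# Seven of eight residues match, so the score is 87.5.
print(q3_score("HHHEECCC", "HHHEECCH"))
```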

9
Gormez Y, Aydin Z. IGPRED-MultiTask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1104-1113. [PMID: 35849663 DOI: 10.1109/tcbb.2022.3191395] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps to predict 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-Multitask, a deep learning model with multi task learning architecture based on deep inception network, graph convolutional network and a bidirectional long short-term memory is proposed. Moreover, hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. The same benchmark test data sets as in the OPUS-TASS paper including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93, CAMEO93_HARD, as well as the train and validation sets, are used for fair comparison with the literature. Statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angle prediction on 3 datasets compared to the state-of-the-art methods. For solvent accessibility prediction, TEST2016 and TEST2018 datasets are used only to assess the performance of the proposed model.

10
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

11
Pacheco-Sánchez D, Marín P, Molina-Fuentes Á, Marqués S. Subtle sequence differences between two interacting σ54-dependent regulators lead to different activation mechanisms. FEBS J 2022; 289:7582-7604. [PMID: 35816183 PMCID: PMC10084136 DOI: 10.1111/febs.16576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 06/08/2022] [Accepted: 07/10/2022] [Indexed: 12/13/2022]
Abstract
In the strictly anaerobic nitrate reducing bacterium Aromatoleum anaerobium, degradation of 1,3-dihydroxybenzene (1,3-DHB, resorcinol) is controlled by two bacterial enhancer-binding proteins (bEBPs), RedR1 and RedR2, which regulate the transcription of three σ54-dependent promoters controlling expression of the pathway. RedR1 and RedR2 are identical over their length except for their N-terminal tail which differ in sequence and length (six and eight residues, respectively), a single change in their N-terminal domain (NTD), and nine non-identical residues in their C-terminal domain (CTD). Their NTD is composed of a GAF and a PAS domain connected by a linker helix. We show that each regulator is controlled by a different mechanism: whilst RedR1 responds to the classical NTD-mediated negative regulation that is released by the presence of its effector, RedR2 activity is constitutive and controlled through interaction with BtdS, an integral membrane subunit of hydroxyhydroquinone dehydrogenase carrying out the second step in 1,3-DHB degradation. BtdS sequesters the RedR2 regulator to the membrane through its NTD, where a four-Ile track in the PAS domain, interrupted by a Thr in RedR1, and the N-terminal tail are involved. The presence of 1,3-DHB, which is metabolized to hydroxybenzoquinone, releases RedR2 from the membrane. Most bEBPs assemble into homohexamers to activate transcription; we show that hetero-oligomer formation between RedR1 and RedR2 is favoured over homo-oligomers. However, either an NTD-truncated version of RedR1 or a full-length RedR2 are capable of promoter activation on their own, suggesting they should assemble into homohexamers in vivo. We show that promoter DNA behaves as an allosteric effector through binding the CTD to control ΔNTD-RedR1 multimerization and activity. Overall, the regulation of the 1,3-DHB anaerobic degradation pathway can be described as a novel mode of bEBP activation and assembly.
Affiliation(s)
- Daniel Pacheco-Sánchez
- Department of Environmental Protection, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Patricia Marín
- Department of Environmental Protection, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Águeda Molina-Fuentes
- Department of Environmental Protection, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Silvia Marqués
- Department of Environmental Protection, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain

12
Pu Y, Li J, Tang J, Guo F. DeepFusionDTA: Drug-Target Binding Affinity Prediction With Information Fusion and Hybrid Deep-Learning Ensemble Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2760-2769. [PMID: 34379594 DOI: 10.1109/tcbb.2021.3103966] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 06/13/2023]
Abstract
Identification of drug-target interaction (DTI) is the most important issue in the broad field of drug discovery. Using purely biological experiments to verify drug-target binding profiles takes lots of time and effort, so computational technologies for this task obviously have great benefits in reducing the drug search space. Most of computational methods to predict DTI are proposed to solve a binary classification problem, which ignore the influence of binding strength. Therefore, drug-target binding affinity prediction is still a challenging issue. Currently, lots of studies only extract sequence information that lacks feature-rich representation, but we consider more spatial features in order to merge various data in drug and target spaces. In this study, we propose a two-stage deep neural network ensemble model for detecting drug-target binding affinity, called DeepFusionDTA, via various information analysis modules. First stage is to utilize sequence and structure information to generate fusion feature map of candidate protein and drug pair through various analysis modules based deep learning. Second stage is to apply bagging-based ensemble learning strategy for regression prediction, and we obtain outstanding results by combining the advantages of various algorithms in efficient feature abstraction and regression calculation. Importantly, we evaluate our novel method, DeepFusionDTA, which delivers 1.5 percent CI increase on KIBA dataset and 1.0 percent increase on Davis dataset, by comparing with existing prediction tools, DeepDTA. Furthermore, the ideas we have offered can be applied to in-silico screening of the interaction space, to provide novel DTIs which can be experimentally pursued. The codes and data are available from https://github.com/guofei-tju/DeepFusionDTA.
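The "CI" improvements quoted in this abstract are assumed here to refer to the concordance index, the ranking metric commonly reported on the KIBA and Davis affinity benchmarks; the abstract itself does not expand the abbreviation. Under that assumption, a minimal pairwise sketch (with a hypothetical function name and toy values) looks like this:

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked in the right order.

    A pair is comparable when the true affinities differ. Ties in the
    prediction count as half-correct.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.0

# Two of the three comparable pairs are ordered correctly.
ci = concordance_index([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
print(ci)
```

A value of 1.0 means every pair is ranked correctly and 0.5 corresponds to random ordering, so a gain of one to two percentage points is measured against a ceiling of 1.0. The double loop is quadratic in the number of samples, which is acceptable for illustration but slow on full benchmark sets.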

13
Byun JK, Vu JA, He SL, Jang JC, Musier-Forsyth K. Plant-exclusive domain of trans-editing enzyme ProXp-ala confers dimerization and enhanced tRNA binding. J Biol Chem 2022; 298:102255. [PMID: 35835222 PMCID: PMC9425024 DOI: 10.1016/j.jbc.2022.102255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 07/06/2022] [Accepted: 07/08/2022] [Indexed: 11/26/2022]
Abstract
Faithful translation of the genetic code is critical for the viability of all living organisms. The trans-editing enzyme ProXp-ala prevents Pro to Ala mutations during translation by hydrolyzing misacylated Ala-tRNAPro that has been synthesized by prolyl-tRNA synthetase. Plant ProXp-ala sequences contain a conserved C-terminal domain (CTD) that is absent in other organisms; the origin, structure, and function of this extra domain are unknown. To characterize the plant-specific CTD, we performed bioinformatics and computational analyses that provided a model consistent with a conserved α-helical structure. We also expressed and purified wildtype Arabidopsis thaliana (At) ProXp-ala in Escherichia coli, as well as variants lacking the CTD or containing only the CTD. Circular dichroism spectroscopy confirmed a loss of α-helical signal intensity upon CTD truncation. Size-exclusion chromatography with multiangle laser-light scattering revealed that wildtype At ProXp-ala was primarily dimeric and CTD truncation abolished dimerization in vitro. Furthermore, bimolecular fluorescence complementation assays in At protoplasts support a role for the CTD in homodimerization in vivo. The deacylation rate of Ala-tRNAPro by At ProXp-ala was also significantly reduced in the absence of the CTD, and kinetic assays indicated that the reduction in activity is primarily due to a tRNA binding defect. Overall, these results broaden our understanding of eukaryotic translational fidelity in the plant kingdom. Our study reveals that the plant-specific CTD plays a significant role in substrate binding and canonical editing function. Through its ability to facilitate protein-protein interactions, we propose the CTD may also provide expanded functional potential for trans-editing enzymes in plants.
Affiliation(s)
- Jun-Kyu Byun
- Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA; Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA
- John A Vu
- Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA; Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA
- Siou-Luan He
- Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA; Department of Horticulture and Crop Science and Center for Applied Plant Sciences, The Ohio State University, Columbus, Ohio, USA
- Jyan-Chyun Jang
- Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA; Department of Horticulture and Crop Science and Center for Applied Plant Sciences, The Ohio State University, Columbus, Ohio, USA.
- Karin Musier-Forsyth
- Center for RNA Biology, The Ohio State University, Columbus, Ohio, USA; Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA.

14
Niemann M, Matern BM, Spierings E. Snowflake: A deep learning-based human leukocyte antigen matching algorithm considering allele-specific surface accessibility. Front Immunol 2022; 13:937587. [PMID: 35967374 PMCID: PMC9372366 DOI: 10.3389/fimmu.2022.937587] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/06/2022] [Accepted: 07/11/2022] [Indexed: 12/12/2022]
Abstract
Histocompatibility in solid-organ transplantation has a strong impact on long-term graft survival. Although recent advances in matching of both B-cell epitopes and T-cell epitopes have improved understanding of allorecognition, the immunogenic determinants are still not fully understood. We hypothesized that HLA solvent accessibility is allele-specific, thus supporting refinement of HLA B-cell epitope prediction. We developed a computational pipeline named Snowflake to calculate solvent accessibility of HLA Class I proteins for deposited HLA crystal structures, supplemented by constructed HLA structures through the AlphaFold protein folding predictor and peptide binding predictions of the APE-Gen docking framework. This dataset trained a four-layer long short-term memory bidirectional recurrent neural network, which in turn inferred solvent accessibility of all known HLA Class I proteins. We extracted 676 HLA Class-I experimental structures from the Protein Data Bank and supplemented it by 37 Class-I alleles for which structures were predicted. For each of the predicted structures, 10 known binding peptides as reported by the Immune Epitope DataBase were rendered into the binding groove. Although HLA Class I proteins predominantly are folded similarly, we found higher variation in root mean square difference of solvent accessibility between experimental structures of different HLAs compared to structures with identical amino acid sequence, suggesting HLA’s solvent accessible surface is protein specific. Hence, residues may be surface-accessible on e.g. HLA-A*02:01, but not on HLA-A*01:01. Mapping these data to antibody-verified epitopes as defined by the HLA Epitope Registry reveals patterns of (1) consistently accessible residues, (2) only subsets of an epitope’s residues being consistently accessible and (3) varying surface accessibility of residues of epitopes. 
Our data suggest B-cell epitope definitions can be refined by considering allele-specific solvent-accessibility, rather than aggregating HLA protein surface maps by HLA class or locus. To support studies on epitope analyses in organ transplantation, the calculation of donor-allele-specific solvent-accessible amino acid mismatches was implemented as a cloud-based web service.
Affiliation(s)
- Matthias Niemann
- Research and Development, PIRCHE AG, Berlin, Germany
- Correspondence: Matthias Niemann
- Benedict M. Matern
- Center for Translational Immunology, University Medical Center, Utrecht, Netherlands
- Eric Spierings
- Center for Translational Immunology, University Medical Center, Utrecht, Netherlands
- Central Diagnostic Laboratory, University Medical Center, Utrecht, Netherlands
| |
|
15
|
Jin X, Guo L, Jiang Q, Wu N, Yao S. Prediction of protein secondary structure based on an improved channel attention and multiscale convolution module. Front Bioeng Biotechnol 2022; 10:901018. [PMID: 35935483 PMCID: PMC9355137 DOI: 10.3389/fbioe.2022.901018] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/21/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022]
Abstract
Prediction of the protein secondary structure is a key issue in protein science. Protein secondary structure prediction (PSSP) aims to construct a function that can map the amino acid sequence into the secondary structure so that the protein secondary structure can be obtained according to the amino acid sequence. Driven by deep learning, the prediction accuracy of the protein secondary structure has been greatly improved in recent years. To explore a new technique of PSSP, this study introduces the concept of an adversarial game into the prediction of the secondary structure, and a conditional generative adversarial network (GAN)-based prediction model is proposed. We introduce a new multiscale convolution module and an improved channel attention (ICA) module into the generator to generate the secondary structure, and then a discriminator is designed to conflict with the generator to learn the complicated features of proteins. Then, we propose a PSSP method based on the proposed multiscale convolution module and ICA module. The experimental results indicate that the conditional GAN-based protein secondary structure prediction (CGAN-PSSP) model is workable and worthy of further study because of the strong feature-learning ability of adversarial learning.
Affiliation(s)
- Xin Jin
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Lin Guo
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Qian Jiang
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Nan Wu
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Shaowen Yao
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| |
|
16
|
Yu CH, Chen W, Chiang YH, Guo K, Martin Moldes Z, Kaplan DL, Buehler MJ. End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins. ACS Biomater Sci Eng 2022; 8:1156-1165. [PMID: 35129957 PMCID: PMC9347213 DOI: 10.1021/acsbiomaterials.1c01343] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Abstract
Structural proteins are the basis of many biomaterials and key construction and functional components of all life. Further, it is well-known that the diversity of proteins' function relies on their local structures derived from their primary amino acid sequences. Here, we report a deep learning model to predict the secondary structure content of proteins directly from primary sequences, with high computational efficiency. Understanding the secondary structure content of proteins is crucial to designing proteins with targeted material functions, especially mechanical properties. Using convolutional and recurrent architectures and natural language models, our deep learning model predicts the content of two essential types of secondary structures, the α-helix and the β-sheet. The training data are collected from the Protein Data Bank and contain many existing protein geometries. We find that our model can learn the hidden features as patterns of input sequences that can then be directly related to secondary structure content. The α-helix and β-sheet content predictions show excellent agreement with training data and newly deposited protein structures that were recently identified and that were not included in the original training set. We further demonstrate the features of the model by a search for de novo protein sequences that optimize max/min α-helix/β-sheet content and compare the predictions with folded models of these sequences based on AlphaFold2. Excellent agreement is found, underscoring that our model has predictive potential for rapidly designing proteins with specific secondary structures and could be widely applied to biomedical industries, including protein biomaterial designs and regenerative medicine applications.
Affiliation(s)
- Chi-Hua Yu
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.,Department of Engineering Science, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
| | - Wei Chen
- Department of Engineering Science, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
| | - Yu-Hsuan Chiang
- Department of Civil Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
| | - Kai Guo
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Zaira Martin Moldes
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
| | - David L Kaplan
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
| | - Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.,Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.,Center for Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
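The quantity predicted in the abstract above, secondary structure content, can be illustrated with a short sketch. It assumes that content is the fraction of residues labelled `H` (α-helix) or `E` (β-strand) in a DSSP-style per-residue string. That definition and the function name `secondary_structure_content` are assumptions made for illustration, not the authors' model or code.

```python
def secondary_structure_content(dssp_string):
    """Return the fraction of residues labelled alpha-helix (H) and beta-strand (E)."""
    n = len(dssp_string)
    if n == 0:
        raise ValueError("empty secondary structure string")
    return {
        "helix": dssp_string.count("H") / n,
        "sheet": dssp_string.count("E") / n,
    }


# Hypothetical 8-residue assignment: four helix residues, two strand residues.
print(secondary_structure_content("HHHHEE--"))
```

A sequence-based model of the kind described would be trained to regress these two fractions directly from the amino acid sequence.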
| |
|
17
|
Yeung MY. Histocompatibility Assessment in Precision Medicine for Transplantation: Towards a Better Match. Semin Nephrol 2022; 42:44-62. [DOI: 10.1016/j.semnephrol.2022.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
18
|
Noddings CM, Wang RYR, Johnson JL, Agard DA. Structure of Hsp90-p23-GR reveals the Hsp90 client-remodelling mechanism. Nature 2022; 601:465-469. [PMID: 34937936 PMCID: PMC8994517 DOI: 10.1038/s41586-021-04236-1] [Citation(s) in RCA: 78] [Impact Index Per Article: 39.0] [Received: 12/02/2020] [Accepted: 11/13/2021] [Indexed: 01/11/2023]
Abstract
Hsp90 is a conserved and essential molecular chaperone responsible for the folding and activation of hundreds of 'client' proteins [1-3]. The glucocorticoid receptor (GR) is a model client that constantly depends on Hsp90 for activity [4-9]. GR ligand binding was previously shown to be inhibited by Hsp70 and restored by Hsp90, aided by the co-chaperone p23 [10]. However, a molecular understanding of the chaperone-mediated remodelling that occurs between the inactive Hsp70-Hsp90 'client-loading complex' and an activated Hsp90-p23 'client-maturation complex' is lacking for any client, including GR. Here we present a cryo-electron microscopy (cryo-EM) structure of the human GR-maturation complex (GR-Hsp90-p23), revealing that the GR ligand-binding domain is restored to a folded, ligand-bound conformation, while being simultaneously threaded through the Hsp90 lumen. In addition, p23 directly stabilizes native GR using a C-terminal helix, resulting in enhanced ligand binding. This structure of a client bound to Hsp90 in a native conformation contrasts sharply with the unfolded kinase-Hsp90 structure [11]. Thus, aided by direct co-chaperone-client interactions, Hsp90 can directly dictate client-specific folding outcomes. Together with the GR-loading complex structure [12], we present the molecular mechanism of chaperone-mediated GR remodelling, establishing the first, to our knowledge, complete chaperone cycle for any Hsp90 client.
Affiliation(s)
- Chari M. Noddings
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Ray Yu-Ruei Wang
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Jill L. Johnson
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
| | - David A. Agard
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA. Correspondence to David A. Agard.
| |
|
19
|
Nasi GI, Aktypi FD, Spatharas PM, Louros NN, Tsiolaki PL, Magafa V, Trougakos IP, Iconomidou VA. Arabidopsis thaliana Plant Natriuretic Peptide Active Domain Forms Amyloid-like Fibrils in a pH-Dependent Manner. Plants (Basel) 2021; 11:9. [PMID: 35009013 PMCID: PMC8747288 DOI: 10.3390/plants11010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/13/2021] [Accepted: 12/15/2021] [Indexed: 11/17/2022]
Abstract
Plant natriuretic peptides (PNPs) are hormones that have been extracted from many different species, with the Arabidopsis thaliana PNP (AtPNP-A) being the most studied among them. AtPNP-A is a signaling molecule that consists of 130 residues and is secreted into the apoplast, under conditions of biotic or abiotic stress. AtPNP-A has distant sequence homology with human ANP, a protein that forms amyloid fibrils in vivo. In this work, we investigated the amyloidogenic properties of a 34-residue-long peptide, located within the AtPNP-A sequence, in three different pH conditions, using transmission electron microscopy, X-ray fiber diffraction, ATR FT-IR spectroscopy, Congo red and Thioflavin T staining assays. We also utilize bioinformatics tools to study its association with known plant amyloidogenic proteins and other A. thaliana proteins. Our results reveal a new case of a pH-dependent amyloid forming peptide in A. thaliana, with a potential functional role.
Affiliation(s)
- Georgia I. Nasi
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| | - Foteini D. Aktypi
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| | - Panagiotis M. Spatharas
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| | - Nikolaos N. Louros
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| | - Paraskevi L. Tsiolaki
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| | - Vassiliki Magafa
- Department of Pharmacy, University of Patras, 265 04 Patras, Greece;
| | - Ioannis P. Trougakos
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| | - Vassiliki A. Iconomidou
- Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 157 01 Athens, Greece; (G.I.N.); (F.D.A.); (P.M.S.); (N.N.L.); (P.L.T.); (I.P.T.)
| |
|
20
|
Martiny HM, Armenteros JJA, Johansen AR, Salomon J, Nielsen H. Deep protein representations enable recombinant protein expression prediction. Comput Biol Chem 2021; 95:107596. [PMID: 34775287 DOI: 10.1016/j.compbiolchem.2021.107596] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2021] [Revised: 10/21/2021] [Accepted: 10/21/2021] [Indexed: 11/19/2022]
Abstract
A crucial process in the production of industrial enzymes is recombinant gene expression, which aims to induce enzyme overexpression of the genes in a host microbe. Current approaches for securing overexpression rely on molecular tools such as adjusting the recombinant expression vector, adjusting cultivation conditions, or performing codon optimizations. However, such strategies are time-consuming, and an alternative strategy would be to select genes for better compatibility with the recombinant host. Several methods for predicting soluble expression are available; however, they are all optimized for the expression host Escherichia coli and do not consider the possibility of an expressed protein not being soluble. We show that these tools are not suited for predicting expression potential in the industrially important host Bacillus subtilis. Instead, we build a B. subtilis-specific machine learning model for expressibility prediction. Given millions of unlabelled proteins and a small labeled dataset, we can successfully train such a predictive model. The unlabeled proteins provide a performance boost relative to using amino acid frequencies of the labeled proteins as input. On average, we obtain a modest performance of 0.64 area-under-the-curve (AUC) and 0.2 Matthews correlation coefficient (MCC). However, we find that this is sufficient for the prioritization of expression candidates for high-throughput studies. Moreover, the predicted class probabilities are correlated with expression levels. A number of features related to protein expression, including base frequencies and solubility, are captured by the model.
Affiliation(s)
- Hannah-Marie Martiny
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.
| | - Jose Juan Almagro Armenteros
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
| | | | - Jesper Salomon
- Enzyme Research, Novozymes A/S, Krogshøjvej 36, 2880 Bagsværd, Denmark
| | - Henrik Nielsen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
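The abstract above reports a Matthews correlation coefficient (MCC) of 0.2. For readers unfamiliar with the metric, here is a minimal sketch of how MCC is computed from the four cells of a binary confusion matrix. The function name `mcc` is hypothetical, and returning 0.0 when the denominator vanishes is a common convention assumed here, not something stated in the source.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for an expressed / not-expressed classifier.
print(mcc(tp=30, tn=40, fp=20, fn=10))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why a value of 0.2 is described as modest.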
| |
|
21
|
Moffat L, Jones DT. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 2021; 37:3744-3751. [PMID: 34213528 PMCID: PMC8570780 DOI: 10.1093/bioinformatics/btab491] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2021] [Revised: 06/08/2021] [Accepted: 06/30/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESULTS: By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. AVAILABILITY AND IMPLEMENTATION: The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lewis Moffat
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
| | - David T Jones
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
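The Q3 score quoted in the abstract above is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch follows, assuming both inputs are equal-length strings over the letters `H`, `E`, and `C`. The function name `q3_accuracy` and the example strings are hypothetical and are not part of S4PRED.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the three-state labels agree."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical prediction for a four-residue fragment: one strand/coil mismatch.
print(q3_accuracy("HHEC", "HHCC"))
```

Benchmark figures such as those on CB513 are typically aggregated over all residues in a test set, so per-protein scores may differ from the pooled value.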
| |
|
22
|
Ho CT, Huang YW, Chen TR, Lo CH, Lo WC. Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021; 11:1627. [PMID: 34827624 PMCID: PMC8615938 DOI: 10.3390/biom11111627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 12/29/2022]
Abstract
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. There have been ~300 algorithms published in the past seven decades with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; after that, it has long stayed at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy had been estimated to be 88%. Thus, SSP is now generally considered not challenging or too challenging to improve. However, we found that the limit of three-state SSP might be underestimated. Besides, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help move forward the development of SSP and all its applications.
Affiliation(s)
- Chia-Tzu Ho
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
| | - Yu-Wei Huang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
| | - Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
| | - Chia-Hua Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| |
|
23
|
Abstract
We explore the structural similarities in three different languages, first in the protein language whose primary letters are the amino acids, second in the musical language whose primary letters are the notes, and third in the poetry language whose primary letters are the alphabet. For proteins, the non-local (secondary) letters are the types of foldings in space (α-helices, β-sheets, etc.); for music, one is dealing with clear-cut repetition units called musical forms and for poems the structure consists of grammatical forms (names, verbs, etc.). We show in this paper that the mathematics of such secondary structures relies on finitely presented groups f_p on r letters, where r counts the number of types of such secondary non-local segments. The number of conjugacy classes of a given index (also the number of graph coverings over a base graph) of a group f_p is found to be close to the number of conjugacy classes of the same index in the free group F_{r−1} on r−1 generators. In a concrete way, we explore the group structure of a variant of the SARS-CoV-2 spike protein and the group structure of apolipoprotein-H, passing from the primary code with amino acids to the secondary structure organizing the foldings. Then, we look at the musical forms employed in the classical and contemporary periods. Finally, we investigate in much detail the group structure of a small poem in prose by Charles Baudelaire and that of the Bateau Ivre by Arthur Rimbaud.
|
24
|
Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework. Anal Biochem 2021; 631:114358. [PMID: 34478704 DOI: 10.1016/j.ab.2021.114358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 11/20/2022]
Abstract
The accurate prediction of the relative solvent accessibility of a protein is critical to understanding its 3D structure and biological function. In this study, a novel deep multi-view feature learning (DMVFL) framework is developed to accurately predict the relative solvent accessibility of proteins. It integrates three different neural network units (a bidirectional long short-term memory recurrent neural network, squeeze-and-excitation, and a fully-connected hidden layer) with four sequence-based single-view features (position-specific scoring matrix, position-specific frequency matrix, predicted secondary structure, and roughly predicted three-state relative solvent accessibility probability). On the basis of this framework, a new protein relative solvent accessibility predictor, DMVFL-RSA, is proposed; it employs a customized multiple feedback mechanism that helps to extract discriminative information embedded in the four single-view features. In benchmark tests on TEST524 and CASP14-derived (CASP14set) datasets, DMVFL-RSA outperforms other existing state-of-the-art protein relative solvent accessibility predictors when predicting two-state (exposure threshold of 25%), three-state (exposure thresholds of 9% and 36%), and four-state (exposure thresholds of 4%, 25%, and 50%) discrete values. For real-valued prediction on TEST524 and CASP14set, DMVFL-RSA has also gained high Pearson correlation coefficient values, indicating a positive correlation between the predicted and native relative solvent accessibility. Detailed analyses show that the major advantages of DMVFL-RSA lie in the high efficiency of the DMVFL framework, the applied multiple feedback mechanism, and the strong sensitivity of the sequence-based features. The web server of DMVFL-RSA is freely available at https://jun-csbio.github.io/DMVFL-RSA/ for academic use. The standalone package of DMVFL-RSA is downloadable at https://github.com/XueQiangFan/DMVFL-RSA.
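The two-, three-, and four-state schemes named in the abstract above turn a real-valued relative solvent accessibility (RSA) into a discrete class using the stated exposure thresholds. A minimal sketch of that discretization follows. The names `THRESHOLDS` and `rsa_state` are hypothetical, and assigning a value that falls exactly on a threshold to the more exposed class is an assumption, since the abstract does not state the boundary convention.

```python
from bisect import bisect_right

# Exposure thresholds quoted in the abstract, expressed as fractions.
THRESHOLDS = {
    2: (0.25,),
    3: (0.09, 0.36),
    4: (0.04, 0.25, 0.50),
}


def rsa_state(rsa, n_states):
    """Map an RSA value in [0, 1] to a class index, 0 being the most buried."""
    # Assumed convention: a value equal to a threshold counts as the more exposed class.
    return bisect_right(THRESHOLDS[n_states], rsa)


# Hypothetical residue with 30% relative accessibility under each scheme.
for n in (2, 3, 4):
    print(n, rsa_state(0.30, n))
```

The same residue can therefore count as exposed in the two-state scheme while sitting in an intermediate class in the three- and four-state schemes.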
|
25
|
Chen TR, Juan SH, Huang YW, Lin YC, Lo WC. A secondary structure-based position-specific scoring matrix applied to the improvement in protein secondary structure prediction. PLoS One 2021; 16:e0255076. [PMID: 34320027 PMCID: PMC8318245 DOI: 10.1371/journal.pone.0255076] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2020] [Accepted: 07/11/2021] [Indexed: 11/18/2022]
Abstract
Protein secondary structure prediction (SSP) has a variety of applications; however, there has been relatively limited improvement in accuracy for years. With a vision of moving forward all related fields, we aimed to make a fundamental advance in SSP. There have been many admirable efforts made to improve the machine learning algorithm for SSP. This work thus took a step back by manipulating the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, based on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigid independent tests with training and testing datasets sharing <25% sequence identities. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. This new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests made by combining the SSE-PSSM and well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break the current restriction and eventually bring benefit to all research and applications where secondary structure prediction plays a vital role during development. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating SSE-PSSM available at http://10.life.nctu.edu.tw/SSE-PSSM.
Affiliation(s)
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Sheng-Hung Juan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Yu-Wei Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Yen-Cheng Lin
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| |
|
26
|
Abstract
Every protein consists of a linear sequence over an alphabet of 20 letters/amino acids. The sequence unfolds in the 3-dimensional space through secondary (local foldings), tertiary (bonds) and quaternary (disjoint multiple) structures. The mere existence of the genetic code for the 20 letters of the linear chain could be predicted with the (informationally complete) irreducible characters of the finite group G_n := Z_n ⋊ 2O (with n=5 or 7 and 2O the binary octahedral group) in our previous two papers. It turns out that some quaternary structures of protein complexes display n-fold symmetries. We propose an approach of secondary structures based on free group theory. Our results are compared to other approaches of predicting secondary structures of proteins in terms of α helices, β sheets and coils, or more refined techniques. It is shown that the secondary structure of proteins shows similarities to the structure of some hyperbolic 3-manifolds. The hyperbolic 3-manifold of smallest volume (the Gieseking manifold), some other 3-manifolds and the oriented hypercartographic group are singled out as tentative models of such secondary structures. For the quaternary structure, there are links to the Kummer surface.
|
27
|
Görmez Y, Sabzekar M, Aydın Z. IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction. Proteins 2021; 89:1277-1288. [PMID: 33993559 DOI: 10.1002/prot.26149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022]
Abstract
There is a close relationship between the tertiary structure and the function of a protein. One of the important steps to determine the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model, based on convolutional neural network and graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model IGPRED obtained 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% Q3 accuracies for CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
Affiliation(s)
- Yasin Görmez
- Faculty of Economics and Administrative Sciences, Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey
| | - Mostafa Sabzekar
- Department of Computer Engineering, Birjand University of Technology, Birjand, Iran
| | - Zafer Aydın
- Engineering Faculty, Computer Engineering Department, Abdullah Gül University, Kayseri, Turkey
| |
|
28
|
Billey E, Magneschi L, Leterme S, Bedhomme M, Andres-Robin A, Poulet L, Michaud M, Finazzi G, Dumas R, Crouzy S, Laueffer F, Fourage L, Rébeillé F, Amato A, Collin S, Jouhet J, Maréchal E. Characterization of the Bubblegum acyl-CoA synthetase of Microchloropsis gaditana. Plant Physiol 2021; 185:815-835. [PMID: 33793914 PMCID: PMC8133546 DOI: 10.1093/plphys/kiaa110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2020] [Accepted: 12/15/2020] [Indexed: 05/15/2023]
Abstract
The metabolic pathways of glycerolipids are well described in cells containing chloroplasts limited by a two-membrane envelope but not in cells containing plastids limited by four membranes, including heterokonts. Fatty acids (FAs) produced in the plastid, palmitic and palmitoleic acids (16:0 and 16:1), are used in the cytosol for the synthesis of glycerolipids via various routes, requiring multiple acyl-Coenzyme A (CoA) synthetases (ACS). Here, we characterized an ACS of the Bubblegum subfamily in the photosynthetic eukaryote Microchloropsis gaditana, an oleaginous heterokont used for the production of lipids for multiple applications. Genome engineering with TALE-N allowed the generation of MgACSBG point mutations, but no knockout was obtained. Point mutations triggered an overall decrease of 16:1 in lipids, a specific increase of unsaturated 18-carbon acyls in phosphatidylcholine and decrease of 20-carbon acyls in the betaine lipid diacylglyceryl-trimethyl-homoserine. The profile of acyl-CoAs highlighted a decrease in 16:1-CoA and 18:3-CoA. Structural modeling supported that mutations affect accessibility of FA to the MgACSBG reaction site. Expression in yeast defective in acyl-CoA biosynthesis further confirmed that point mutations affect ACSBG activity. Altogether, this study supports a critical role of heterokont MgACSBG in the production of 16:1-CoA and 18:3-CoA. In M. gaditana mutants, the excess saturated and monounsaturated FAs were diverted to triacylglycerol, thus suggesting strategies to improve the oil content in this microalga.
Affiliation(s)
- Elodie Billey
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
- Total Raffinage-Chimie, Tour Coupole, 2 Place Jean Millier, 92078 Paris La Défense, France
| | - Leonardo Magneschi
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Sébastien Leterme
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Mariette Bedhomme
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
- Total Raffinage-Chimie, Tour Coupole, 2 Place Jean Millier, 92078 Paris La Défense, France
| | - Amélie Andres-Robin
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Laurent Poulet
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Morgane Michaud
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Giovanni Finazzi
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Renaud Dumas
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Serge Crouzy
- Laboratoire de Chimie et Biologie des Métaux, Unité mixte de Recherche 5249 CNRS–CEA–Univ. Grenoble Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Frédéric Laueffer
- Total Raffinage-Chimie, Tour Coupole, 2 Place Jean Millier, 92078 Paris La Défense, France
| | - Laurent Fourage
- Total Raffinage-Chimie, Tour Coupole, 2 Place Jean Millier, 92078 Paris La Défense, France
| | - Fabrice Rébeillé
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Alberto Amato
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Séverine Collin
- Total Raffinage-Chimie, Tour Coupole, 2 Place Jean Millier, 92078 Paris La Défense, France
| | - Juliette Jouhet
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| | - Eric Maréchal
- Laboratoire de Physiologie Cellulaire et Végétale, Unité mixte de Recherche 5168 CNRS–CEA–INRA–Univ. Grenoble-Alpes, IRIG, CEA Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
| |
|
29
|
Krieger S, Kececioglu J. Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization. Bioinformatics 2021; 36:i317-i325. [PMID: 32657384 PMCID: PMC7355242 DOI: 10.1093/bioinformatics/btaa336] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. METHOD We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods. RESULTS On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2-10%, and Q3 accuracy by more than 1-3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction. AVAILABILITY AND IMPLEMENTATION A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
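As a rough illustration of the last step described in the abstract above, the following Python sketch runs a dynamic program over per-residue class probabilities and returns the most likely labeling that respects a validity rule. It is not the Nnessy implementation: the three-state alphabet, the minimum run lengths in `MIN_RUN`, and the function name `best_valid_labeling` are assumptions chosen only to make the idea concrete.

```python
import math

# Assumed validity rule (hypothetical, for illustration only): every helix (H)
# run spans at least 3 residues, every strand (E) run at least 2, coil (C) any length.
MIN_RUN = {"H": 3, "E": 2, "C": 1}


def best_valid_labeling(probs):
    """Return the maximum-likelihood labeling whose runs satisfy MIN_RUN.

    probs: list with one dict per residue, mapping each state to its
    estimated class-membership probability.
    """
    states = list(MIN_RUN)
    n = len(probs)
    neg_inf = float("-inf")

    # prefix[s][i] = sum of log-probabilities of state s over residues 0..i-1
    prefix = {s: [0.0] for s in states}
    for p in probs:
        for s in states:
            prefix[s].append(prefix[s][-1] + math.log(max(p[s], 1e-12)))

    # best[end][s] = best score for residues 0..end-1 when a run of s ends at end-1
    best = [{s: neg_inf for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]

    for end in range(1, n + 1):
        for s in states:
            # Only consider runs of s that are long enough to be valid.
            for start in range(0, end - MIN_RUN[s] + 1):
                if start == 0:
                    prev_state, prev_score = None, 0.0
                else:
                    # The preceding run must carry a different state.
                    prev_state = max((t for t in states if t != s),
                                     key=lambda t: best[start][t])
                    prev_score = best[start][prev_state]
                    if prev_score == neg_inf:
                        continue
                score = prev_score + prefix[s][end] - prefix[s][start]
                if score > best[end][s]:
                    best[end][s] = score
                    back[end][s] = (start, prev_state)

    # Trace back from the best final state, one run at a time.
    state = max(states, key=lambda s: best[n][s])
    runs, end = [], n
    while end > 0:
        start, prev_state = back[end][state]
        runs.append(state * (end - start))
        end, state = start, prev_state
    return "".join(reversed(runs))
```

Taking the most probable state independently at each residue can yield an isolated single-residue helix, which the assumed rule forbids. The dynamic program instead scores whole runs, so such a residue is absorbed into a neighbouring run or extended to a valid length, whichever is more likely overall.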
Affiliation(s)
- Spencer Krieger
- Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA
| | - John Kececioglu
- Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA
| |
|
30
|
Junqueira Alves C, Silva Ladeira J, Hannah T, Pedroso Dias RJ, Zabala Capriles PV, Yotoko K, Zou H, Friedel RH. Evolution and Diversity of Semaphorins and Plexins in Choanoflagellates. Genome Biol Evol 2021; 13:6149127. [PMID: 33624753 PMCID: PMC8011033 DOI: 10.1093/gbe/evab035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 02/21/2021] [Indexed: 12/22/2022]
Abstract
Semaphorins and plexins are cell surface ligand/receptor proteins that affect cytoskeletal dynamics in metazoan cells. Interestingly, they are also present in Choanoflagellata, a class of unicellular heterotrophic flagellates that forms the phylogenetic sister group to Metazoa. Several members of choanoflagellates are capable of forming transient colonies, whereas others reside solitary inside exoskeletons; their molecular diversity is only beginning to emerge. Here, we surveyed genomics data from 22 choanoflagellate species and detected semaphorin/plexin pairs in 16 species. Choanoflagellate semaphorins (Sema-FN1) contain several domain features distinct from metazoan semaphorins, including an N-terminal Reeler domain that may facilitate dimer stabilization, an array of fibronectin type III domains, a variable serine/threonine-rich domain that is a potential site for O-linked glycosylation, and a SEA domain that can undergo autoproteolysis. In contrast, choanoflagellate plexins (Plexin-1) harbor a domain arrangement that is largely identical to metazoan plexins. Both Sema-FN1 and Plexin-1 also contain a short homologous motif near the C-terminus, likely associated with a shared function. Three-dimensional molecular models revealed a highly conserved structural architecture of choanoflagellate Plexin-1 as compared to metazoan plexins, including similar predicted conformational changes in a segment that is involved in the activation of the intracellular Ras-GAP domain. The absence of semaphorins and plexins in several choanoflagellate species did not appear to correlate with unicellular versus colonial lifestyle or ecological factors such as fresh versus salt water environment. Together, our findings support a conserved mechanism of semaphorin/plexin proteins in regulating cytoskeletal dynamics in unicellular and multicellular organisms.
Affiliation(s)
- Chrystian Junqueira Alves
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Júlia Silva Ladeira
- Programa de Pós-graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Minas Gerais, Brazil
| | - Theodore Hannah
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Roberto J Pedroso Dias
- Departamento de Zoologia, Instituto de Ciências Biológicas, Universidade Federal de Juiz de Fora, Minas Gerais, Brazil
| | - Priscila V Zabala Capriles
- Programa de Pós-graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Minas Gerais, Brazil
| | - Karla Yotoko
- Departamento de Biologia Geral, Universidade Federal de Viçosa, Minas Gerais, Brazil
| | - Hongyan Zou
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Roland H Friedel
- Friedman Brain Institute, Nash Family Department of Neuroscience and Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York
| |
|
31
|
Uddin MR, Mahbub S, Rahman MS, Bayzid MS. SAINT: self-attention augmented inception-inside-inception network improves protein secondary structure prediction. Bioinformatics 2021; 36:4599-4608. [PMID: 32437517 DOI: 10.1093/bioinformatics/btaa531] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/07/2019] [Revised: 05/10/2020] [Accepted: 05/16/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Protein structures provide basic insight into how they can interact with other proteins, their functions and biological roles in an organism. Experimental methods (e.g. X-ray crystallography and nuclear magnetic resonance spectroscopy) for predicting the secondary structure (SS) of proteins are very expensive and time consuming. Therefore, developing efficient computational approaches for predicting the SS of protein is of utmost importance. Advances in developing highly accurate SS prediction methods have mostly been focused on 3-class (Q3) structure prediction. However, 8-class (Q8) resolution of SS contains more useful information and is much more challenging than the Q3 prediction. RESULTS We present SAINT, a highly accurate method for Q8 structure prediction, which incorporates self-attention mechanism (a concept from natural language processing) with the Deep Inception-Inside-Inception network in order to effectively capture both the short- and long-range interactions among the amino acid residues. SAINT offers a more interpretable framework than the typical black-box deep neural network methods. Through an extensive evaluation study, we report the performance of SAINT in comparison with the existing best methods on a collection of benchmark datasets, namely, TEST2016, TEST2018, CASP12 and CASP13. Our results suggest that self-attention mechanism improves the prediction accuracy and outperforms the existing best alternate methods. SAINT is the first of its kind and offers the best known Q8 accuracy. Thus, we believe SAINT represents a major step toward the accurate and reliable prediction of SSs of proteins. AVAILABILITY AND IMPLEMENTATION SAINT is freely available as an open-source project at https://github.com/SAINTProtein/SAINT.
Affiliation(s)
- Mostofa Rafid Uddin
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh; Department of Computer Science and Engineering, East West University, Dhaka 1212, Bangladesh
| | - Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| |
|
32
|
Guo L, Jiang Q, Jin X, Liu L, Zhou W, Yao S, Wu M, Wang Y. A Deep Convolutional Neural Network to Improve the Prediction of Protein Secondary Structure. Curr Bioinform 2020. [DOI: 10.2174/1574893615666200120103050] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Background: Protein secondary structure prediction (PSSP) is a fundamental task in bioinformatics that is helpful for understanding the three-dimensional structure and biological function of proteins. Many neural network-based prediction methods have been developed for protein secondary structures. Deep learning and multiple features are two obvious means to improve prediction accuracy.
Objective: To promote the development of PSSP, a deep convolutional neural network-based method is proposed to predict both eight-state and three-state protein secondary structure.
Methods: In this model, sequence and evolutionary information of proteins are combined as multiple input features after preprocessing. A deep convolutional neural network with no pooling layer and connection layer is then constructed to predict the secondary structure of proteins. L2 regularization, batch normalization, and dropout techniques are employed to avoid over-fitting and obtain better prediction performance, and an improved cross-entropy is used as the loss function.
Results: Our proposed model obtains Q3 prediction results of 86.2%, 84.5%, 87.8%, and 84.7% on the CullPDB, CB513, CASP10, and CASP11 datasets, respectively, with corresponding Q8 prediction results of 74.1%, 70.5%, 74.9%, and 71.3%.
Conclusion: We have proposed the DCNN-SS deep convolutional-network-based PSSP method, and experimental results show that DCNN-SS performs competitively with other methods.
Affiliation(s)
- Lin Guo
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Qian Jiang
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Xin Jin
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Lin Liu
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Wei Zhou
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Shaowen Yao
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Min Wu
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| | - Yun Wang
- School of Software, Yunnan University, Kunming, China; School of Information, Yunnan Normal University, Kunming, China
| |
|
33
|
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term ‘Rosetta’ appeared for the first time twenty years ago in the literature to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering ‘the second half of the genetic code’. Although the focus of Baker’s team has expanded to de novo protein design in the past few years, Rosetta’s ‘fame’ is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its development and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
| | - Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
| |
|
34
|
Carabias A, Gómez-Hernández M, de Cima S, Rodríguez-Blázquez A, Morán-Vaquero A, González-Sáenz P, Guerrero C, de Pereda JM. Mechanisms of autoregulation of C3G, activator of the GTPase Rap1, and its catalytic deregulation in lymphomas. Sci Signal 2020; 13:13/647/eabb7075. [PMID: 32873726 DOI: 10.1126/scisignal.abb7075] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/18/2022]
Abstract
C3G is a guanine nucleotide exchange factor (GEF) that regulates cell adhesion and migration by activating the GTPase Rap1. The GEF activity of C3G is stimulated by the adaptor proteins Crk and CrkL and by tyrosine phosphorylation. Here, we uncovered mechanisms of C3G autoinhibition and activation. Specifically, we found that two intramolecular interactions regulate the activity of C3G. First, an autoinhibitory region (AIR) within the central domain of C3G binds to and blocks the catalytic Cdc25H domain. Second, the binding of the protein's N-terminal domain to its Ras exchanger motif (REM) is required for its GEF activity. CrkL activated C3G by displacing the AIR/Cdc25HD interaction. Two missense mutations in the AIR found in non-Hodgkin's lymphomas, Y554H and M555K, disrupted the autoinhibitory mechanism. Expression of C3G-Y554H or C3G-M555K in Ba/F3 pro-B cells caused constitutive activation of Rap1 and, consequently, the integrin LFA-1. Our findings suggest that sustained Rap1 activation by deregulated C3G might promote progression of lymphomas and that designing therapeutics to target C3G might treat these malignancies.
Affiliation(s)
- Arturo Carabias
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| | - María Gómez-Hernández
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| | - Sergio de Cima
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| | - Antonio Rodríguez-Blázquez
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| | - Alba Morán-Vaquero
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| | - Patricia González-Sáenz
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| | - Carmen Guerrero
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain; Departamento de Medicina, Facultad de Medicina, Universidad de Salamanca, Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
| | - José M de Pereda
- Centro de Investigación del Cáncer and Instituto de Biología Molecular y Celular del Cáncer, Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, 37007 Salamanca, Spain
| |
|
35
|
Feng P, Feng L. Recent Advances on Antioxidant Identification Based on Machine Learning Methods. Curr Drug Metab 2020; 21:804-809. [PMID: 32682368 DOI: 10.2174/1389200221666200719001449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/01/2020] [Revised: 03/17/2020] [Accepted: 05/13/2020] [Indexed: 11/22/2022]
Abstract
Antioxidants are molecules that can prevent damage to cells caused by free radicals. Recent studies have also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarize recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants are also discussed. We hope this review will provide insights into research on antioxidant identification.
Affiliation(s)
- Pengmian Feng
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Lijing Feng
- School of Sciences, North China University of Science and Technology, Tangshan 063000, China
| |
|
36
|
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022]
|
37
|
Kramer CSM, Koster J, Haasnoot GW, Roelen DL, Claas FHJ, Heidt S. HLA-EMMA: A user-friendly tool to analyse HLA class I and class II compatibility on the amino acid level. HLA 2020; 96:43-51. [PMID: 32227681 PMCID: PMC7317360 DOI: 10.1111/tan.13883] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 01/07/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 11/30/2022]
Abstract
In renal transplantation, polymorphic amino acids on mismatched donor HLA molecules can lead to the induction of de novo donor‐specific antibodies (DSA), which are associated with inferior graft survival. To ultimately prevent de novo DSA formation without unnecessarily precluding transplants, it is essential to define which polymorphic amino acid mismatches can actually induce an antibody response. To facilitate this, we developed a user‐friendly software program that establishes HLA class I and class II compatibility between donor and recipient on the amino acid level. HLA epitope mismatch algorithm (HLA‐EMMA) is a software program that simultaneously compares the HLA class I and class II amino acid sequences of the donor with the HLA amino acid sequences of the recipient and determines the polymorphic solvent accessible amino acid mismatches that are likely to be accessible to B cell receptors. Analysis can be performed for a large number of donor‐recipient pairs at once. As proof of principle, a previously described study cohort of 191 lymphocyte immunotherapy recipients was analysed with HLA‐EMMA and showed a higher frequency of DSA formation with a higher number of solvent accessible amino acid mismatches. Overall, HLA‐EMMA can be used to analyse compatibility on the amino acid level between donor and recipient HLA class I and class II simultaneously for large cohorts to ultimately determine the most immunogenic amino acid mismatches.
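The donor-versus-recipient comparison described in the abstract above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the HLA-EMMA code: sequences are assumed to be pre-aligned and of equal length, positions are 1-based, the set of solvent accessible positions is supplied by the caller, and the function name and toy sequences are hypothetical.

```python
def solvent_accessible_mismatches(donor_seq, recipient_seqs, accessible_positions):
    """List (position, donor_residue) pairs that the recipient lacks.

    donor_seq: aligned amino acid sequence of one donor HLA allele.
    recipient_seqs: aligned sequences of the recipient's own alleles.
    accessible_positions: 1-based positions assumed to be solvent accessible.
    """
    mismatches = []
    for pos in sorted(accessible_positions):
        donor_residue = donor_seq[pos - 1]
        # A donor residue counts as a mismatch only if none of the
        # recipient's alleles carries the same residue at that position.
        recipient_residues = {seq[pos - 1] for seq in recipient_seqs}
        if donor_residue not in recipient_residues:
            mismatches.append((pos, donor_residue))
    return mismatches


# Toy sequences (not real HLA alleles), used only to exercise the function.
donor = "GSHSMRYF"
recipient = ["GSHSLRYF", "GSHSLKYF"]
found = solvent_accessible_mismatches(donor, recipient, {5, 6, 8})
```

In this toy case only position 5 is reported, because the donor residue at position 6 also occurs in one of the recipient sequences and position 8 is identical in all three.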
Affiliation(s)
- Cynthia S M Kramer
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Johan Koster
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Geert W Haasnoot
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Dave L Roelen
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Frans H J Claas
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Sebastiaan Heidt
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| |
|
38
|
Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017104639] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 11/22/2022]
Abstract
Background: Over the last few decades, a search for the theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, there are still open questions and challenges, like understanding the rules by which amino acid sequence determines protein secondary structure.
Objective: In this review, we depict the progress of the prediction methods over the years and identify sources of improvement.
Methods: The protein secondary structure prediction problem is described followed by the discussion on theoretical limitations, description of the commonly used data sets, features and a review of three generations of methods with the focus on the most recent advances. Additionally, methods with available online servers are assessed on the independent data set.
Results: The state-of-the-art methods are currently reaching almost 88% for 3-class prediction and 76.5% for an 8-class prediction.
Conclusion: This review summarizes recent advances and outlines further research directions.
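The 3-class and 8-class figures quoted in entries such as this one are per-residue accuracies (Q3 and Q8): the fraction of residues whose predicted state equals the reference state. The sketch below shows that computation in Python. The eight-to-three reduction in `Q8_TO_Q3` follows one commonly used convention and is an assumption here, since individual papers may group the states differently; the example strings are made up.

```python
# One common reduction of the eight DSSP-style states to three classes
# (an assumption; other groupings are also used in the literature).
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and bridges
    "T": "C", "S": "C", "L": "C", "-": "C",  # everything else is coil
}


def per_residue_accuracy(predicted, reference):
    """Fraction of positions where the predicted state matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def reduce_to_q3(states8):
    """Map an eight-state string to its three-state equivalent."""
    return "".join(Q8_TO_Q3[s] for s in states8)


# Made-up eight-state strings for a ten-residue protein.
reference8 = "HHHGGTEEB-"
predicted8 = "HHHHHTEEEE"
q8 = per_residue_accuracy(predicted8, reference8)
q3 = per_residue_accuracy(reduce_to_q3(predicted8), reduce_to_q3(reference8))
```

For these strings Q8 is 0.6 and Q3 is 0.9, which illustrates why 8-class accuracies are consistently lower than 3-class accuracies for the same predictor: confusions within a class, such as G versus H, are penalised only in the 8-class setting.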
Affiliation(s)
- Tomasz Smolarczyk
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Irena Roterman-Konieczna
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Krakow, Poland
| | - Katarzyna Stapor
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| |
|
39
|
Ellingson SR, Davis B, Allen J. Machine learning and ligand binding predictions: A review of data, methods, and obstacles. Biochim Biophys Acta Gen Subj 2020; 1864:129545. [PMID: 32057823 DOI: 10.1016/j.bbagen.2020.129545] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/03/2019] [Revised: 12/21/2019] [Accepted: 01/30/2020] [Indexed: 10/25/2022]
Abstract
Computational prediction of ligand binding is a difficult problem, with more accurate methods being extremely computationally expensive. The use of machine learning for drug binding predictions could possibly leverage the use of biomedical big data in exchange for time-intensive simulations. This paper reviews current trends in the use of machine learning for drug binding predictions, data sources to develop machine learning algorithms, and potential problems that may lead to overfitting and ungeneralizable models. A few popular datasets that can be used to develop virtual high-throughput screening models are characterized using spatial statistics to quantify potential biases. We can see from evaluating some common benchmarks that good performance correlates with models with high-predicted bias scores, and that models with low bias scores do not have much predictive power. A better understanding of the limits of available data sources and how to fix them will lead to more generalizable models that will lead to novel drug discovery.
Affiliation(s)
- Sally R Ellingson
- College of Medicine, Division of Biomedical Informatics, University of Kentucky, Lexington, KY, United States of America; Markey Cancer Center, Lexington, KY, United States of America
| | - Brian Davis
- Markey Cancer Center, Lexington, KY, United States of America
| | - Jonathan Allen
- Lawrence Livermore National Laboratory, Livermore, CA, United States of America
| |
|
40
|
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023]
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
| | | | - Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
| |
|
41
|
O’Brien KT, Mooney C, Lopez C, Pollastri G, Shields DC. Prediction of polyproline II secondary structure propensity in proteins. ROYAL SOCIETY OPEN SCIENCE 2020; 7:191239. [PMID: 32218953 PMCID: PMC7029904 DOI: 10.1098/rsos.191239] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/17/2019] [Accepted: 12/04/2019] [Indexed: 05/29/2023]
Abstract
Background: The polyproline II helix (PPIIH) is an extended protein left-handed secondary structure that usually but not necessarily involves prolines. Short PPIIHs are frequently, but not exclusively, found in disordered protein regions, where they may interact with peptide-binding domains. However, no readily usable software is available to predict this state. Results: We developed PPIIPRED to predict polyproline II helix secondary structure from protein sequences, using bidirectional recurrent neural networks trained on known three-dimensional structures with dihedral angle filtering. The performance of the method was evaluated in an external validation set. In addition to proline, PPIIPRED favours amino acids whose side chains extend from the backbone (Leu, Met, Lys, Arg, Glu, Gln), as well as Ala and Val. Utility for individual residue predictions is restricted by the rarity of the PPIIH feature compared to structurally common features. Conclusion: The software, available at http://bioware.ucd.ie/PPIIPRED, is useful in large-scale studies, such as evolutionary analyses of PPIIH, or computationally reducing large datasets of candidate binding peptides for further experimental validation.
Affiliation(s)
- Kevin T. O’Brien
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
| | - Catherine Mooney
- School of Computer Science, University College Dublin, Dublin, Ireland
| | - Cyril Lopez
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin, Dublin, Ireland
- Institute for Discovery, University College Dublin, Dublin, Ireland
| | - Denis C. Shields
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
| |
|
42
|
Abstract
Modeling the tertiary structure of protein-protein interaction complex has been well studied over many years, especially in the case where the structures of both binding partners are roughly the same before and after binding. However, the assembly of complexes with less-ordered partners is a much harder problem, and modeling even small amounts of flexibility can pose a challenge. In an extreme case, where one of the binding partners is intrinsically disordered before binding, we have previously shown that by initially disregarding the coupling between windows of these intrinsically disordered proteins (IDPs), we can reliably assemble complexes involving IDPs up to at least 69 residues long. Here, we detail the use of the IDP-LZerD package and protocol.
|
43
|
Tobias-Santos V, Guerra-Almeida D, Mury F, Ribeiro L, Berni M, Araujo H, Logullo C, Feitosa NM, de Souza-Menezes J, Pessoa Costa E, Nunes-da-Fonseca R. Multiple Roles of the Polycistronic Gene Tarsal-less/Mille-Pattes/Polished-Rice During Embryogenesis of the Kissing Bug Rhodnius prolixus. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
|
44
|
Cannon JF. Novel phosphorylation-dependent regulation in an unstructured protein. Proteins 2019; 88:366-384. [PMID: 31512287 DOI: 10.1002/prot.25812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/08/2019] [Revised: 07/15/2019] [Accepted: 09/04/2019] [Indexed: 12/15/2022]
Abstract
This work explores how phosphorylation of an unstructured protein region in inhibitor-2 (I2) regulates protein phosphatase-1 (PP1) enzyme activity using molecular dynamics (MD). Free I2 is largely unstructured; however, when bound to PP1, three segments adopt a stable structure. In particular, an I2 helix (i-helix) blocks the PP1 active site and inhibits phosphatase activity. I2 phosphorylation in the PP1-I2 complex activates phosphatase activity without I2 dissociation. The I2 Thr74 regulatory phosphorylation site is in an unstructured domain in PP1-I2. PP1-I2 MD demonstrated that I2 phosphorylation promotes early steps of PP1-I2 activation in explicit solvent models. Moreover, phosphorylation-dependent activation occurred in PP1-I2 complexes derived from I2 orthologs with diverse sequences from human, yeast, worm, and protozoa. This system allowed exploration of features of the 73-residue unstructured human I2 domain critical for phosphorylation-dependent activation. These studies revealed that components of I2 unstructured domain are strategically positioned for phosphorylation responsiveness including a transient α-helix. There was no evidence that electrostatic interactions of I2 phosphothreonine74 influenced PP1-I2 activation. Instead, phosphorylation altered the conformation of residues around Thr74. Phosphorylation uncurled the distance between I2 residues Glu71 to Tyr76 to promote PP1-I2 activation, whereas reduced distances reduced activation. This I2 residue Glu71 to Tyr76 distance distribution, independently from Thr74 phosphorylation, controls I2 i-helix displacement from the PP1 active site leading to PP1-I2 activation.
Affiliation(s)
- John F Cannon
- Department of Molecular Microbiology and Immunology, University of Missouri, Columbia, Missouri
| |
|
45
|
Torrisi M, Kaleel M, Pollastri G. Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction. Sci Rep 2019; 9:12374. [PMID: 31451723 PMCID: PMC6710256 DOI: 10.1038/s41598-019-48786-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 03/26/2019] [Accepted: 08/12/2019] [Indexed: 01/10/2023] Open
Abstract
Protein Secondary Structure prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88–90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained both on single sequence and evolutionary profile-based inputs and develop a new state-of-the-art system with Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe based sets that eliminate homology between training/testing samples we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Manaz Kaleel
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
| |
|
46
|
PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning. Amino Acids 2019; 51:1289-1296. [PMID: 31388850 DOI: 10.1007/s00726-019-02767-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/10/2019] [Accepted: 07/29/2019] [Indexed: 10/26/2022]
Abstract
Predicting the three-dimensional structure of proteins is a long-standing challenge of computational biology, as the structure (or lack of a rigid structure) is well known to determine a protein's function. Predicting relative solvent accessibility (RSA) of amino acids within a protein is a significant step towards resolving the protein structure prediction challenge especially in cases in which structural information about a protein is not available by homology transfer. Today, arguably the core of the most powerful prediction methods for predicting RSA and other structural features of proteins is some form of deep learning, and all the state-of-the-art protein structure prediction tools rely on some machine learning algorithm. In this article we present a deep neural network architecture composed of stacks of bidirectional recurrent neural networks and convolutional layers which is capable of mining information from long-range interactions within a protein sequence and apply it to the prediction of protein RSA using a novel encoding method that we shall call "clipped". The final system we present, PaleAle 5.0, which is available as a public server, predicts RSA into two, three and four classes at an accuracy exceeding 80% in two classes, surpassing the performances of all the other predictors we have benchmarked.
|
47
|
Wardah W, Khan M, Sharma A, Rashid MA. Protein secondary structure prediction using neural networks and deep learning: A review. Comput Biol Chem 2019; 81:1-8. [DOI: 10.1016/j.compbiolchem.2019.107093] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/12/2018] [Revised: 12/28/2018] [Accepted: 07/10/2019] [Indexed: 02/02/2023]
|
48
|
Kashani-Amin E, Sakhteman A, Larijani B, Ebrahim-Habibi A. Introducing a New Model of Sweet Taste Receptor, a Class C G-protein Coupled Receptor (C GPCR). Cell Biochem Biophys 2019; 77:227-243. [PMID: 31069640 DOI: 10.1007/s12013-019-00872-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/24/2018] [Accepted: 04/27/2019] [Indexed: 12/31/2022]
Abstract
The structure of sweet taste receptor (STR), a heterodimer of class C G-protein coupled receptors comprising T1R2 and T1R3 molecules, is still undetermined. In this study, a new enhanced model of the receptor is introduced based on the most recent templates. The improvement, stability, and reliability of the model are discussed in detail. Each domain of the protein, i.e., VFTM, CR, and TMD, was separately constructed by hybrid-model construction methods and then assembled to build whole monomers. Overall, 680 ns molecular dynamics simulation was performed for the individual domains, the whole monomers and the heterodimer form of the VFTM orthosteric binding site. The latter's structure obtained from 200 ns simulation was docked with aspartame; among various binding sites suggested by FTMAP server, the experimentally suggested binding domain in T1R2 was retrieved. Local three-dimensional structures and helix spans were evaluated and showed acceptable accordance with the template structures and secondary structure predictions. Individual domains and whole monomer structures were found stable and reliable to be used. In conclusion, several validations have shown reliability of the new and enhanced models for further molecular modeling studies on structure and function of STR and C GPCRs.
Affiliation(s)
- Elaheh Kashani-Amin
- Biosensor Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Amirhossein Sakhteman
- Department of Medicinal Chemistry, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran.,Medicinal Chemistry and Natural Products Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Bagher Larijani
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Azadeh Ebrahim-Habibi
- Biosensor Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran.
| |
|
49
|
Unifying structural signature of eukaryotic α-helical host defense peptides. Proc Natl Acad Sci U S A 2019; 116:6944-6953. [PMID: 30877253 DOI: 10.1073/pnas.1819250116] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 12/18/2022] Open
Abstract
Diversity of α-helical host defense peptides (αHDPs) contributes to immunity against a broad spectrum of pathogens via multiple functions. Thus, resolving common structure-function relationships among αHDPs is inherently difficult, even for artificial-intelligence-based methods that seek multifactorial trends rather than foundational principles. Here, bioinformatic and pattern recognition methods were applied to identify a unifying signature of eukaryotic αHDPs derived from amino acid sequence, biochemical, and three-dimensional properties of known αHDPs. The signature formula contains a helical domain of 12 residues with a mean hydrophobic moment of 0.50 and favoring aliphatic over aromatic hydrophobes in 18-aa windows of peptides or proteins matching its semantic definition. The holistic α-core signature subsumes existing physicochemical properties of αHDPs, and converged strongly with predictions of an independent machine-learning-based classifier recognizing sequences inducing negative Gaussian curvature in target membranes. Queries using the α-core formula identified 93% of all annotated αHDPs in proteomic databases and retrieved all major αHDP families. Synthesis and antimicrobial assays confirmed efficacies of predicted sequences having no previously known antimicrobial activity. The unifying α-core signature establishes a foundational framework for discovering and understanding αHDPs encompassing diverse structural and mechanistic variations, and affords possibilities for deterministic design of antiinfectives.
|
50
|
O’Brien KT, Golla K, Kranjc T, O’Donovan D, Allen S, Maguire P, Simpson JC, O’Connell D, Moran N, Shields DC. Computational and experimental analysis of bioactive peptide linear motifs in the integrin adhesome. PLoS One 2019; 14:e0210337. [PMID: 30689642 PMCID: PMC6349357 DOI: 10.1371/journal.pone.0210337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/28/2018] [Accepted: 12/20/2018] [Indexed: 12/15/2022] Open
Abstract
Therapeutic modulation of protein interactions is challenging, but short linear motifs (SLiMs) represent potential targets. Focal adhesions play a central role in adhesion by linking cells to the extracellular matrix. Integrins are central to this process, and many other intracellular proteins are components of the integrin adhesome. We applied a peptide network targeting approach to explore the intracellular modulation of integrin function in platelets. Firstly, we computed a platelet-relevant integrin adhesome, inferred via homology of known platelet proteins to adhesome components. We then computationally selected peptides from the set of platelet integrin adhesome cytoplasmic and membrane adjacent protein-protein interfaces. Motifs of interest in the intracellular component of the platelet integrin adhesome were identified using a predictor of SLiMs based on analysis of protein primary amino acid sequences (SLiMPred), a predictor of strongly conserved motifs within disordered protein regions (SLiMPrints), and information from the literature regarding protein interactions in the complex. We then synthesized peptides incorporating these motifs combined with cell penetrating factors (tat peptide and palmitylation for cytoplasmic and membrane proteins respectively). We tested for the platelet activating effects of the peptides, as well as their abilities to inhibit activation. Bioactivity testing revealed a number of peptides that modulated platelet function, including those derived from α-actinin (ACTN1) and syndecan (SDC4), binding to vinculin and syntenin respectively. Both chimeric peptide experiments and peptide combination experiments failed to identify strong effects, perhaps characterizing the adhesome as relatively robust against within-adhesome synergistic perturbation. We investigated in more detail peptides targeting vinculin. 
Combined experimental and computational evidence suggested a model in which the positively charged tat-derived cell penetrating part of the peptide contributes to bioactivity via stabilizing charge interactions with a region of the ACTN1 negatively charged surface. We conclude that some interactions in the integrin adhesome appear to be capable of modulation by short peptides, and may aid in the identification and characterization of target sites within the complex that may be useful for therapeutic modulation.
Affiliation(s)
- Kevin T. O’Brien
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
| | - Kalyan Golla
- Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Tilen Kranjc
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biology and Environment Science, University College Dublin, Dublin, Ireland
| | - Darragh O’Donovan
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
| | - Seamus Allen
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
| | - Patricia Maguire
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
| | - Jeremy C. Simpson
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biology and Environment Science, University College Dublin, Dublin, Ireland
| | - David O’Connell
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
| | - Niamh Moran
- Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Denis C. Shields
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
| |
|