1. Martin J. AlphaFold2 Predicts Whether Proteins Interact Amidst Confounding Structural Compatibility. J Chem Inf Model 2024; 64:1473-1480. [PMID: 38373070 DOI: 10.1021/acs.jcim.3c01805]
Abstract
Predicting whether two proteins physically interact is one of the holy grails of computational biology, galvanized by rapid advancements in deep learning. AlphaFold2, although not developed with this goal, is promising in this respect. Here, I test the prediction capability of AlphaFold2 on a very challenging data set, where proteins are structurally compatible, even when they do not interact. AlphaFold2 achieves high discrimination between interacting and non-interacting proteins, and the cases of misclassifications can either be rescued by revisiting the input sequences or can suggest false positives and negatives in the data set. AlphaFold2 is thus not impaired by the compatibility between protein structures and has the potential to be applied on a large scale.
Affiliation(s)
- Juliette Martin
- Univ Lyon, CNRS, UMR 5086 MMSB, 7 passage du Vercors F-69367, Lyon, France
- Laboratory of Biology and Modeling of the Cell, Ecole Normale Supérieure de Lyon, CNRS UMR 5239, Inserm U1293, University Claude Bernard Lyon 1, 69364, Lyon, France

2. Protein-protein interaction and non-interaction predictions using gene sequence natural vector. Commun Biol 2022; 5:652. [PMID: 35780196 PMCID: PMC9250521 DOI: 10.1038/s42003-022-03617-0] [Received: 02/04/2022] [Accepted: 06/21/2022]
Abstract
Predicting protein–protein interactions and non-interactions are two distinct and important aspects of multi-body structure prediction, and both provide vital information about protein function. Computational methods have recently been developed to complement experimental methods, but they still cannot effectively detect truly non-interacting protein pairs. We propose a gene sequence-based method, NVDT (Natural Vector combined with Dinucleotide and Triplet nucleotide), for predicting both interaction and non-interaction. For protein–protein non-interactions (PPNIs), the method obtained accuracies of 86.23% for Homo sapiens and 85.34% for Mus musculus, and it performed well on three types of non-interaction networks. For protein–protein interactions (PPIs), it obtained accuracies of 99.20%, 94.94%, 98.56%, 95.41%, and 94.83% for Saccharomyces cerevisiae, Drosophila melanogaster, Helicobacter pylori, Homo sapiens, and Mus musculus, respectively. Furthermore, NVDT outperformed established sequence-based methods and performed well in predicting cross-species interactions. NVDT is expected to be an effective approach for predicting PPIs and PPNIs. In short, protein–protein non-interactions and interactions are distinguished and predicted from gene sequences using single-nucleotide and contiguous-nucleotide features combined with machine learning models.
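To make the "natural vector" idea concrete, here is a minimal sketch of one common single-nucleotide formulation: for each base, its count, its mean position, and a normalized second central moment of its positions. This is an illustration under stated assumptions, not the NVDT authors' implementation: the function name `natural_vector`, the 1-based position convention, and the handling of absent bases are all assumptions, and the dinucleotide and triplet features that NVDT adds are not shown.

```python
def natural_vector(seq, alphabet="ACGT"):
    """Return counts, mean positions, and normalized second central moments.

    Output layout (an assumption of this sketch):
    [n_A, n_C, n_G, n_T, mu_A, ..., mu_T, D2_A, ..., D2_T]
    """
    seq = seq.upper()
    n = len(seq)
    # Collect the 1-based positions at which each base occurs.
    positions = {base: [] for base in alphabet}
    for i, base in enumerate(seq, start=1):
        if base in positions:
            positions[base].append(i)

    counts, means, moments = [], [], []
    for base in alphabet:
        pos = positions[base]
        n_k = len(pos)
        # Bases that never occur get a mean and moment of 0 (sketch convention).
        mu_k = sum(pos) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    print(natural_vector("ACGTAC"))
```

A fixed-length vector of this kind can be computed for each gene in a pair and then passed to a standard classifier; how NVDT combines the two genes' vectors is not specified in the abstract above.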

3. Cotranslational interaction of human EBP50 and ezrin overcomes masked binding site during complex assembly. Proc Natl Acad Sci U S A 2022; 119:2115799119. [PMID: 35140182 PMCID: PMC8851480 DOI: 10.1073/pnas.2115799119] [Accepted: 12/23/2021]
Abstract
Multiprotein assemblages are the intracellular workhorses of many physiological processes. Assembly of constituents into complexes can be driven by stochastic, domain-dependent, posttranslational events in which mature, folded proteins specifically interact. However, inaccessibility of interacting surfaces in mature proteins (e.g., due to "buried" domains) can obstruct complex formation. Mechanisms by which multiprotein complex constituents overcome topological impediments remain enigmatic. For example, the heterodimeric complex formed by EBP50 and ezrin must address this issue as the EBP50-interacting domain in ezrin is obstructed by a self-interaction that occupies the EBP50 binding site. Here, we show that the EBP50-ezrin complex is formed by a cotranslational mechanism in which the C terminus of mature, fully formed EBP50 binds the emerging, ribosome-bound N-terminal FERM domain of ezrin during EZR mRNA translation. Consistent with this observation, a C-terminal EBP50 peptide mimetic reduces the cotranslational interaction and abrogates EBP50-ezrin complex formation. Phosphorylation of EBP50 at Ser339 and Ser340 abrogates the cotranslational interaction and inhibits complex formation. In summary, we show that the function of eukaryotic mRNA translation extends beyond "simple" generation of a linear peptide chain that folds into a tertiary structure, potentially for subsequent complex assembly; importantly, translation can facilitate interactions with sterically inaccessible domains to form functional multiprotein complexes.

4. One of Nature’s Basic Laws: Combination-Sharing. Human Arenas 2021. [DOI: 10.1007/s42087-021-00215-0]

5. Nath A, Leier A. Improved cytokine-receptor interaction prediction by exploiting the negative sample space. BMC Bioinformatics 2020; 21:493. [PMID: 33129275 PMCID: PMC7603689 DOI: 10.1186/s12859-020-03835-5] [Received: 05/04/2020] [Accepted: 10/23/2020]
Abstract
Background: Cytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine–receptor interaction (CRI) is very important for understanding the pathogenesis of various human diseases, notably autoimmune, inflammatory and infectious diseases, and for identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. “Gold Standard” negative datasets are still lacking, and strong biases in negative datasets can significantly affect the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in the negative sample selection (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection.
Results: We used deep autoencoders to investigate the effect of different sampling approaches for non-interacting pairs on the training and the performance of machine learning classifiers. By using the anomaly detection capabilities of deep autoencoders, we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling for selecting non-interacting pairs results in either over- or under-representation of hard or easy to classify instances. When K-means based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical-2grams and two different representations of evolutionary information performs best. Average model performances based on leave-one-out cross-validation (LOOCV) over the ten different negative sample sets that each model was trained with show that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+5.1%), specificity (+13%), MCC (+0.1) and g-means value (+5.1). Evaluations using tenfold cross-validation and training/testing splits confirm the competitive performance.
Conclusions: A comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms using different evaluation methods. Models trained on K-means sampled datasets generally show a significantly improved performance compared to those trained on random selections, with RF seemingly benefiting most in our particular setting. Our findings on the sampling are highly relevant and apply to many applications of supervised learning approaches in bioinformatics.
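The clustering-based selection of negatives described above can be sketched roughly as follows: cluster the candidate non-interacting pairs in feature space with K-means, then draw the same number of pairs from every cluster so that no region of the negative space is over- or under-represented. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names (`kmeans`, `sample_negatives`), the equal-per-cluster rule, the plain Lloyd iteration, and the toy feature vectors are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def sample_negatives(points, k, per_cluster, seed=0):
    """Return sorted indices of negatives, at most `per_cluster` from each cluster."""
    labels, _ = kmeans(points, k, seed=seed)
    rng = random.Random(seed)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        chosen.extend(idx[:per_cluster])
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for six candidate negative pairs.
    candidates = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(sample_negatives(candidates, k=2, per_cluster=1))
```

Compared with uniform random sampling, drawing per cluster trades some randomness for coverage: small clusters of unusual negatives are still represented in the training set. In practice an established K-means implementation would replace the hand-written loop.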
Affiliation(s)
- Abhigyan Nath
- Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India.
- André Leier
- Department of Genetics, Department of Cell Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA.

6. Lawson T, Lycett GW, Mayes S, Ho WK, Chin CF. Transcriptome-wide identification and characterization of the Rab GTPase family in mango. Mol Biol Rep 2020; 47:4183-4197. [PMID: 32444976 DOI: 10.1007/s11033-020-05519-y] [Received: 03/06/2020] [Accepted: 05/14/2020]
Abstract
The Rab GTPase family plays a vital role in several plant physiological processes, including fruit ripening. Fruit softening during ripening involves trafficking of cell wall polymers and enzymes between cellular compartments. Mango, an economically important fruit crop, is known for its delicious taste, exotic flavour and nutritional value. So far, there is a paucity of information on the mango Rab GTPase family. In this study, 23 genes encoding Rab proteins were identified in mango by a comprehensive in silico approach. Sequence alignment and similarity tree analysis, with the model plant Arabidopsis as a reference, enabled bona fide assignment of the deduced mango proteins to eight subfamilies. Expression analysis by RNA sequencing (RNA-Seq) showed that the Rab genes were differentially expressed in ripe and unripe mangoes, suggesting the involvement of vesicle trafficking during ripening. Interaction analysis showed that the proteins involved in vesicle trafficking and cell wall softening were interconnected, providing further evidence of the involvement of the Rab GTPases in fruit softening. Correlation analyses showed a significant relationship between the expression level of the RabA3 and RabA4 genes and fruit firmness at the unripe stage of the mango varieties, suggesting that the differences in gene expression level might be associated with the contrasting firmness of these varieties. This study will not only provide new insights into the complexity of the ripening-regulated molecular mechanism but also facilitate the identification of potential Rab GTPases to address excessive fruit softening.
Affiliation(s)
- Tamunonengiyeofori Lawson
- School of Biosciences, Faculty of Science, The University of Nottingham, Jalan Broga, 43500, Semenyih, Selangor Darul Ehsan, Malaysia; Division of Plant and Crop Sciences, School of Biosciences, The University of Nottingham, Sutton Bonington Campus, Loughborough, LE12 5RD, UK; Crops for the Future (CFF), Jalan Broga, 43500, Semenyih, Selangor Darul Ehsan, Malaysia
- Grantley W Lycett
- Division of Plant and Crop Sciences, School of Biosciences, The University of Nottingham, Sutton Bonington Campus, Loughborough, LE12 5RD, UK
- Sean Mayes
- Division of Plant and Crop Sciences, School of Biosciences, The University of Nottingham, Sutton Bonington Campus, Loughborough, LE12 5RD, UK; Crops for the Future (CFF), Jalan Broga, 43500, Semenyih, Selangor Darul Ehsan, Malaysia
- Wai Kuan Ho
- School of Biosciences, Faculty of Science, The University of Nottingham, Jalan Broga, 43500, Semenyih, Selangor Darul Ehsan, Malaysia; Crops for the Future (CFF), Jalan Broga, 43500, Semenyih, Selangor Darul Ehsan, Malaysia
- Chiew Foan Chin
- School of Biosciences, Faculty of Science, The University of Nottingham, Jalan Broga, 43500, Semenyih, Selangor Darul Ehsan, Malaysia