1
Mondal A, Singh B, Felkner RH, De Falco A, Swapna GVT, Montelione GT, Roth MJ, Perez A. A Computational Pipeline for Accurate Prioritization of Protein-Protein Binding Candidates in High-Throughput Protein Libraries. Angew Chem Int Ed Engl 2024; 63:e202405767. [PMID: 38588243 PMCID: PMC11544546 DOI: 10.1002/anie.202405767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 04/05/2024] [Accepted: 04/08/2024] [Indexed: 04/10/2024]
Abstract
Identifying the interactome for a protein of interest is challenging due to the large number of possible binders. High-throughput experimental approaches narrow down possible binding partners but often include false positives. Furthermore, they provide no information about what the binding region is (e.g., the binding epitope). We introduce a novel computational pipeline based on an AlphaFold2 (AF) Competitive Binding Assay (AF-CBA) to identify proteins that bind a target of interest from a pull-down experiment and the binding epitope. Our focus is on proteins that bind the Extraterminal (ET) domain of Bromo and Extraterminal domain (BET) proteins, but we also introduce nine additional systems to show transferability to other peptide-protein systems. We describe a series of limitations to the methodology based on intrinsic deficiencies of AF and AF-CBA to help users identify scenarios where the approach will be most useful. Given the method's speed and accuracy, we anticipate its broad applicability to identify binding epitope regions among potential partners, setting the stage for experimental verification.
Affiliation(s)
- Arup Mondal
  - Department of Chemistry and Quantum Theory Project, University of Florida, Leigh Hall 240, Gainesville, FL
- Bhumika Singh
  - Department of Chemistry and Quantum Theory Project, University of Florida, Leigh Hall 240, Gainesville, FL
- Roland H. Felkner
  - Department of Pharmacology, Rutgers-Robert Wood Johnson Medical School, 675 Hoes Lane Rm 636, Piscataway, NJ 08854
- Anna De Falco
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- GVT Swapna
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Gaetano T. Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Monica J. Roth
  - Department of Pharmacology, Rutgers-Robert Wood Johnson Medical School, 675 Hoes Lane Rm 636, Piscataway, NJ 08854
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Leigh Hall 240, Gainesville, FL
2
Liu Z, Bai T, Liu B, Yu L. MulStack: An ensemble learning prediction model of multilabel mRNA subcellular localization. Comput Biol Med 2024; 175:108289. [PMID: 38688123 DOI: 10.1016/j.compbiomed.2024.108289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 02/28/2024] [Accepted: 03/12/2024] [Indexed: 05/02/2024]
Abstract
Subcellular localization of mRNA is related to protein synthesis, cell polarity, cell movement and other biological regulation mechanisms. The distribution of mRNAs across subcellular compartments is similar to that of proteins, and most mRNAs are distributed in multiple compartments. Recently, some computational methods have been designed to predict the subcellular localization of mRNA. However, these methods employed only a single level of mRNA features and did not employ the position encoding of nucleotides in mRNA. In this paper, an ensemble learning prediction model named MulStack is proposed, which is based on random forest and deep learning for multilabel mRNA subcellular localization. The proposed method employs two levels of mRNA features, sequence-level and residue-level, and position encoding is employed for the first time in the field of mRNA subcellular localization. Random forest is employed to learn mRNA sequence-level features, while deep learning is employed to learn mRNA sequence-level features and mRNA residue-level features combined with position encoding. The outputs of the random forest and the deep learning model are combined by a weighted sum to give the prediction probability. Compared with existing methods, the results show that MulStack performs best in localization to the nucleus, cytosol and exosome. In addition, position weight matrices (PWMs) extracted by convolutional neural networks (CNNs) can be matched with known RNA binding protein motifs. Gene ontology (GO) enrichment analysis shows the biological processes, molecular functions and cellular components of mRNA genes. The prediction web server of MulStack is freely accessible at http://bliulab.net/MulStack.
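The weighted-sum step mentioned in the abstract can be illustrated with a minimal sketch. The function names `weighted_ensemble` and `predict_labels`, the weight `w_rf`, the 0.5 decision threshold and the example probabilities are all assumptions made for illustration; the abstract does not state the weights or threshold MulStack actually uses.

```python
from typing import Dict, List


def weighted_ensemble(p_rf: Dict[str, float],
                      p_dl: Dict[str, float],
                      w_rf: float = 0.5) -> Dict[str, float]:
    """Combine per-label probabilities from two models by a weighted sum.

    p_rf: probabilities from a random-forest model (hypothetical values).
    p_dl: probabilities from a deep-learning model (hypothetical values).
    w_rf: weight given to the random forest; the other model gets 1 - w_rf.
    """
    if not 0.0 <= w_rf <= 1.0:
        raise ValueError("w_rf must lie in [0, 1]")
    if p_rf.keys() != p_dl.keys():
        raise ValueError("both models must score the same labels")
    return {label: w_rf * p_rf[label] + (1.0 - w_rf) * p_dl[label]
            for label in p_rf}


def predict_labels(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multilabel decision: keep every label whose probability reaches the threshold."""
    return sorted(label for label, p in probs.items() if p >= threshold)


if __name__ == "__main__":
    rf = {"nucleus": 0.8, "cytosol": 0.4, "exosome": 0.1}
    dl = {"nucleus": 0.6, "cytosol": 0.7, "exosome": 0.2}
    combined = weighted_ensemble(rf, dl, w_rf=0.5)
    print(combined)
    print(predict_labels(combined))
```

Because each label is thresholded independently, a single mRNA can be assigned to several compartments at once, which is what makes the task multilabel rather than multiclass.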
Affiliation(s)
- Ziqi Liu
  - School of Computer Science and Technology, Xidian University, Xian, 710075, China
- Tao Bai
  - School of Mathematics & Computer Science, Yan'an University, Shaanxi, 716000, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xian, 710075, China
3
Emami N, Ferdousi R. HormoNet: a deep learning approach for hormone-drug interaction prediction. BMC Bioinformatics 2024; 25:87. [PMID: 38418979 PMCID: PMC10903040 DOI: 10.1186/s12859-024-05708-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 02/16/2024] [Indexed: 03/02/2024] Open
Abstract
Several lines of experimental evidence have shown that human endogenous hormones can interact with drugs in many ways and affect drug efficacy. Hormone-drug interactions (HDI) are essential for drug treatment and precision medicine; therefore, it is essential to understand hormone-drug associations. Here, we present HormoNet to predict HDI pairs and their risk level by integrating features derived from hormone and drug target proteins. To the best of our knowledge, this is one of the first attempts to employ a deep learning approach for HDI prediction. Amino acid composition and pseudo amino acid composition were applied to represent target information using 30 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied the synthetic minority over-sampling technique (SMOTE). Additionally, we constructed novel datasets for HDI prediction and the risk level of their interaction. HormoNet achieved high performance on our constructed hormone-drug benchmark datasets. The results provide insights into the relationship between a hormone and a drug, and indicate the potential benefit of reducing risk levels of interactions in designing more effective therapies for patients in drug treatments. Our benchmark datasets and the source codes for HormoNet are available at: https://github.com/EmamiNeda/HormoNet .
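The synthetic minority over-sampling technique named in the abstract creates new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that idea, not HormoNet's implementation; the function name `smote`, its parameters and the toy data are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]],
          n_new: int,
          k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic samples from minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x (squared Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to its neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for sample in smote(toy_minority, n_new=3, k=2):
        print(sample)
```

Each synthetic point lies on a line segment between two real minority samples, so the new data stay inside the region the minority class already occupies instead of duplicating existing rows.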
Affiliation(s)
- Neda Emami
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
4
Mondal A, Singh B, Felkner RH, De Falco A, Swapna GVT, Montelione GT, Roth MJ, Perez A. Sifting Through the Noise: A Computational Pipeline for Accurate Prioritization of Protein-Protein Binding Candidates in High-Throughput Protein Libraries. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.20.576374. [PMID: 38328039 PMCID: PMC10849530 DOI: 10.1101/2024.01.20.576374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Identifying the interactome for a protein of interest is challenging due to the large number of possible binders. High-throughput experimental approaches narrow down possible binding partners, but often include false positives. Furthermore, they provide no information about what the binding region is (e.g., the binding epitope). We introduce a novel computational pipeline based on an AlphaFold2 (AF) Competition Assay (AF-CBA) to identify proteins that bind a target of interest from a pull-down experiment, along with the binding epitope. Our focus is on proteins that bind the Extraterminal (ET) domain of Bromo and Extraterminal domain (BET) proteins, but we also introduce nine additional systems to show transferability to other peptide-protein systems. We describe a series of limitations to the methodology based on intrinsic deficiencies of AF and AF-CBA to help users identify scenarios where the approach will be most useful. Given the speed and accuracy of the methodology, we expect it to be generally applicable to facilitate target selection for experimental verification starting from high-throughput protein libraries.
Affiliation(s)
- Arup Mondal
  - Department of Chemistry and Quantum Theory Project, University of Florida, Leigh Hall 240, Gainesville, FL
- Bhumika Singh
  - Department of Chemistry and Quantum Theory Project, University of Florida, Leigh Hall 240, Gainesville, FL
- Roland H. Felkner
  - Department of Pharmacology, Rutgers-Robert Wood Johnson Medical School, 675 Hoes Lane Rm 636, Piscataway, NJ 08854
- Anna De Falco
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- GVT Swapna
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Gaetano T. Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Monica J. Roth
  - Department of Pharmacology, Rutgers-Robert Wood Johnson Medical School, 675 Hoes Lane Rm 636, Piscataway, NJ 08854
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Leigh Hall 240, Gainesville, FL
5
Converting the genomic knowledge base to build protein specific machine learning prediction models; a classification study on thermophilic serine protease. Biologia (Bratisl) 2022. [DOI: 10.1007/s11756-022-01214-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
6
Emami N, Ferdousi R. AptaNet as a deep learning approach for aptamer-protein interaction prediction. Sci Rep 2021; 11:6074. [PMID: 33727685 PMCID: PMC7971039 DOI: 10.1038/s41598-021-85629-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/26/2020] [Accepted: 03/03/2021] [Indexed: 02/08/2023] Open
Abstract
Aptamers are short oligonucleotides (DNA/RNA) or peptide molecules that can selectively bind to their specific targets with high specificity and affinity. As a powerful new class of amino acid ligands, aptamers have high potential in biosensing, therapeutic, and diagnostic fields. Here, we present AptaNet, a new deep neural network, to predict aptamer-protein interaction pairs by integrating features derived from both aptamers and the target proteins. Aptamers were encoded using two different strategies: k-mer frequency and reverse complement k-mer frequency. Amino acid composition (AAC) and pseudo amino acid composition (PseAAC) were applied to represent target information using 24 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied a neighborhood cleaning algorithm. The predictor was constructed based on a deep neural network, and optimal features were selected using the random forest algorithm. As a result, 99.79% accuracy was achieved for the training dataset, and 91.38% accuracy was obtained for the testing dataset. AptaNet achieved high performance on our constructed aptamer-protein benchmark dataset. The results indicate that AptaNet can help identify novel aptamer-protein interacting pairs and build more efficient insights into the relationship between aptamers and proteins. Our benchmark dataset and the source codes for AptaNet are available at: https://github.com/nedaemami/AptaNet .
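The two aptamer encodings named in the abstract can be sketched as follows. This is an illustrative reading, not AptaNet's code: the function names, the restriction to the DNA alphabet `ACGT`, and the choice to merge each k-mer with its reverse complement under the lexicographically smaller of the two are assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string, e.g. AAC -> GTT."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Frequency of every possible DNA k-mer among the overlapping windows of seq."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    # Windows containing letters outside ACGT count toward total but get no entry.
    return {"".join(p): counts["".join(p)] / total
            for p in product("ACGT", repeat=k)}


def revc_kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Merge each k-mer with its reverse complement into one feature."""
    merged: Dict[str, float] = {}
    for kmer, freq in kmer_frequencies(seq, k).items():
        canonical = min(kmer, reverse_complement(kmer))
        merged[canonical] = merged.get(canonical, 0.0) + freq
    return merged


if __name__ == "__main__":
    features = revc_kmer_frequencies("ACGTTGCA", k=2)
    print(len(features), features)
```

For k = 2 the plain encoding yields 16 features, while the reverse complement encoding yields 10, because the 12 non-palindromic 2-mers collapse into 6 pairs and the 4 palindromic ones (AT, TA, CG, GC) stay as they are.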
Affiliation(s)
- Neda Emami
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
  - Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
7
Das S, Chakrabarti S. Classification and prediction of protein-protein interaction interface using machine learning algorithm. Sci Rep 2021; 11:1761. [PMID: 33469042 PMCID: PMC7815773 DOI: 10.1038/s41598-020-80900-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/04/2020] [Accepted: 12/15/2020] [Indexed: 01/29/2023] Open
Abstract
Structural insight into the protein-protein interaction (PPI) interface can provide knowledge about the kinetics, thermodynamics and molecular functions of the complex while elucidating its role in diseases and further enabling it as a potential therapeutic target. However, owing to the experimental lag in solving protein-protein complex structures, three-dimensional (3D) knowledge of PPI interfaces can be gained via computational approaches such as molecular docking and post-docking analyses. Despite the development of numerous docking tools and techniques, success in identifying native-like interfaces based on docking score functions is limited. Hence, we carried out an in-depth investigation of the structural features of the interface that might successfully delineate native complexes from non-native ones. We identify interface properties that show statistically significant differences between native and non-native interfaces belonging to homo- and hetero-protein-protein complexes. Utilizing these properties, a support vector machine (SVM) based classification scheme has been implemented to differentiate native and non-native-like complexes generated using docking decoys. Benchmarking and comparative analyses suggest very good performance of our SVM classifiers. Further, protein interactions that are proven via experimental findings but not resolved structurally were subjected to this approach, where 3D models of the complexes were generated and the most likely interfaces were predicted. A web server called Protein Complex Prediction by Interface Properties (PCPIP) was developed to predict whether the interface of a given protein-protein dimer complex resembles known protein interfaces. The server is freely available at http://www.hpppi.iicb.res.in/pcpip/ .
Affiliation(s)
- Subhrangshu Das
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India (grid.417635.2; 0000 0001 2216 5074)
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India (grid.417635.2; 0000 0001 2216 5074)
8
Adolf-Bryfogle J, Teets FD, Bahl CD. Toward complete rational control over protein structure and function through computational design. Curr Opin Struct Biol 2020; 66:170-177. [PMID: 33276237 DOI: 10.1016/j.sbi.2020.10.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/17/2020] [Revised: 10/08/2020] [Accepted: 10/19/2020] [Indexed: 11/28/2022]
Abstract
The grand challenge of protein design is a general method for producing a polypeptide with arbitrary functionality, conformation, and biochemical properties. To that end, a wide variety of methods have been developed for the improvement of native proteins, the design of ideal proteins de novo, and the redesign of suboptimal proteins with better-performing substructures. These methods employ informatic comparisons of function-structure-sequence relationships as well as knowledge-based evaluation of protein properties to narrow the immense protein sequence search space down to an enumerable and often manually evaluable set of structures that meet specified criteria. While arbitrary manipulation of protein-protein interfaces and molecular catalysis remains an unsolved problem, and no protein shape or behavior manipulation algorithm is universally applicable, the promising results thus far are a strong indicator that a general approach to the arbitrary manipulation of polypeptides is within reach.
Affiliation(s)
- Jared Adolf-Bryfogle
  - Institute for Protein Innovation, Boston, MA 02115, USA; Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Frank D Teets
  - Institute for Protein Innovation, Boston, MA 02115, USA; Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Christopher D Bahl
  - Institute for Protein Innovation, Boston, MA 02115, USA; Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
9
Wang C, Zhao N, Sun K, Zhang Y. A Cancer Gene Module Mining Method Based on Bio-Network of Multi-Omics Gene Groups. Front Oncol 2020; 10:1159. [PMID: 32637361 PMCID: PMC7317001 DOI: 10.3389/fonc.2020.01159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 11/13/2022] Open
Abstract
The initiation, promotion and progression of cancer are highly associated with the environment a person lives in as well as individual genetic factors. In view of the dangers to life and health caused by this abnormally complex systemic disease, many top scientific research institutions around the world have been actively carrying out research to discover the pathogenic mechanisms driving cancer occurrence and development. The emergence of high-throughput sequencing technology has greatly advanced oncology research and revealed important oncogenes and the interrelationships among them. Here, we have studied heterogeneous multi-level data within a context of integrated data, and introduced lncRNA omics data to construct multi-omics bio-network models, allowing the screening of key cancer-related gene groups. We propose a compactness clustering algorithm based on corrected cumulative rank scores, which uses the functional similarity between groups of genes as a distance measure to excavate, through clustering, the key gene modules for abnormal regulation contained in gene groups. We also conducted a survival analysis using our results and found that our model could divide groups of different levels very well. The results also demonstrate that, through the integration of multi-omics biological data, key gene modules and their dysregulated gene groups can be discovered, which is crucial for cancer research.
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ning Zhao
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Kai Sun
  - Thoracic Surgery Department, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Ying Zhang
  - Department of Pharmacy, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China