1. Liu YC, Lin YJ, Chang YY, Chuang CC, Ou YY. Deciphering the Language of Protein-DNA Interactions: A Deep Learning Approach Combining Contextual Embeddings and Multi-Scale Sequence Modeling. J Mol Biol 2024; 436:168769. [PMID: 39214282 DOI: 10.1016/j.jmb.2024.168769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 08/01/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Deciphering the mechanisms governing protein-DNA interactions is crucial for understanding key cellular processes and disease pathways. In this work, we present a powerful deep learning approach that significantly advances the computational prediction of DNA-interacting residues from protein sequences. Our method leverages the rich contextual representations learned by pre-trained protein language models, such as ProtTrans, to capture intrinsic biochemical properties and sequence motifs indicative of DNA binding sites. We then integrate these contextual embeddings with a multi-window convolutional neural network architecture, which scans across the sequence at varying window sizes to effectively identify both local and global binding patterns. Comprehensive evaluation on curated benchmark datasets demonstrates the remarkable performance of our approach, achieving an area under the ROC curve (AUC) of 0.89 - a substantial improvement over previous state-of-the-art sequence-based predictors. This showcases the immense potential of pairing advanced representation learning and deep neural network designs for uncovering the complex syntax governing protein-DNA interactions directly from primary sequences. Our work not only provides a robust computational tool for characterizing DNA-binding mechanisms, but also highlights the transformative opportunities at the intersection of language modeling, deep learning, and protein sequence analysis. The publicly available code and data further facilitate broader adoption and continued development of these techniques for accelerating mechanistic insights into vital biological processes and disease pathways. In addition, the code and data for this work are available at https://github.com/B1607/DIRP.
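The multi-window scanning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' architecture: a trained convolutional network learns its kernels, whereas this example substitutes a plain window average, and the function names, window widths, and toy embeddings are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def window_mean(embeddings: Sequence[Vector], center: int, width: int) -> Vector:
    """Average the embeddings in a window of `width` residues centred on `center`.

    Positions outside the sequence count as zero vectors (zero padding).
    A learned convolution would apply trained weights instead of a mean.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    dim = len(embeddings[0])
    half = width // 2
    total = [0.0] * dim
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(embeddings):
            for d in range(dim):
                total[d] += embeddings[pos][d]
    return [value / width for value in total]


def multi_window_features(
    embeddings: Sequence[Vector], widths: Sequence[int] = (3, 5, 7)
) -> List[Vector]:
    """Concatenate, per residue, the window summaries for every window width."""
    features = []
    for i in range(len(embeddings)):
        row: Vector = []
        for width in widths:
            row.extend(window_mean(embeddings, i, width))
        features.append(row)
    return features


# Toy one-dimensional "embeddings" for a three-residue sequence (hypothetical).
toy = [[1.0], [2.0], [3.0]]
print(multi_window_features(toy, widths=(1, 3)))
```

Narrow windows keep each residue's own signal, while wider windows mix in more of the neighbouring context; concatenating them gives a downstream classifier both views at once.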
Affiliation(s)
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yi-Jing Lin
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yan-Yun Chang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Cheng-Che Chuang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan.
2. Wang S, Dong K, Liang D, Zhang Y, Li X, Song T. MIPPIS: protein-protein interaction site prediction network with multi-information fusion. BMC Bioinformatics 2024; 25:345. [PMID: 39497043 PMCID: PMC11536593 DOI: 10.1186/s12859-024-05964-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Accepted: 10/21/2024] [Indexed: 11/06/2024] Open
Abstract
BACKGROUND: The prediction of protein-protein interaction sites plays a crucial role in biochemical processes. Investigating the interaction between viruses and receptor proteins through biological techniques aids in understanding disease mechanisms and guides the development of corresponding drugs. While various methods have been proposed in the past, they often suffer from drawbacks such as long processing times, high costs, and low accuracy. RESULTS: Addressing these challenges, we propose a novel protein-protein interaction site prediction network based on multi-information fusion. In our approach, the initial amino acid features are described by the position-specific scoring matrix, hidden Markov model, dictionary of protein secondary structure, and one-hot encoding. We then adopt a multi-channel approach to extract deep-level amino acid features from different perspectives. The graph convolutional network channel extracts spatial structural information. The bidirectional long short-term memory channel treats the amino acid sequence as natural language, capturing the protein's primary structure information. The ProtT5 protein large language model channel outputs a more comprehensive amino acid embedding representation, complementing the other two channels. Finally, the obtained amino acid features are fed into the prediction layer for the final prediction. CONCLUSION: Compared with six protein structure-based methods and six protein sequence-based methods, our model achieves the best performance across evaluation metrics, including accuracy, precision, F1, Matthews correlation coefficient, and area under the precision-recall curve, which demonstrates the superiority of our model.
Affiliation(s)
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Kaiyu Dong
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Dingming Liang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Yunjing Zhang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Xue Li
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China.
  - Department of Artificial Intelligence, Polytechnical University of Madrid, Madrid, 28031, Spain.
3. Daanial Khan Y, Alkhalifah T, Alturise F, Hassan Butt A. DeepDBS: Identification of DNA-binding sites in protein sequences by using deep representations and random forest. Methods 2024; 231:26-36. [PMID: 39270885 DOI: 10.1016/j.ymeth.2024.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 08/26/2024] [Accepted: 09/04/2024] [Indexed: 09/15/2024] Open
Abstract
Interactions among biological molecules are considered primary factors in an organism's life cycle. Many important biological functions depend on such interactions; among them, protein-DNA interactions are essential for transcription, regulation of gene expression, and DNA repair and packaging. Knowledge of these interactions and of the sites where they occur is therefore necessary for studying the mechanisms of various biological processes. Because experimental identification through biological assays is resource-demanding, costly, and error-prone, researchers turn to computational methods for efficient and accurate identification of DNA-protein interaction sites. Here, we propose a novel and accurate method, DeepDBS, for identifying DNA-binding sites in proteins from their primary amino acid sequences. Deep representations were computed from the protein sequences with a one-dimensional convolutional neural network (1D-CNN), a recurrent neural network (RNN), and a long short-term memory (LSTM) network, and were then used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models as well as existing state-of-the-art methods, with an accuracy of 0.99 for the self-consistency test, 10-fold cross-validation, 5-fold cross-validation, and jackknife validation, and 0.92 for independent dataset testing. Based on these results, we conclude that DeepDBS can support accurate and efficient identification of DNA-binding sites (DBS) in proteins.
Affiliation(s)
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Punjab 54770, Pakistan
- Tamim Alkhalifah
  - Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Fahad Alturise
  - Department of Cybersecurity, College of Computer, Qassim University, Buraydah 52571, Saudi Arabia
- Ahmad Hassan Butt
  - Department of Computer Science, Faculty of Computing and Information Technology, University of the Punjab, Lahore 54000, Punjab, Pakistan.
4. Li Y, Nan X, Zhang S, Zhou Q, Lu S, Tian Z. PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy. Biomolecules 2024; 14:1220. [PMID: 39456153 PMCID: PMC11506650 DOI: 10.3390/biom14101220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/28/2024] Open
Abstract
Proteins perform different biological functions by binding to various molecules, and this binding is mediated by a few key residues. Accurate prediction of such protein binding residues (PBRs) is crucial for understanding cellular processes and for designing new drugs. Many computational approaches have been proposed to identify PBRs with sequence-based features. However, these approaches face two main challenges: (1) they only concatenate residue feature vectors with a simple sliding window strategy, and (2) it is difficult to find a single sliding window size suitable for learning embeddings across different types of PBRs. In this study, we propose a novel framework that can be applied to multiple types of PBR prediction tasks through a Multi-scale Sequence-based Feature Fusion (PMSFF) strategy. First, PMSFF employs a pre-trained language model, ProtT5, to encode amino acid residues in protein sequences. It then generates multi-scale residue embeddings by applying multi-size windows to capture effective neighboring residues and multi-size kernels to learn information across different scales. Additionally, the proposed model treats protein sequences as sentences, employing a bidirectional GRU to learn global context. We also collect benchmark datasets encompassing various PBR types and evaluate PMSFF on these datasets. Compared with state-of-the-art methods, PMSFF demonstrates superior performance on most PBR prediction tasks.
Affiliation(s)
- Yuguang Li
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Xiaofei Nan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shoutao Zhang
  - School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
  - Longhu Laboratory of Advanced Immunology, Zhengzhou 450001, China
- Qinglei Zhou
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shuai Lu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
5. Zhang B, Hou Z, Yang Y, Wong KC, Zhu H, Li X. SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues. Commun Biol 2024; 7:679. [PMID: 38830995 PMCID: PMC11148103 DOI: 10.1038/s42003-024-06332-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/15/2024] [Indexed: 06/05/2024] Open
Abstract
Proteins and nucleic-acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic acid binding sites in proteins. Here, we implement an ensemble deep learning model-based nucleic-acid-binding residues on proteins identification method, called SOFB, which characterizes protein sequences by learning the semantics of biological dynamics contexts, and then develop an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Among them, the language learning model, which is constructed from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network consisting of different convolutional layers together with Bi-LSTM refines various features for optimal performance. Meanwhile, to address the imbalanced issue, we adopt ensemble learning to train multiple models and then incorporate them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic acid binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic acid binding residues. 
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .
Affiliation(s)
- Bin Zhang
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Zilong Hou
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Yuning Yang
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
- Haoran Zhu
  - School of Artificial Intelligence, Jilin University, Changchun, China.
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Changchun, China.
6. Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. Structure-based prediction of protein-nucleic acid binding using graph neural networks. Biophys Rev 2024; 16:297-314. [PMID: 39345796 PMCID: PMC11427629 DOI: 10.1007/s12551-024-01201-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2024] [Accepted: 05/28/2024] [Indexed: 10/01/2024] Open
Abstract
Protein-nucleic acid (PNA) binding plays critical roles in the transcription, translation, regulation, and three-dimensional organization of the genome. Structural models of proteins bound to nucleic acids (NA) provide insights into the chemical, electrostatic, and geometric properties of the protein structure that give rise to NA binding but are scarce relative to models of unbound proteins. We developed a deep learning approach for predicting PNA binding given the unbound structure of a protein that we call PNAbind. Our method utilizes graph neural networks to encode the spatial distribution of physicochemical and geometric properties of protein structures that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein, and using local encodings, they predict the location of individual NA binding residues. Our models can discriminate between specificity for DNA or RNA binding, and we show that predictions made on computationally derived protein structures can be used to gain mechanistic understanding of chemical and structural features that determine NA recognition. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and showed that our model predictions are consistent with and help explain experimental RNA binding data. Supplementary information The online version contains supplementary material available at 10.1007/s12551-024-01201-w.
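Several entries in this list report AUROC (or AUC) values. As a reminder of what that number measures, here is a minimal sketch that computes AUROC as the probability that a randomly chosen binding residue scores higher than a randomly chosen non-binding residue, with ties counted as half. The labels and scores are made-up illustrative values, not data from any of the cited papers, and the function name is hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a binding residue, 0 for a non-binding residue.
    scores: predicted binding propensity for each residue.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Illustrative values only.
print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

The pairwise form is quadratic in the number of residues, which is fine for a small illustration; rank-based implementations are the usual choice on full benchmark datasets.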
Affiliation(s)
- Jared M. Sagendorf
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
  - Present Address: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158 USA
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Jiawei Huang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Xiaojiang S. Chen
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089 USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089 USA
7. Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
8. Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. PNAbind: Structure-based prediction of protein-nucleic acid binding using graph neural networks. bioRxiv 2024:2024.02.27.582387. [PMID: 38529493 PMCID: PMC10962711 DOI: 10.1101/2024.02.27.582387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
The recognition and binding of nucleic acids (NAs) by proteins depends upon complementary chemical, electrostatic and geometric properties of the protein-NA binding interface. Structural models of protein-NA complexes provide insights into these properties but are scarce relative to models of unbound proteins. We present a deep learning approach for predicting protein-NA binding given the apo structure of a protein (PNAbind). Our method utilizes graph neural networks to encode spatial distributions of physicochemical and geometric properties of the protein molecular surface that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein and can discriminate between specificity for DNA or RNA binding. We show that such predictions made on protein structures modeled with AlphaFold2 can be used to gain mechanistic understanding of chemical and structural features that determine NA recognition. Using local encodings, our models predict the location of NA binding sites at the level of individual binding residues. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and show that our predictions are consistent with experimental RNA binding data.
9. Zhang J, Basu S, Kurgan L. HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins. Nucleic Acids Res 2024; 52:e10. [PMID: 38048333 PMCID: PMC10810184 DOI: 10.1093/nar/gkad1131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023] Open
Abstract
Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations from intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and than baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
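The abstract compares hybridDBRpred against baseline meta-predictors, one of which relies on averaging. A minimal sketch of such an averaging baseline follows; it is not the hybridDBRpred transformer model, and the function names, the 0.5 threshold, and the score tracks are hypothetical illustrative values.

```python
from typing import List, Sequence


def average_meta_predictor(score_tracks: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-residue scores from several predictors by simple averaging.

    score_tracks: one list of per-residue scores per input predictor; all
    lists must cover the same protein and therefore have the same length.
    """
    if not score_tracks:
        raise ValueError("need at least one score track")
    if len({len(track) for track in score_tracks}) != 1:
        raise ValueError("all score tracks must have the same length")
    count = len(score_tracks)
    return [sum(column) / count for column in zip(*score_tracks)]


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a residue as DNA-binding (1) when its score reaches the threshold."""
    return [1 if score >= threshold else 0 for score in scores]


# Three hypothetical predictors scoring the same three residues.
tracks = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.0, 0.3]]
combined = average_meta_predictor(tracks)
print(combined, binarize(combined))
```

Averaging treats every input predictor as equally reliable at every residue, which is the limitation a learned meta-model is meant to address.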
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, PR China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
10. Basu S, Zhao B, Biró B, Faraggi E, Gsponer J, Hu G, Kloczkowski A, Malhis N, Mirdita M, Söding J, Steinegger M, Wang D, Wang K, Xu D, Zhang J, Kurgan L. DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options. Nucleic Acids Res 2024; 52:D426-D433. [PMID: 37933852 PMCID: PMC10767971 DOI: 10.1093/nar/gkad985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/12/2023] [Accepted: 10/16/2023] [Indexed: 11/08/2023] Open
Abstract
The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales, including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies, including investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
  - Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Bálint Biró
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Animal Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Eshel Faraggi
  - Physics Department, Indiana University, Indianapolis, IN, USA
- Jörg Gsponer
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Andrzej Kloczkowski
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, USA
- Nawar Malhis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Milot Mirdita
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Johannes Söding
  - Quantitative and Computational Biology, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
  - Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Duolin Wang
  - Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Kui Wang
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Dong Xu
  - Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, P.R. China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
11. Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023] Open
Abstract
BACKGROUND: Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS: We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation in which the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features. The base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, with which our model achieves a significant improvement on highly unbalanced data. CONCLUSION: The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. Our method outperforms current state-of-the-art methods. In the most important metrics, AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is essentially due to the ensemble model and, especially, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China.
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China.
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
12
|
Basu S, Hegedűs T, Kurgan L. CoMemMoRFPred: Sequence-based Prediction of MemMoRFs by Combining Predictors of Intrinsic Disorder, MoRFs and Disordered Lipid-binding Regions. J Mol Biol 2023; 435:168272. [PMID: 37709009 DOI: 10.1016/j.jmb.2023.168272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/01/2023] [Accepted: 09/07/2023] [Indexed: 09/16/2023]
Abstract
Molecular recognition features (MoRFs) are a commonly occurring type of intrinsically disordered regions (IDRs) that undergo disorder-to-order transition upon binding to partner molecules. We focus on recently characterized and functionally important membrane-binding MoRFs (MemMoRFs). Motivated by the lack of computational tools that predict MemMoRFs, we use a dataset of experimentally annotated MemMoRFs to conceptualize, design, evaluate and release an accurate sequence-based predictor. We rely on state-of-the-art tools that predict residues that possess key characteristics of MemMoRFs, such as intrinsic disorder, disorder-to-order transition and lipid-binding. We identify and combine results from three tools that include flDPnn for the disorder prediction, DisoLipPred for the prediction of disordered lipid-binding regions, and MoRFCHiBiLight for the prediction of disorder-to-order transitioning protein binding regions. Our empirical analysis demonstrates that combining results produced by these three methods generates accurate predictions of MemMoRFs. We also show that use of a smoothing operator produces predictions that closely mimic the number and sizes of the native MemMoRF regions. The resulting CoMemMoRFPred method is available as an easy-to-use webserver at http://biomine.cs.vcu.edu/servers/CoMemMoRFPred. This tool will aid future studies of MemMoRFs in the context of exploring their abundance, cellular functions, and roles in pathologic phenomena.
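The idea of combining per-residue outputs of several predictors and then smoothing them can be sketched as follows. The abstract does not state the combination rule or the smoothing operator, so the plain averaging, the moving-average window, and the 0.5 threshold below are assumptions chosen for illustration, not the CoMemMoRFPred procedure.

```python
from typing import List, Tuple


def combine_scores(*predictors: List[float]) -> List[float]:
    """Average per-residue scores from several predictors (assumed rule)."""
    length = len(predictors[0])
    if any(len(p) != length for p in predictors):
        raise ValueError("all predictors must score the same residues")
    return [sum(values) / len(values) for values in zip(*predictors)]


def smooth(scores: List[float], half_window: int = 1) -> List[float]:
    """Moving average; windows are truncated at the sequence ends."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half_window)
        hi = min(len(scores), i + half_window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def to_regions(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, where score >= threshold."""
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


# Hypothetical raw scores: one isolated residue plus a short region.
raw = [0.9, 0.2, 0.9, 0.9, 0.1, 0.1]
fragmented = to_regions(raw)       # two separate regions
merged = to_regions(smooth(raw))   # one contiguous region after smoothing
```

On this toy input the smoothing step merges two fragments into a single region, which is the kind of effect the abstract describes when it says smoothed predictions better match the number and sizes of native regions.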
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, USA
- Tamás Hegedűs
  - Department of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary; ELKH-SE Biophysical Virology Research Group, Eötvös Loránd Research Network, Budapest, Hungary
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, USA
|
13
|
Li J, Cui Z, Fan C, Zhou Y, Ren M, Zhou C. Photo-caged 2-butene-1,4-dial as an efficient, target-specific photo-crosslinker for covalent trapping of DNA-binding proteins. Chem Sci 2023; 14:10884-10891. [PMID: 37829010 PMCID: PMC10566456 DOI: 10.1039/d3sc03719c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 09/12/2023] [Indexed: 10/14/2023] Open
Abstract
Covalent trapping of DNA-binding proteins via photo-crosslinking is an advantageous method for studying DNA-protein interactions. However, traditional photo-crosslinkers generate highly reactive intermediates that rapidly and non-selectively react with nearby functional groups, resulting in low target-capture yields and high non-target background capture. Herein, we report that photo-caged 2-butene-1,4-dial (PBDA) is an efficient photo-crosslinker for trapping DNA-binding proteins. Photo-irradiation (360 nm) of PBDA-modified DNA generates 2-butene-1,4-dial (BDA), a small, long-lived intermediate that reacts selectively with Lys residues of DNA-binding proteins, leading in minutes to stable DNA-protein crosslinks in up to 70% yield. In addition, BDA exhibits high specificity for target proteins, leading to low non-target background capture. The high photo-crosslinking yield and target specificity make PBDA a powerful tool for studying DNA-protein interactions.
Affiliation(s)
- Jiahui Li
  - State Key Laboratory of Elemento-Organic Chemistry, Frontiers Science Center for New Organic Matter, Department of Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Zenghui Cui
  - State Key Laboratory of Elemento-Organic Chemistry, Frontiers Science Center for New Organic Matter, Department of Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Chaochao Fan
  - State Key Laboratory of Elemento-Organic Chemistry, Frontiers Science Center for New Organic Matter, Department of Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Yifei Zhou
  - State Key Laboratory of Elemento-Organic Chemistry, Frontiers Science Center for New Organic Matter, Department of Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Mengtian Ren
  - State Key Laboratory of Elemento-Organic Chemistry, Frontiers Science Center for New Organic Matter, Department of Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Chuanzheng Zhou
  - State Key Laboratory of Elemento-Organic Chemistry, Frontiers Science Center for New Organic Matter, Department of Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
|
14
|
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. RESEARCH (WASHINGTON, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from either low predictive accuracy or a limited scope of application. Specifically, some methods learn only global or only local sequential features, leading to low predictive accuracy, while others achieve improved performance by extracting residue interactions from structures but are limited in their application scope by their heavy dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, is proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS is unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features with an ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
15
|
Cordeiro Y, Freire MHO, Wiecikowski AF, do Amaral MJ. (Dys)functional insights into nucleic acids and RNA-binding proteins modulation of the prion protein and α-synuclein phase separation. Biophys Rev 2023; 15:577-589. [PMID: 37681103 PMCID: PMC10480379 DOI: 10.1007/s12551-023-01067-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/17/2023] [Accepted: 05/22/2023] [Indexed: 09/09/2023] Open
Abstract
Prion diseases are the prototype of infectious diseases transmitted by a protein, the prion protein (PrP), and are still not understood at the molecular level. Heterogeneous species of aggregated PrP can be generated from its monomer. α-synuclein (αSyn), related to Parkinson's disease, has also shown a prion-like pathogenic character and, like PrP, interacts with nucleic acids (NAs), which in turn modulate the aggregation of both proteins. Recently, our group and others have shown that NAs and/or RNA-binding proteins (RBPs) modulate the formation of recombinant PrP and/or αSyn condensates, and that uncontrolled condensation might precede pathological aggregation. Tackling abnormal phase separation of neurodegenerative disease-related proteins has been proposed as a promising therapeutic strategy. Therefore, understanding the mechanism by which polyanions, like NAs, modulate phase transitions intracellularly is key to assessing their role in promoting toxicity and neuronal death. Herein we discuss data on the nucleic acid binding properties and phase separation ability of PrP and αSyn, with a special focus on their modulation by NAs and RBPs. Furthermore, we provide insights into condensation of PrP and/or αSyn in the light of non-trivial subcellular locations such as the nuclear and cytosolic environments.
Affiliation(s)
- Yraima Cordeiro
  - Faculty of Pharmacy, Universidade Federal do Rio de Janeiro, Av Carlos Chagas Filho 373, bloco B, subsolo Sala 36, Rio de Janeiro, RJ 21941-902 Brazil
- Maria Heloisa O. Freire
  - Faculty of Pharmacy, Universidade Federal do Rio de Janeiro, Av Carlos Chagas Filho 373, bloco B, subsolo Sala 36, Rio de Janeiro, RJ 21941-902 Brazil
- Adalgisa Felippe Wiecikowski
  - Faculty of Pharmacy, Universidade Federal do Rio de Janeiro, Av Carlos Chagas Filho 373, bloco B, subsolo Sala 36, Rio de Janeiro, RJ 21941-902 Brazil
- Mariana Juliani do Amaral
  - Faculty of Pharmacy, Universidade Federal do Rio de Janeiro, Av Carlos Chagas Filho 373, bloco B, subsolo Sala 36, Rio de Janeiro, RJ 21941-902 Brazil
|
16
|
Lyu H, Sun L, Guan Z, Li J, Yin C, Zhang Y, Jiang H. Proximity labeling reveals OTUD3 as a DNA-binding deubiquitinase of cGAS. Cell Rep 2023; 42:112309. [PMID: 36966392 DOI: 10.1016/j.celrep.2023.112309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 12/02/2022] [Accepted: 03/10/2023] [Indexed: 03/27/2023] Open
Abstract
Cyclic GMP-AMP synthase (cGAS), as the major DNA sensor, initiates DNA-stimulated innate immune responses and is essential for a healthy immune system. Although some regulators of cGAS have been reported, it still remains largely unclear how cGAS is precisely and dynamically regulated and how many potential regulators govern cGAS. Here we carry out proximity labeling of cGAS with TurboID in cells and identify a number of potential cGAS-interacting or -adjacent proteins. Deubiquitinase OTUD3, one candidate identified in cytosolic cGAS-DNA complex, is further validated to not only stabilize cGAS but also enhance cGAS enzymatic activity, which eventually promotes anti-DNA virus immune response. We show that OTUD3 can directly bind DNA and is recruited to the cytosolic DNA complex, increasing its association with cGAS. Our findings reveal OTUD3 as a versatile cGAS regulator and find one more layer of regulatory mechanism in DNA-stimulated innate immune responses.
Affiliation(s)
- Heng Lyu
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Le Sun
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhenyu Guan
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jinxin Li
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China
- Changsong Yin
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China
- Yaoyang Zhang
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hong Jiang
  - Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
|
17
|
Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023; 51:e25. [PMID: 36629262 PMCID: PMC10018345 DOI: 10.1093/nar/gkac1253] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/14/2022] [Revised: 11/22/2022] [Accepted: 12/15/2022] [Indexed: 01/12/2023] Open
Abstract
The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein binding residues are predicted as nucleic acid binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure and disorder trained predictors of RBRs to comprehensively assess quality of their predictions. Our empirical analysis that relies on a new and low-similarity benchmark dataset reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design HybridRNAbind meta-model that generates accurate predictions and low amounts of cross-predictions when tested on data that combines structure and disorder-annotated RBRs. We release this meta-model as a convenient webserver which is available at https://www.csuligroup.com/hybridRNAbind/.
Affiliation(s)
- Fuhao Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
18
|
Aybey E, Gümüş Ö. SENSDeep: An Ensemble Deep Learning Method for Protein-Protein Interaction Sites Prediction. Interdiscip Sci 2023; 15:55-87. [PMID: 36346583 DOI: 10.1007/s12539-022-00543-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2022] [Revised: 10/15/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022]
Abstract
PURPOSE The determination of which amino acid in a protein interacts with other proteins is important in understanding the functional mechanism of that protein. Although there are experimental methods to detect protein-protein interaction sites (PPISs), these are costly, time-consuming, and require expertise. Therefore, many computational methods have been proposed to accelerate this type of research, but they are generally insufficient to predict PPISs accurately. There is a need for development in this field. METHODS In this study, we introduce a new PPISs prediction method. This method is a sequence-based Stacking ENSemble Deep (SENSDeep) learning method that has an ensemble learning model including the models of RNN, CNN, GRU sequence to sequence (GRUs2s), GRU sequence to sequence with an attention layer (GRUs2satt) and a multilayer perceptron. Two embedded features, secondary structure, and protein sequence information are added to the training data set in addition to twelve existing features to improve the prediction performance of the method. RESULTS SENSDeep trained on the training data set without two extra features obtains a better performance on some of the independent testing data sets than that of the other methods in the literature, especially on scoring metrics of sensitivity, F1, MCC, and AUPRC, having increments up to 63.5%, 19.3%, 18.5%, 11.4%, respectively. It is shown that the added extra features improve the performance of the method by having almost the same performance with less data as the method trained on the data set without these added features. On the other hand, different sizes of the sliding window are tried on the data sets and an optimal sliding window size for SENSDeep is found. Moreover, SENSDeep has also been compared to structure-based methods. Some of these methods have been found to perform better. Using SENSDeep obtained by training with both training data sets, PPISs prediction examples of various proteins that are not in these training data sets are also presented. Furthermore, execution times for SENSDeep and its submodels are shown. AVAILABILITY AND IMPLEMENTATION https://github.com/enginaybey/SENSDeep.
Affiliation(s)
- Engin Aybey
  - Department of Health Bioinformatics, Ege University, 35100, Bornova, Izmir, Turkey
  - Rectorate, Marmara University, 34722, Kadıköy, Istanbul, Turkey
- Özgür Gümüş
  - Department of Computer Engineering, Ege University, 35100, Bornova, Izmir, Turkey
|
19
|
Patiyal S, Dhall A, Bajaj K, Sahu H, Raghava GPS. Prediction of RNA-interacting residues in a protein using CNN and evolutionary profile. Brief Bioinform 2023; 24:6901899. [PMID: 36516298 DOI: 10.1093/bib/bbac538] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/08/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 12/15/2022] Open
Abstract
This paper describes Pprint2, an improved version of Pprint, a method developed for predicting RNA-interacting residues in a protein. The training and independent/validation datasets used in this study comprise 545 and 161 non-redundant RNA-binding proteins, respectively. All models were trained on the training dataset and evaluated on the validation dataset. The preliminary analysis reveals that positively charged amino acids, such as H, R and K, are more prominent among the RNA-interacting residues. Initially, machine learning-based models were developed using a binary profile and obtained a maximum area under the curve (AUC) of 0.68 on the validation dataset. The performance improved significantly, from AUC 0.68 to 0.76, when an evolutionary profile was used instead of the binary profile. The performance of the evolutionary profile-based model improved further, from AUC 0.76 to 0.82, when a convolutional neural network was used to develop the model. Our final model, based on a convolutional neural network using evolutionary information, achieved AUC 0.82 with a Matthews correlation coefficient of 0.49 on the validation dataset. Our best model outperforms existing methods when evaluated on the independent/validation dataset. A user-friendly standalone software and web-based server named 'Pprint2' has been developed for predicting RNA-interacting residues (https://webs.iiitd.edu.in/raghava/pprint2 and https://github.com/raghavagps/pprint2).
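For readers unfamiliar with the Matthews correlation coefficient reported above, the following minimal sketch computes it from binary per-residue labels using the standard confusion-matrix formula. The function names and the toy labels are illustrative assumptions and are not taken from Pprint2.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks an interacting residue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 by convention when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy labels for four residues (hypothetical).
score = mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which makes it a common choice when interacting residues are a small minority of all residues.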
Affiliation(s)
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Khushboo Bajaj
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Harshita Sahu
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
|
20
|
Zhang F, Li M, Zhang J, Shi W, Kurgan L. DeepPRObind: Modular Deep Learner that Accurately Predicts Structure and Disorder-Annotated Protein Binding Residues. J Mol Biol 2023:167945. [PMID: 36621533 DOI: 10.1016/j.jmb.2023.167945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 12/15/2022] [Accepted: 01/01/2023] [Indexed: 01/07/2023]
Abstract
Current sequence-based predictors of protein-binding residues (PBRs) belong to two distinct categories: structure-trained vs. intrinsic disorder-trained. Since disordered PBRs differ from structured PBRs in several ways, including ability to bind multiple partners by folding into different conformations and enrichment in different amino acids, the structure-trained and disorder-trained predictors were shown to provide inaccurate results for the other annotation type. A simple consensus-based solution that combines structure- and disorder-trained methods provides limited levels of predictive performance and generates relatively many cross-predictions, where residues that interact with other ligand types are predicted as PBRs. We address this unsolved problem by designing a novel and fast deep-learner, DeepPRObind, that relies on carefully designed modular convolutional architecture and uses innovative aggregate input features. Comparative empirical tests on a low-similarity test dataset reveal that DeepPRObind generates accurate predictions of structured and disordered PBRs and low amounts of cross-predictions, outperforming a comprehensive collection of 12 predictors of PBRs. Given the relatively low runtime of DeepPRObind (40 seconds per protein), we further validate its results based on an analysis of putative PBRs in the yeast proteome, confirming that interactions in disordered regions are enriched among hub proteins. We release DeepPRObind as a convenient web server at https://www.csuligroup.com/DeepPRObind/.
Affiliation(s)
- Fuhao Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Wenbo Shi
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
21
|
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
Affiliation(s)
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Xuantai Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
|
22
|
ISPRED-SEQ: Deep neural networks and embeddings for predicting interaction sites in protein sequences. J Mol Biol 2023. [DOI: 10.1016/j.jmb.2023.167963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
|
23
|
Zheng P, Qi Y, Li X, Liu Y, Yao Y, Huang G. A capsule network-based method for identifying transcription factors. Front Microbiol 2022; 13:1048478. [PMID: 36560938 PMCID: PMC9763301 DOI: 10.3389/fmicb.2022.1048478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 10/26/2022] [Indexed: 12/12/2022] Open
Abstract
Transcription factors (TFs) are typical regulators of gene expression and play versatile roles in cellular processes. Since it is time-consuming, costly, and labor-intensive to detect them using physical methods, a computational method for detecting TFs is desirable. Here, we present a capsule network-based method for identifying TFs. This method is an end-to-end deep learning method, consisting mainly of an embedding layer, a bidirectional long short-term memory (LSTM) layer, a capsule network layer, and three fully connected layers. The presented method obtained an accuracy of 0.8820, which is superior to the state-of-the-art methods. The empirical experiments showed that including the capsule network improved performance and that the capsule network-based representation was superior to the property-based representation for distinguishing between TFs and non-TFs. We also implemented the presented method as a user-friendly web server, which is freely available at http://www.biolscience.cn/Capsule_TF/ for all scientific researchers.
Affiliation(s)
- Peijie Zheng
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Yue Qi
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Xueyong Li
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Yuewu Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha, China
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Correspondence: Guohua Huang
|
24
|
Li M, Wu Z, Wang W, Lu K, Zhang J, Zhou Y, Chen Z, Li D, Zheng S, Chen P, Wang B. Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3646-3654. [PMID: 34705656 DOI: 10.1109/tcbb.2021.3123269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Computational methods for predicting protein-protein interaction sites avoid the high cost and time demands of traditional experimental approaches. However, the serious class imbalance between interface and non-interface residues in protein sequences limits the prediction performance of these methods. This work therefore proposed a new strategy, NearMiss-based under-sampling for unbalanced datasets combined with Random Forest classification (NM-RF), to predict protein interaction sites. Residues in protein sequences were represented by PSSM-derived features, hydropathy index (HI) and relative solvent accessibility (RSA). To resolve the class imbalance problem, an under-sampling method based on the NearMiss algorithm is adopted to remove some non-interface residues, and the random forest algorithm is then used to perform binary classification on the balanced feature datasets. Experiments show that the accuracy of the NM-RF model reaches 87.6% and 84.3% on Dtestset72 and PDBtestset164, respectively, which demonstrates the effectiveness of the proposed NM-RF method in differentiating interface from non-interface residues.
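The under-sampling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which NearMiss variant the authors used, so the code below assumes NearMiss-1 (keep the majority-class samples whose mean distance to their k nearest minority-class samples is smallest). The function name `nearmiss1_undersample`, the parameter `k`, and the toy 2-D feature vectors are all hypothetical; this is not the NM-RF implementation, and the random forest step is omitted.

```python
import math


def nearmiss1_undersample(majority, minority, n_keep, k=3):
    """Assumed NearMiss-1 rule: keep the n_keep majority samples with the
    smallest mean Euclidean distance to their k nearest minority samples."""

    def mean_knn_distance(sample):
        # Distances from this majority sample to every minority sample.
        dists = sorted(math.dist(sample, m) for m in minority)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    # Rank majority indices by closeness to the minority class.
    ranked = sorted(range(len(majority)),
                    key=lambda i: mean_knn_distance(majority[i]))
    # Preserve the original ordering of the samples that are kept.
    kept = sorted(ranked[:n_keep])
    return [majority[i] for i in kept]


# Toy data (hypothetical): 2 interface residues vs. 5 non-interface residues.
minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(0.5, 0.1), (5.0, 5.0), (0.0, 1.0), (9.0, 9.0), (2.0, 2.0)]

balanced_majority = nearmiss1_undersample(
    majority, minority, n_keep=len(minority), k=2
)
print(balanced_majority)
```

After this step the two classes have equal size, so a classifier trained on `minority + balanced_majority` no longer sees the original imbalance. Real feature vectors would be the PSSM, HI and RSA descriptors named in the abstract rather than 2-D points.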
|
25
|
PITHIA: Protein Interaction Site Prediction Using Multiple Sequence Alignments and Attention. Int J Mol Sci 2022; 23:12814. [PMID: 36361606 PMCID: PMC9657891 DOI: 10.3390/ijms232112814] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/22/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 11/22/2022] Open
Abstract
Cellular functions are governed by proteins, and, while some proteins work independently, most work by interacting with other proteins. As a result it is crucially important to know the interaction sites that facilitate the interactions between the proteins. Since the experimental methods are costly and time consuming, it is essential to develop effective computational methods. We present PITHIA, a sequence-based deep learning model for protein interaction site prediction that exploits the combination of multiple sequence alignments and learning attention. We demonstrate that our new model clearly outperforms the state-of-the-art models on a wide range of metrics. In order to provide meaningful comparison, we update existing test datasets with new information regarding interaction site, as well as introduce an additional new testing dataset which resolves the shortcomings of the existing ones.
|
26
|
Patiyal S, Dhall A, Raghava GPS. A deep learning-based method for the prediction of DNA interacting residues in a protein. Brief Bioinform 2022; 23:6658239. [PMID: 35943134 DOI: 10.1093/bib/bbac322] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/25/2022] [Revised: 07/01/2022] [Accepted: 07/15/2022] [Indexed: 11/13/2022] Open
Abstract
DNA-protein interaction is one of the most crucial interactions in the biological system, which decides the fate of many processes such as transcription, regulation and splicing of genes. In this study, we trained our models on a training dataset of 646 DNA-binding proteins having 15 636 DNA interacting and 298 503 non-interacting residues. Our trained models were evaluated on an independent dataset of 46 DNA-binding proteins having 965 DNA interacting and 9911 non-interacting residues. All proteins in the independent dataset have less than 30% of sequence similarity with proteins in the training dataset. A wide range of traditional machine learning and deep learning (1D-CNN) techniques-based models have been developed using binary, physicochemical properties and Position-Specific Scoring Matrix (PSSM)/evolutionary profiles. In the case of machine learning technique, eXtreme Gradient Boosting-based model achieved a maximum area under the receiver operating characteristics (AUROC) curve of 0.77 on the independent dataset using PSSM profile. Deep learning-based model achieved the highest AUROC of 0.79 on the independent dataset using a combination of all three profiles. We evaluated the performance of existing methods on the independent dataset and observed that our proposed method outperformed all the existing methods. In order to facilitate scientific community, we developed standalone software and web server, which are accessible from https://webs.iiitd.edu.in/raghava/dbpred.
Affiliation(s)
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
|
27
|
Ishiguro A, Ishihama A. Essential Roles and Risks of G-Quadruplex Regulation: Recognition Targets of ALS-Linked TDP-43 and FUS. Front Mol Biosci 2022; 9:957502. [PMID: 35898304 PMCID: PMC9309350 DOI: 10.3389/fmolb.2022.957502] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/31/2022] [Accepted: 06/21/2022] [Indexed: 11/26/2022] Open
Abstract
A non-canonical DNA/RNA structure, G-quadruplex (G4), is a unique structure formed by two or more guanine quartets, which associate through Hoogsteen hydrogen bonding leading to form a square planar arrangement. A set of RNA-binding proteins specifically recognize G4 structures and play certain unique physiological roles. These G4-binding proteins form ribonucleoprotein (RNP) through a physicochemical phenomenon called liquid-liquid phase separation (LLPS). G4-containing RNP granules are identified in both prokaryotes and eukaryotes, but extensive studies have been performed in eukaryotes. We have been involved in analyses of the roles of G4-containing RNAs recognized by two G4-RNA-binding proteins, TDP-43 and FUS, which both are the amyotrophic lateral sclerosis (ALS) causative gene products. These RNA-binding proteins play the essential roles in both G4 recognition and LLPS, but they also carry the risk of agglutination. The biological significance of G4-binding proteins is controlled through unique 3D structure of G4, of which the risk of conformational stability is influenced by environmental conditions such as monovalent metals and guanine oxidation.
|
28
|
Nie W, Deng L. TSNAPred: predicting type-specific nucleic acid binding residues via an ensemble approach. Brief Bioinform 2022; 23:6618235. [PMID: 35753699 DOI: 10.1093/bib/bbac244] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/21/2022] [Revised: 05/22/2022] [Accepted: 05/24/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The interplay between protein and nucleic acid participates in diverse biological activities. Accurately identifying the interaction between protein and nucleic acid can strengthen the understanding of protein function. However, conventional methods are too time-consuming, and computational methods make type-agnostic predictions. We proposed an ensemble predictor termed TSNAPred and first used it to identify residues that bind to A-DNA, B-DNA, ssDNA, mRNA, tRNA and rRNA. TSNAPred combines LightGBM and a capsule network, both trained on features derived from the protein sequence. TSNAPred utilizes the sliding window technique to extract long-distance dependencies between residues and a weighted ensemble strategy to enhance the prediction performance. The results show that TSNAPred can effectively identify type-specific nucleic acid binding residues in our test set. Moreover, it can also discriminate DNA-binding and RNA-binding residues, improving the AUC value by 5% to 10% compared with other state-of-the-art methods. The dataset and code of TSNAPred are available at: https://github.com/niewenjuan-csu/TSNAPred.
Affiliation(s)
- Wenjuan Nie
- School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 410075, Changsha, China
|
29
|
Pengelly RJ, Bakhtiar D, Borovská I, Královičová J, Vořechovský I. Exonic splicing code and protein binding sites for calcium. Nucleic Acids Res 2022; 50:5493-5512. [PMID: 35474482 PMCID: PMC9177970 DOI: 10.1093/nar/gkac270] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/26/2022] [Revised: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022] Open
Abstract
Auxiliary splicing sequences in exons, known as enhancers (ESEs) and silencers (ESSs), have been subject to strong selection pressures at the RNA and protein level. The protein component of this splicing code is substantial, recently estimated at ∼50% of the total information within ESEs, but remains poorly understood. The ESE/ESS profiles were previously associated with the Irving-Williams (I-W) stability series for divalent metals, suggesting that the ESE/ESS evolution was shaped by metal binding sites. Here, we have examined splicing activities of exonic sequences that encode protein binding sites for Ca2+, a weak binder in the I-W affinity order. We found that predicted exon inclusion levels for the EF-hand motifs and for Ca2+-binding residues in nonEF-hand proteins were higher than for average exons. For canonical EF-hands, the increase was centred on the EF-hand chelation loop and, in particular, on Ca2+-coordinating residues, with a 1>12>3∼5>9 hierarchy in the 12-codon loop consensus and usage bias at codons 1 and 12. The same hierarchy but a lower increase was observed for noncanonical EF-hands, except for S100 proteins. EF-hand loops preferentially accumulated exon splits in two clusters, one located in their N-terminal halves and the other around codon 12. Using splicing assays and published crosslinking and immunoprecipitation data, we identify candidate trans-acting factors that preferentially bind conserved GA-rich motifs encoding negatively charged amino acids in the loops. Together, these data provide evidence for the high capacity of codons for Ca2+-coordinating residues to be retained in mature transcripts, facilitating their exon-level expansion during eukaryotic evolution.
Affiliation(s)
- Reuben J Pengelly
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
- Dara Bakhtiar
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
- Ivana Borovská
- Slovak Academy of Sciences, Centre of Biosciences, 840 05 Bratislava, Slovak Republic
- Jana Královičová
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
- Slovak Academy of Sciences, Centre of Biosciences, 840 05 Bratislava, Slovak Republic
- Slovak Academy of Sciences, Institute of Zoology, 845 06 Bratislava, Slovak Republic
- Igor Vořechovský
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
|
30
|
Arya A, Mary Varghese D, Kumar Verma A, Ahmad S. Inadequacy of evolutionary profiles vis-a-vis single sequences in predicting transient DNA-binding sites in proteins. J Mol Biol 2022; 434:167640. [PMID: 35597551 DOI: 10.1016/j.jmb.2022.167640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 05/01/2022] [Accepted: 05/16/2022] [Indexed: 10/18/2022]
Abstract
Sequence-based prediction of DNA-binding residues in a protein is a widely studied problem for which machine learning methods with continuously improving predictive power have been developed. Concatenated rows within a sliding window of a Position Specific Substitution Matrix (PSSM) of the protein concerned are currently used as the primary feature set in almost all the methods of predicting DNA-binding residues. Here we report that these evolutionary profiles are powerful only for identifying conserved binding sites and fall short for residue positions that undergo binding to non-binding transitions in closely related proteins. We created a database of highly similar protein pairs with known protein-DNA complexes and investigated differential predictability of conserved and transient binding within each pair. Retraining machine learning models uniformly, we compared the predictive powers of the models trained on PSSMs against similarly trained models on sparse-encoded single sequences. We found that the transient binding site predictions from evolutionary profiles are outperformed by single sequence based models under controlled training and test experiments by as much as 8 percentage points. Thus, we conclude that the PSSM-based models are inadequate to predict high specificity DNA-binding residues. These findings are of critical significance for the design of mutant- and species-specific DNA ligands and for homology based modeling of protein-DNA complexes.
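The sliding-window feature construction mentioned in this abstract (concatenating the PSSM rows around each residue) can be sketched as follows. This is an illustrative sketch only: the function name `window_features`, the zero-padding at the sequence termini, and the toy two-column matrix are assumptions, not details taken from the paper. A real PSSM has 20 columns per residue.

```python
def window_features(pssm, window=3):
    """Concatenate the PSSM rows inside a window centred on each residue.

    Positions that fall outside the sequence are zero-padded (an assumed
    convention; other schemes such as a dedicated terminus flag exist).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    n_cols = len(pssm[0])
    pad = [0.0] * n_cols

    features = []
    for i in range(len(pssm)):
        row = []
        for j in range(i - half, i + half + 1):
            # Use the real PSSM row when j is inside the sequence, else padding.
            row.extend(pssm[j] if 0 <= j < len(pssm) else pad)
        features.append(row)
    return features


# Toy PSSM (hypothetical): 3 residues, 2 columns instead of the usual 20.
toy_pssm = [[1, -2], [0, 3], [4, -1]]
feats = window_features(toy_pssm, window=3)
for residue_index, vector in enumerate(feats):
    print(residue_index, vector)
```

Each residue ends up with a vector of length `window * n_cols`, which a per-residue classifier can consume directly. The sparse-encoded single-sequence alternative discussed in the abstract would feed one-hot rows through the same windowing instead of PSSM rows.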
Affiliation(s)
- Ajay Arya
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Dana Mary Varghese
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Ajay Kumar Verma
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
|
31
|
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022] Open
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
- Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
32
|
A Comprehensive Review of Computation-Based Metal-Binding Prediction Approaches at the Residue Level. Biomed Res Int 2022; 2022:8965712. [PMID: 35402609 PMCID: PMC8989566 DOI: 10.1155/2022/8965712] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/02/2022] [Accepted: 03/04/2022] [Indexed: 12/29/2022]
Abstract
Clear evidence has shown that metal ions strongly connect and delicately tune the dynamic homeostasis in living bodies. They have been proved to be associated with protein structure, stability, regulation, and function. Even small changes in the concentration of metal ions can shift their effects from natural beneficial functions to harmful. This leads to degenerative diseases, malignant tumors, and cancers. Accurate characterizations and predictions of metalloproteins at the residue level promise informative clues to the investigation of intrinsic mechanisms of protein-metal ion interactions. Compared to biophysical or biochemical wet-lab technologies, computational methods provide open web interfaces of high-resolution databases and high-throughput predictors for efficient investigation of metal-binding residues. This review surveys and details 18 public databases of metal-protein binding. We collect a comprehensive set of 44 computation-based methods and classify them into four categories, namely, learning-, docking-, template-, and meta-based methods. We analyze the benchmark datasets, assessment criteria, feature construction, and algorithms. We also compare several methods on two benchmark testing datasets and include a discussion about currently publicly available predictive tools. Finally, we summarize the challenges and underlying limitations of the current studies and propose several prospective directions concerning the future development of the related databases and methods.
|
33
|
Ilina A, Khavinson V, Linkova N, Petukhov M. Neuroepigenetic Mechanisms of Action of Ultrashort Peptides in Alzheimer's Disease. Int J Mol Sci 2022; 23:4259. [PMID: 35457077 PMCID: PMC9032300 DOI: 10.3390/ijms23084259] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/07/2022] [Revised: 04/07/2022] [Accepted: 04/09/2022] [Indexed: 12/23/2022] Open
Abstract
Epigenetic regulation of gene expression is necessary for maintaining higher-order cognitive functions (learning and memory). The current understanding of the role of epigenetics in the mechanism of Alzheimer’s disease (AD) is focused on DNA methylation, chromatin remodeling, histone modifications, and regulation of non-coding RNAs. The pathogenetic links of this disease are the misfolding and aggregation of tau protein and amyloid peptides, mitochondrial dysfunction, oxidative stress, impaired energy metabolism, destruction of the blood–brain barrier, and neuroinflammation, all of which lead to impaired synaptic plasticity and memory loss. Ultrashort peptides are promising neuroprotective compounds with a broad spectrum of activity and without reported side effects. The main aim of this review is to analyze the possible epigenetic mechanisms of the neuroprotective action of ultrashort peptides in AD. The review highlights the role of short peptides in the AD pathophysiology. We formulate the hypothesis that peptide regulation of gene expression can be mediated by the interaction of short peptides with histone proteins, cis- and transregulatory DNA elements and effector molecules (DNA/RNA-binding proteins and non-coding RNA). The development of therapeutic agents based on ultrashort peptides may offer a promising addition to the multifunctional treatment of AD.
Affiliation(s)
- Anastasiia Ilina
- Department of Biogerontology, Saint Petersburg Institute of Bioregulation and Gerontology, 19711 Saint Petersburg, Russia
- Department of General Pathology and Pathological Physiology, Institute of Experimental Medicine, 197376 Saint Petersburg, Russia
- Correspondence: Tel.: +7-(953)145-89-58
- Vladimir Khavinson
- Department of Biogerontology, Saint Petersburg Institute of Bioregulation and Gerontology, 19711 Saint Petersburg, Russia
- Group of Peptide Regulation of Aging, Pavlov Institute of Physiology, Russian Academy of Sciences, 199034 Saint Petersburg, Russia
- Natalia Linkova
- Department of Biogerontology, Saint Petersburg Institute of Bioregulation and Gerontology, 19711 Saint Petersburg, Russia
- Mikhael Petukhov
- Department of Molecular Radiation Biophysics, Petersburg Nuclear Physics Institute Named after B.P. Konstantinov, NRC “Kurchatov Institute”, 188300 Gatchina, Russia
- Group of Biophysics, Higher Engineering and Technical School, Peter the Great St. Petersburg Polytechnic University, 195251 Saint Petersburg, Russia
|
34
|
Structure-dependent of 3-fluorooxindole derivatives interacting with ctDNA: Binding effects and molecular docking approaches. Bioorg Chem 2022; 121:105698. [DOI: 10.1016/j.bioorg.2022.105698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 12/14/2021] [Accepted: 02/18/2022] [Indexed: 11/23/2022]
|
35
|
Arora V, Sanguinetti G. Challenges for machine learning in RNA-protein interaction prediction. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0087. [PMID: 35073469 DOI: 10.1515/sagmb-2021-0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 01/02/2022] [Indexed: 11/15/2022]
Abstract
RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets which offer both opportunities and challenges for machine learning techniques. In this brief note, we will discuss some of the major stumbling blocks towards the use of machine learning in computational RNA biology, focusing specifically on the problem of predicting RNA-protein interactions from next-generation sequencing data.
Affiliation(s)
- Viplove Arora
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
- Guido Sanguinetti
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
|
36
|
Zhang F, Zhao B, Shi W, Li M, Kurgan L. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2021; 23:6461158. [PMID: 34905768 DOI: 10.1093/bib/bbab521] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/02/2021] [Revised: 10/30/2021] [Accepted: 11/14/2021] [Indexed: 12/14/2022] Open
Abstract
Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids was released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/.
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Wenbo Shi
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
|
37
|
Zhang J, Ghadermarzi S, Katuwawala A, Kurgan L. DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences. Brief Bioinform 2021; 22:6355416. [PMID: 34415020 DOI: 10.1093/bib/bbab336] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Revised: 07/02/2021] [Accepted: 07/28/2021] [Indexed: 01/02/2023] Open
Abstract
Efforts to elucidate protein-DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of the DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physiochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize the cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the use of the second (refinement) step leads to a substantial reduction in the cross predictions. Empirical tests show that DNAgenie's outputs that are converted to coarse-grained protein-level predictions compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNAs. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology at the Xinyang Normal University, No.237, Nanhu Road, Xinyang 464000, Henan Province, P.R. China
- Sina Ghadermarzi
- Department of Computer Science at the Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, Virginia 23284, USA
- Akila Katuwawala
- Department of Computer Science at the Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, Virginia 23284, USA
- Lukasz Kurgan
- Department of Computer Science at the Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, Virginia 23284, USA
|
38
|
Jiang Z, Xiao SR, Liu R. Dissecting and predicting different types of binding sites in nucleic acids based on structural information. Brief Bioinform 2021; 23:6384399. [PMID: 34624074 PMCID: PMC8769709 DOI: 10.1093/bib/bbab411] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/07/2021] [Revised: 08/26/2021] [Accepted: 09/07/2021] [Indexed: 12/16/2022] Open
Abstract
The biological functions of DNA and RNA generally depend on their interactions with other molecules, such as small ligands, proteins and nucleic acids. However, our knowledge of the nucleic acid binding sites for different interaction partners is very limited, and identification of these critical binding regions is not a trivial work. Herein, we performed a comprehensive comparison between binding and nonbinding sites and among different categories of binding sites in these two nucleic acid classes. From the structural perspective, RNA may interact with ligands through forming binding pockets and contact proteins and nucleic acids using protruding surfaces, while DNA may adopt regions closer to the middle of the chain to make contacts with other molecules. Based on structural information, we established a feature-based ensemble learning classifier to identify the binding sites by fully using the interplay among different machine learning algorithms, feature spaces and sample spaces. Meanwhile, we designed a template-based classifier by exploiting structural conservation. The complementarity between the two classifiers motivated us to build an integrative framework for improving prediction performance. Moreover, we utilized a post-processing procedure based on the random walk algorithm to further correct the integrative predictions. Our unified prediction framework yielded promising results for different binding sites and outperformed existing methods.
Affiliation(s)
- Zheng Jiang
- College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Si-Rui Xiao
- College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Rong Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
39
Jibiki K, Kodama TS, Suenaga A, Kawase Y, Shibazaki N, Nomoto S, Nagasawa S, Nagashima M, Shimodan S, Kikuchi R, Okayasu M, Takashita R, Mehmood R, Saitoh N, Yoneda Y, Akagi KI, Yasuhara N. Importin α2 association with chromatin: Direct DNA binding via a novel DNA-binding domain. Genes Cells 2021; 26:945-966. [PMID: 34519142 DOI: 10.1111/gtc.12896] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Revised: 09/10/2021] [Accepted: 09/11/2021] [Indexed: 12/18/2022]
Abstract
The nuclear transport of proteins is important for facilitating appropriate nuclear functions. The importin α family proteins play key roles in nuclear transport as transport receptors for copious nuclear proteins. Additionally, these proteins possess other functions, including chromatin association and gene regulation. However, these nontransport functions of importin α are not yet fully understood, especially their molecular-level mechanisms and consequences for functioning with chromatin. Here, we report the novel molecular characteristics of importin α binding to diverse DNA sequences in chromatin. We newly identified and characterized a DNA-binding domain, the Nucleic Acid Associating Trolley pole domain (NAAT domain), in the N-terminal region of importin α within the conventional importin β binding (IBB) domain that is necessary for nuclear transport of cargo proteins. Furthermore, we found that the DNA binding of importin α synergistically coupled the recruitment of its cargo protein to DNA. This is the first study to delineate the interaction between importin α and chromatin DNA via the NAAT domain, indicating the bifunctionality of the importin α N-terminal region for nuclear transport and chromatin association.
Affiliation(s)
- Kazuya Jibiki
- Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan
- Takashi S Kodama
- National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN), Osaka, Japan
- Atsushi Suenaga
- Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan; Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Yota Kawase
- Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan
- Noriko Shibazaki
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Shin Nomoto
- Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan
- Seiya Nagasawa
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Misaki Nagashima
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Shieri Shimodan
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Renan Kikuchi
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Mina Okayasu
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Ruka Takashita
- Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
- Rashid Mehmood
- Department of Life Sciences, College of Science and General Studies, Alfaisal University, Riyadh, Saudi Arabia
- Noriko Saitoh
- Division of Cancer Biology, The Cancer Institute of JFCR, Tokyo, Japan
- Yoshihiro Yoneda
- National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN), Osaka, Japan
- Ken-Ichi Akagi
- National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN), Osaka, Japan; Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Noriko Yasuhara
- Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan; Department of Biosciences, College of Humanities and Sciences, Nihon University, Tokyo, Japan
40
Etzion-Fuchs A, Todd DA, Singh M. dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains. Nucleic Acids Res 2021; 49:e78. [PMID: 33999210 PMCID: PMC8287948 DOI: 10.1093/nar/gkab356] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Revised: 03/30/2021] [Accepted: 04/22/2021] [Indexed: 01/08/2023]
Abstract
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT has an excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
Affiliation(s)
- Anat Etzion-Fuchs
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA
- David A Todd
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA; Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
41
Zhang J, Chen Q, Liu B. DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1451-1463. [PMID: 31722485 DOI: 10.1109/tcbb.2019.2952338] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two kinds of crucial proteins, which are associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) that interact with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L was proposed by combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. It is the first computational method that is able to identify DBPs, RBPs and DRBPs. Rigorous cross-validations and independent tests showed that DeepDRBP-2L is able to overcome the shortcomings of the existing methods and can go one step further to identify DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
42
Dettori LG, Torrejon D, Chakraborty A, Dutta A, Mohamed M, Papp C, Kuznetsov VA, Sung P, Feng W, Bah A. A Tale of Loops and Tails: The Role of Intrinsically Disordered Protein Regions in R-Loop Recognition and Phase Separation. Front Mol Biosci 2021; 8:691694. [PMID: 34179096 PMCID: PMC8222781 DOI: 10.3389/fmolb.2021.691694] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/06/2021] [Accepted: 05/14/2021] [Indexed: 11/13/2022]
Abstract
R-loops are non-canonical, three-stranded nucleic acid structures composed of a DNA:RNA hybrid, a displaced single-stranded (ss)DNA, and a trailing ssRNA overhang. R-loops perform critical biological functions under both normal and disease conditions. To elucidate their cellular functions, we need to understand the mechanisms underlying R-loop formation, recognition, signaling, and resolution. Previous high-throughput screens identified multiple proteins that bind R-loops, with many of these proteins containing folded nucleic acid processing and binding domains that prevent (e.g., topoisomerases), resolve (e.g., helicases, nucleases), or recognize (e.g., KH, RRMs) R-loops. However, a significant number of these R-loop interacting Enzyme and Reader proteins also contain long stretches of intrinsically disordered regions (IDRs). The precise molecular and structural mechanisms by which the folded domains and IDRs synergize to recognize and process R-loops or modulate R-loop-mediated signaling have not been fully explored. While studying one such modular R-loop Reader, the Fragile X Protein (FMRP), we unexpectedly discovered that the C-terminal IDR (C-IDR) of FMRP is the predominant R-loop binding site, with the three N-terminal KH domains recognizing the trailing ssRNA overhang. Interestingly, the C-IDR of FMRP has recently been shown to undergo spontaneous Liquid-Liquid Phase Separation (LLPS) assembly by itself or in complex with another non-canonical nucleic acid structure, RNA G-quadruplex. Furthermore, we have recently shown that FMRP can suppress persistent R-loops that form during transcription, a process that is also enhanced by LLPS via the assembly of membraneless transcription factories. These exciting findings prompted us to explore the role of IDRs in R-loop processing and signaling proteins through a comprehensive bioinformatics and computational biology study. 
Here, we evaluated IDR prevalence, sequence composition and LLPS propensity for the known R-loop interactome. We observed that, like FMRP, the majority of the R-loop interactome, especially Readers, contains long IDRs that are highly enriched in low complexity sequences with biased amino acid composition, suggesting that these IDRs could directly interact with R-loops, rather than being “mere flexible linkers” connecting the “functional folded enzyme or binding domains”. Furthermore, our analysis shows that several proteins in the R-loop interactome are either predicted to or have been experimentally demonstrated to undergo LLPS or are known to be associated with phase separated membraneless organelles. Thus, our overall results present a thought-provoking hypothesis that IDRs in the R-loop interactome can provide a functional link between R-loop recognition via direct binding and downstream signaling through the assembly of LLPS-mediated membrane-less R-loop foci. The absence or dysregulation of the function of IDR-enriched R-loop interactors can potentially lead to severe genomic defects, such as the widespread R-loop-mediated DNA double strand breaks that we recently observed in Fragile X patient-derived cells.
Affiliation(s)
- Leonardo G Dettori
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Diego Torrejon
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Arijita Chakraborty
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Arijit Dutta
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, TX, United States
- Mohamed Mohamed
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Csaba Papp
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States; Department of Urology, SUNY Upstate Medical University, Syracuse, NY, United States
- Vladimir A Kuznetsov
- Department of Urology, SUNY Upstate Medical University, Syracuse, NY, United States; Bioinformatics Institute, ASTAR Biomedical Institutes, Singapore, Singapore
- Patrick Sung
- Department of Biochemistry and Structural Biology, University of Texas Health San Antonio, San Antonio, TX, United States
- Wenyi Feng
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
- Alaji Bah
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, Syracuse, NY, United States
43
Li Y, Golding GB, Ilie L. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 2021; 37:896-904. [PMID: 32840562 DOI: 10.1093/bioinformatics/btaa750] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 03/30/2020] [Revised: 08/14/2020] [Accepted: 08/19/2020] [Indexed: 12/15/2022]
Abstract
MOTIVATION Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods. RESULTS We propose DEep Learning Prediction of Highly probable protein Interaction sites (DELPHI), a new sequence-based deep learning suite for PPI-binding sites prediction. DELPHI has an ensemble structure which combines a CNN and an RNN component with a fine-tuning technique. Three novel features, HSP, position information and ProtVec, are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programmes on five datasets, and DELPHI outperforms the competing methods in all metrics even though its training dataset shares the least similarities with the testing datasets. In the most important metrics, AUPRC and MCC, it surpasses the second best programmes by as much as 18.5% and 27.7%, respectively. We also demonstrated that the improvement is essentially due to using the ensemble model and, especially, the three new features. Using DELPHI, it is shown that there is a strong correlation between protein-binding residues (PBRs) and sites with strong evolutionary conservation. In addition, DELPHI's predicted PBR sites closely match known data from Pfam. DELPHI is available as open-source standalone software and a web server. AVAILABILITY AND IMPLEMENTATION The DELPHI web server can be found at delphi.csd.uwo.ca/, with all datasets and results in this study. The trained models, the DELPHI standalone source code, and the feature computation pipeline are freely available at github.com/lucian-ilie/DELPHI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
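The abstract above names MCC (Matthews correlation coefficient) as one of its two most important metrics. As a reminder of what that metric measures, here is a minimal Python sketch of the standard definition computed from confusion-matrix counts. The function name `mcc` and the example counts are illustrative assumptions; they are not taken from the DELPHI code or its reported results.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). It stays informative on imbalanced data,
    which is typical for binding-residue prediction.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical residue-level counts, for illustration only.
    print(round(mcc(tp=40, fp=10, tn=900, fn=50), 3))
```

A score near 0 indicates predictions no better than chance, even when plain accuracy looks high because non-binding residues dominate.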
Affiliation(s)
- Yiwei Li
- Department of Computer Science, The University of Western Ontario, London, ON N6A 5B7, Canada
- G Brian Golding
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Lucian Ilie
- Department of Computer Science, The University of Western Ontario, London, ON N6A 5B7, Canada
44
Roos D, de Boer M. Mutations in cis that affect mRNA synthesis, processing and translation. Biochim Biophys Acta Mol Basis Dis 2021; 1867:166166. [PMID: 33971252 DOI: 10.1016/j.bbadis.2021.166166] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/01/2020] [Revised: 05/03/2021] [Accepted: 05/04/2021] [Indexed: 12/17/2022]
Abstract
Genetic mutations that cause hereditary diseases usually affect the composition of the transcribed mRNA and its encoded protein, leading to instability of the mRNA and/or the protein. Sometimes, however, such mutations affect the synthesis, the processing or the translation of the mRNA, with similarly disastrous effects. We here present an overview of mRNA synthesis, its posttranscriptional modification and its translation into protein. We then indicate which elements in these processes are known to be affected by pathogenic mutations, but we restrict our review to mutations in cis, in the DNA of the gene that encodes the affected protein. These mutations can be in enhancer or promoter regions of the gene, which act as binding sites for transcription factors involved in pre-mRNA synthesis. We also describe mutations in polyadenylation sequences and in splice site regions, exonic and intronic, involved in intron removal. Finally, we include mutations in the Kozak sequence in mRNA, which is involved in protein synthesis. We provide examples of genetic diseases caused by mutations in these DNA regions and refer to databases to help identify these regions. Overall knowledge of mRNA synthesis, processing and translation is essential for improving the diagnosis of patients with genetic diseases.
Affiliation(s)
- Dirk Roos
- Sanquin Blood Supply Organization, Dept. of Blood Cell Research, Landsteiner Laboratory, Amsterdam University Medical Centre, location AMC, University of Amsterdam, Amsterdam, the Netherlands
- Martin de Boer
- Sanquin Blood Supply Organization, Dept. of Blood Cell Research, Landsteiner Laboratory, Amsterdam University Medical Centre, location AMC, University of Amsterdam, Amsterdam, the Netherlands
45
Jiang Y, Liu HF, Liu R. Systematic comparison and prediction of the effects of missense mutations on protein-DNA and protein-RNA interactions. PLoS Comput Biol 2021; 17:e1008951. [PMID: 33872313 PMCID: PMC8084330 DOI: 10.1371/journal.pcbi.1008951] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Revised: 04/29/2021] [Accepted: 04/08/2021] [Indexed: 12/30/2022]
Abstract
The binding affinities of protein-nucleic acid interactions could be altered due to missense mutations occurring in DNA- or RNA-binding proteins, therefore resulting in various diseases. Unfortunately, a systematic comparison and prediction of the effects of mutations on protein-DNA and protein-RNA interactions (these two mutation classes are termed MPDs and MPRs, respectively) is still lacking. Here, we demonstrated that these two classes of mutations could generate similar or different tendencies for binding free energy changes in terms of the properties of mutated residues. We then developed regression algorithms separately for MPDs and MPRs by introducing novel geometric partition-based energy features and interface-based structural features. Through feature selection and ensemble learning, similar computational frameworks that integrated energy- and nonenergy-based models were established to estimate the binding affinity changes resulting from MPDs and MPRs, but the selected features for the final models were different and therefore reflected the specificity of these two mutation classes. Furthermore, the proposed methodology was extended to the identification of mutations that significantly decreased the binding affinities. Extensive validations indicated that our algorithm generally performed better than the state-of-the-art methods on both the regression and classification tasks. The webserver and software are freely available at http://liulab.hzau.edu.cn/PEMPNI and https://github.com/hzau-liulab/PEMPNI. Protein-nucleic acid interactions play important roles in various cellular processes. Missense mutations occurring in DNA- or RNA-binding proteins (termed MPDs and MPRs, respectively) could change the binding affinities of these interactions. 
Previous studies have compared protein-DNA and protein-RNA interactions from multifaceted viewpoints, but less attention has been given to the similarities and specific differences between the effects of MPDs and MPRs and between the methodologies for predicting the affinity changes induced by the two mutation classes. Therefore, we systematically compared their impacts and demonstrated that MPDs and MPRs could have specific preferences for binding affinity changes. These observations motivated us to construct regression models separately for MPDs and MPRs by introducing novel energy and nonenergy descriptors. Although similar frameworks were developed to estimate these two categories of mutation effects, different descriptors were selected in the regression models and further revealed the specificity of mutation classes. The interplay between the energy and nonenergy modules effectively improved prediction performance. Our algorithm can also be adopted to disentangle mutations significantly decreasing binding affinities from other mutations.
Affiliation(s)
- Yao Jiang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Hui-Fang Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Rong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
46
Yang S, Liu X, Ng RT. ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins. Bioinformatics 2021; 36:4797-4804. [PMID: 32573679 PMCID: PMC7750938 DOI: 10.1093/bioinformatics/btaa580] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/12/2019] [Revised: 05/18/2020] [Accepted: 06/18/2020] [Indexed: 12/15/2022]
Abstract
MOTIVATION The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to deciphering the protein-nucleic acid interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model the NBP binding preferences are mostly protein specific: they need the experimental data for a specific protein of interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown. RESULTS Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings of natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologous NBPs which currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBP and one for TF. As a result, ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence. AVAILABILITY AND IMPLEMENTATION The source code is freely available at <https://github.com/syang11/ProbeRating>. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shu Yang
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Xiaoxi Liu
- RIKEN Center for Integrative Medical Sciences (IMS), Yokohama 230-0045, Japan
- Raymond T Ng
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T1Z4, Canada
47
Zhang J, Ghadermarzi S, Kurgan L. Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins. Bioinformatics 2021; 36:4729-4738. [PMID: 32860044 DOI: 10.1093/bioinformatics/btaa573] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/07/2020] [Revised: 05/22/2020] [Accepted: 06/10/2020] [Indexed: 01/08/2023]
Abstract
MOTIVATION There are over 30 sequence-based predictors of protein-binding residues (PBRs). They use either structure-annotated or disorder-annotated training datasets, potentially creating a dichotomy where the structure-/disorder-specific models may not be able to cross over to accurately predict the other type. Moreover, the structure-trained predictors were shown to substantially cross-predict PBRs among residues that interact with non-protein partners (nucleic acids and small ligands). We address these issues by performing a first-of-its-kind comparative study of a representative collection of disorder- and structure-trained predictors using a comprehensive benchmark set with the structure- and disorder-derived annotations of PBRs (to analyze the cross-over) and the protein-, nucleic acid- and small ligand-binding proteins (to study the cross-predictions). RESULTS Three predictors provide accurate results: SCRIBER, ANCHOR and disoRDPbind. Some of the structure-trained methods make accurate predictions on the structure-annotated proteins. Similarly, the disorder-trained predictors predict well on the disorder-annotated proteins. However, the considered predictors generally fail to cross over, with the exception of SCRIBER. Our study also reveals that virtually all methods substantially cross-predict PBRs, except for SCRIBER for the structure-annotated proteins and disoRDPbind for the disorder-annotated proteins. We formulate a novel hybrid predictor, hybridPBRpred, that combines results produced by disoRDPbind and SCRIBER to accurately predict disorder- and structure-annotated PBRs. HybridPBRpred generates accurate results that cross over structure- and disorder-annotated proteins and produces a relatively low number of cross-predictions, offering an accurate alternative to predict PBRs. AVAILABILITY AND IMPLEMENTATION HybridPBRpred webserver, benchmark dataset and supplementary information are available at http://biomine.cs.vcu.edu/servers/hybridPBRpred/.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
48
Zhang J, Chen Q, Liu B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning. Brief Bioinform 2021; 22:6102667. [PMID: 33454744 DOI: 10.1093/bib/bbaa397] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/01/2020] [Revised: 11/05/2020] [Accepted: 12/03/2020] [Indexed: 01/01/2023]
Abstract
The interactions between proteins and nucleic acid sequences play many important roles in gene expression and some cellular activities. Accurate prediction of the nucleic acid binding residues in proteins will facilitate research on protein functions, gene expression, drug design, etc. In this regard, several computational methods have been proposed to predict the nucleic acid binding residues in proteins. However, these methods cannot satisfactorily measure the global interactions among the residues along the protein. Furthermore, these methods suffer from the cross-prediction problem, and new strategies should be explored to solve it. In this study, a new computational method called NCBRPred was proposed to predict the nucleic acid binding residues based on the multilabel sequence labeling model. NCBRPred uses bidirectional Gated Recurrent Units (BiGRUs) to capture the global interactions among the residues, and treats this task as a multilabel learning task. Experimental results on three widely used benchmark datasets and an independent dataset showed that NCBRPred achieved higher predictive performance with lower cross-prediction, outperforming 10 existing state-of-the-art predictors. The web-server and a stand-alone package of NCBRPred are freely available at http://bliulab.net/NCBRPred. It is anticipated that NCBRPred will become a very useful tool for identifying nucleic acid binding residues.
Affiliation(s)
- Jun Zhang
- Computer Science and Technology with Harbin Institute of Technology, Shenzhen, China
- Qingcai Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
49
Bartas M, Červeň J, Guziurová S, Slychko K, Pečinka P. Amino Acid Composition in Various Types of Nucleic Acid-Binding Proteins. Int J Mol Sci 2021; 22:ijms22020922. [PMID: 33477647 PMCID: PMC7831508 DOI: 10.3390/ijms22020922] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/29/2020] [Revised: 01/15/2021] [Accepted: 01/16/2021] [Indexed: 12/20/2022]
Abstract
Nucleic acid-binding proteins are traditionally divided into two categories: those able to bind DNA and those able to bind RNA. In the light of new knowledge, this categorization should be overcome because a large proportion of proteins can bind both DNA and RNA. Another, even more important, feature of nucleic acid-binding proteins is their so-called sequence or structure specificity. Proteins able to bind nucleic acids in a sequence-specific manner usually contain one or more of the well-defined structural motifs (zinc-fingers, leucine zipper, helix-turn-helix, or helix-loop-helix). In contrast, many proteins do not recognize nucleic acid sequence but rather local DNA or RNA structures (G-quadruplexes, i-motifs, triplexes, cruciforms, left-handed DNA/RNA form, and others). Finally, there are also proteins recognizing both sequence and local structural properties of nucleic acids (e.g., the well-known tumor suppressor p53). In this mini-review, we aim to summarize current knowledge about the amino acid composition of various types of nucleic acid-binding proteins with a special focus on significant enrichment and/or depletion in each category.
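The abstract above concerns amino acid composition and the enrichment or depletion of residues in a protein category. The following minimal Python sketch shows one simple way to compute both quantities for plain sequence strings. The function names, the pseudocount, and the toy sequences are illustrative assumptions, not the procedure used in the review.

```python
import math
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> dict:
    """Return the fraction of each standard amino acid in a sequence.

    Non-standard characters (e.g. 'X' or gaps) are ignored.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def log2_enrichment(group: dict, background: dict, pseudo: float = 1e-6) -> dict:
    """log2 ratio of group to background frequency for each residue.

    Positive values mean enrichment in the group, negative values depletion.
    The pseudocount (an assumption here) avoids division by zero.
    """
    return {
        aa: math.log2((group[aa] + pseudo) / (background[aa] + pseudo))
        for aa in AMINO_ACIDS
    }


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    binder = aa_composition("KRKKRSGRKA")
    background = aa_composition("ACDEFGHIKLMNPQRSTVWY")
    enrichment = log2_enrichment(binder, background)
    print({aa: round(v, 2) for aa, v in enrichment.items() if v > 0})
```

In practice, compositions would be averaged over many proteins per category, and any enrichment would need a statistical test before being called significant.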
50
|
Zhao B, Katuwawala A, Oldfield CJ, Dunker AK, Faraggi E, Gsponer J, Kloczkowski A, Malhis N, Mirdita M, Obradovic Z, Söding J, Steinegger M, Zhou Y, Kurgan L. DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res 2021; 49:D298-D308. [PMID: 33119734 PMCID: PMC7778963 DOI: 10.1093/nar/gkaa931] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 08/14/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 12/30/2022]
Abstract
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position-specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs, and interactions with proteins, DNA, and RNA. Users can search DescribePROT by amino acid sequence, UniProt accession number, and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome, and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Eshel Faraggi
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Zoran Obradovic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
- Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA