1
Davila A, Xu Z, Li S, Rozewicki J, Wilamowski J, Kotelnikov S, Kozakov D, Teraguchi S, Standley DM. AbAdapt: an adaptive approach to predicting antibody-antigen complex structures from sequence. Bioinformatics Advances 2022; 2:vbac015. [PMID: 36699363 PMCID: PMC9710585 DOI: 10.1093/bioadv/vbac015] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/09/2022] [Revised: 02/15/2022] [Accepted: 03/03/2022] [Indexed: 01/28/2023]
Abstract
Motivation: The scoring of antibody-antigen docked poses starting from unbound homology models has not been systematically optimized for a large and diverse set of input sequences.
Results: To address this need, we have developed AbAdapt, a webserver that accepts antibody and antigen sequences, models their 3D structures, predicts epitope and paratope, and then docks the modeled structures using two established docking engines (Piper and Hex). Each of the key steps has been optimized by developing and training new machine-learning models. The sequences from a diverse set of 622 antibody-antigen pairs with known structure were used as inputs for leave-one-out cross-validation. The final set of cluster representatives included at least one 'Adequate' pose for 550/622 (88.4%) of the queries. The median (interquartile range) ranks of these 'Adequate' poses were 22 (5-77). Similar results were obtained on a holdout set of 100 unrelated antibody-antigen pairs. When epitopes were repredicted using docking-derived features for specific antibodies, the median ROC AUC increased from 0.679 to 0.720 in cross-validation and from 0.694 to 0.730 in the holdout set.
Availability and implementation: AbAdapt and related data are available at https://sysimm.org/abadapt/.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
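For readers unfamiliar with the two summary statistics quoted in this abstract, the following is a minimal sketch of how a success rate and a ROC AUC can be computed. It is not AbAdapt's code: the function name `roc_auc` and the toy labels and scores are assumptions made for illustration, and the AUC is computed with the generic pairwise (Mann-Whitney) definition rather than any method described by the authors.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue data: 1 = epitope residue, 0 = non-epitope residue.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
auc = roc_auc(labels, scores)

# Success rate from the counts quoted in the abstract (550 of 622 queries).
success_rate = round(100 * 550 / 622, 1)
```

The pairwise form is quadratic in the number of residues, which is acceptable for a single antigen; a rank-based implementation would be preferable for large data sets.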
Affiliation(s)
- Ana Davila: Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan
- Zichang Xu: Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan
- Songling Li: Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan
- John Rozewicki: Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan
- Jan Wilamowski: Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan
- Sergei Kotelnikov: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-5252, USA; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794-5252, USA
- Dima Kozakov: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-5252, USA; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794-5252, USA
- Shunsuke Teraguchi: Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan; Faculty of Data Science, Shiga University, Hikone 522-8522, Japan
- Daron M Standley (to whom correspondence should be addressed): Research Institute for Microbial Diseases, Department of Genome Informatics, Osaka University, Suita 565-0871, Japan; Immunology Frontier Research Center, Department of Systems Immunology, Osaka University, Suita 565-0871, Japan
2
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overtake astronomy as the paradigmatic big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They comprise studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originating from a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai: CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia: CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci: CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera: CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy; CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea: CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno: CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via: CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo: CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
3
Liu HF, Liu R. Structure-based prediction of post-translational modification cross-talk within proteins using complementary residue- and residue pair-based features. Brief Bioinform 2021; 21:609-620. [PMID: 30649184 DOI: 10.1093/bib/bby123] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/15/2018] [Revised: 11/26/2018] [Accepted: 11/30/2018] [Indexed: 02/07/2023]
Abstract
Post-translational modification (PTM)-based regulation can be mediated not only by the modification of a single residue but also by the interplay of different modifications. Accurate prediction of PTM cross-talk is highly challenging and still in its infancy. In particular, little attention has been paid to the structural preferences (except intrinsic disorder and spatial proximity) of cross-talk pairs and to the characteristics of individual residues involved in cross-talk, which may limit improvements in prediction accuracy. Here we report a structure-based algorithm called PCTpred to improve PTM cross-talk prediction. Comprehensive residue- and residue pair-based features were designed for paired PTM sites at the sequence and structural levels. Through feature selection, we retained 23 newly introduced descriptors and 3 traditional descriptors to develop a sequence-based predictor, PCTseq, and a structure-based predictor, PCTstr, which were integrated to construct our final prediction model. According to pair- and protein-based evaluations, PCTpred yielded area under the curve values of approximately 0.9 and 0.8, respectively. Even when the distance preference of samples was removed or modeled structures were used as input, prediction performance was maintained or only moderately reduced. PCTpred displayed stable and reliable improvements over state-of-the-art methods across various evaluations. The source code and data set are freely available at https://github.com/Liulab-HZAU/PCTpred or http://liulab.hzau.edu.cn/PCTpred/.
Affiliation(s)
- Hui-Fang Liu: Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Rong Liu: Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
4
Carrillo-Cabada H, Benson J, Razavi AM, Mulligan B, Cuendet MA, Weinstein H, Taufer M, Estrada T. A Graphic Encoding Method for Quantitative Classification of Protein Structure and Representation of Conformational Changes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1336-1349. [PMID: 31603792 PMCID: PMC9119144 DOI: 10.1109/tcbb.2019.2945291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
To successfully predict a protein's function throughout its trajectory, in addition to uncovering changes in its conformational state, it is necessary to employ techniques that maintain its 3D information while performing at scale. We extend a protein representation that encodes secondary and tertiary structure into fixed-size color images, and a neural network architecture (called GEM-net) that leverages our encoded representation. We show the applicability of our method in two ways: (1) performing protein function prediction, reaching accuracy between 78 and 83 percent, and (2) visualizing and detecting conformational changes in protein trajectories during molecular dynamics simulations.
5
Su H, Peng Z, Yang J. Recognition of small molecule-RNA binding sites using RNA sequence and structure. Bioinformatics 2021; 37:36-42. [PMID: 33416863 PMCID: PMC8034527 DOI: 10.1093/bioinformatics/btaa1092] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/12/2019] [Revised: 12/12/2020] [Accepted: 12/23/2020] [Indexed: 11/22/2022]
Abstract
Motivation: RNA molecules have become attractive small molecule drug targets for treating disease in recent years. Computer-aided drug design can be facilitated by detecting the RNA sites that bind small molecules. However, very limited progress has been reported for the prediction of small molecule–RNA binding sites.
Results: We developed a novel method, RNAsite, to predict small molecule–RNA binding sites using sequence profile- and structure-based descriptors. RNAsite was shown to be competitive with state-of-the-art methods on the experimental structures of two independent test sets. When predicted structure models were used, RNAsite outperformed other methods by a large margin. The possibility of improving RNAsite by geometry-based binding pocket detection was investigated. The influence of RNA structural flexibility and of the conformational changes caused by ligand binding on RNAsite was also discussed. RNAsite is anticipated to be a useful tool for the design of RNA-targeting small molecule drugs.
Availability and implementation: http://yanglab.nankai.edu.cn/RNAsite.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong Su: School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Zhenling Peng: Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Jianyi Yang: School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
6
Investigation of machine learning techniques on proteomics: A comprehensive survey. Progress in Biophysics and Molecular Biology 2019; 149:54-69. [PMID: 31568792 DOI: 10.1016/j.pbiomolbio.2019.09.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/05/2019] [Revised: 09/16/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022]
Abstract
Proteomics is the large-scale study of proteins, which has enabled the identification of ever-increasing numbers of proteins. Proteins are essential parts of living organisms, with numerous functions. The proteome is the complete set of proteins that are produced or modified by an organism or system. The proteome varies with time and with the distinct requirements, or stresses, that a cell or organism experiences. Proteomics is an interdisciplinary area that has drawn on the genetic information of various genome projects. Much proteomics data is gathered with the help of high throughput techniques such as mass spectrometry and microarrays. It would often take weeks or months to analyze the data and perform comparisons by hand. Therefore, scientists are teaming up with computer science researchers and mathematicians to build programs and pipelines that analyze protein data computationally. Using bioinformatics techniques, researchers can perform faster analysis and store protein data. The goal of this paper is to give a brief review of machine learning techniques and their application in the field of proteomics.
7
CRHunter: integrating multifaceted information to predict catalytic residues in enzymes. Sci Rep 2016; 6:34044. [PMID: 27665935 PMCID: PMC5036049 DOI: 10.1038/srep34044] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/06/2016] [Accepted: 09/07/2016] [Indexed: 11/08/2022]
Abstract
A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies or considered whether their combination could improve prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously exploiting the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we built two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homology relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves the best performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.
8
Korb O, Finn PW, Jones G. The cloud and other new computational methods to improve molecular modelling. Expert Opin Drug Discov 2014; 9:1121-31. [DOI: 10.1517/17460441.2014.941800] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022]
9
Li S, Yamashita K, Amada KM, Standley DM. Quantifying sequence and structural features of protein-RNA interactions. Nucleic Acids Res 2014; 42:10086-98. [PMID: 25063293 PMCID: PMC4150784 DOI: 10.1093/nar/gku681] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 12/03/2022]
Abstract
Increasing awareness of the importance of protein–RNA interactions has motivated many approaches to predict residue-level RNA binding sites in proteins based on sequence or structural characteristics. Sequence-based predictors are usually high in sensitivity but low in specificity; conversely structure-based predictors tend to have high specificity, but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. In order to capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.
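The sensitivity/specificity trade-off mentioned in this abstract can be made concrete with a small sketch. The code below is an illustration only and is not taken from aaRNA: the function name `sensitivity_specificity` and the toy labels are assumptions, with 1 standing for an RNA-binding residue and 0 for a non-binding residue.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of true binding residues recovered.
    specificity = TN / (TN + FP): fraction of non-binding residues correctly rejected.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-residue annotations and predictions for one protein.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

A predictor that labels more residues as binding will tend to raise sensitivity at the cost of specificity, which is the pattern the abstract attributes to sequence-based methods.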
Affiliation(s)
- Songling Li: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
- Kazuo Yamashita: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
- Karlou Mar Amada: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
- Daron M Standley: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan