1
Høie MH, Gade FS, Johansen J, Würtzen C, Winther O, Nielsen M, Marcatili P. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol 2024; 15:1322712. [PMID: 38390326 PMCID: PMC10882062 DOI: 10.3389/fimmu.2024.1322712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 01/08/2024] [Indexed: 02/24/2024] Open
Abstract
Accurate computational identification of B-cell epitopes is crucial for the development of vaccines, therapies, and diagnostic tools. However, current structure-based prediction methods face limitations due to the dependency on experimentally solved structures. Here, we introduce DiscoTope-3.0, a markedly improved B-cell epitope prediction tool that innovatively employs inverse folding structure representations and a positive-unlabelled learning strategy, and is adapted for both solved and predicted structures. Our tool demonstrates a considerable improvement in performance over existing methods, accurately predicting linear and conformational epitopes across multiple independent datasets. Most notably, DiscoTope-3.0 maintains high predictive performance across solved, relaxed and predicted structures, alleviating the need for experimental structures and extending the general applicability of accurate B-cell epitope prediction by 3 orders of magnitude. DiscoTope-3.0 is made widely accessible on two web servers, processing over 100 structures per submission, and as a downloadable package. In addition, the servers interface with RCSB and AlphaFoldDB, facilitating large-scale prediction across over 200 million cataloged proteins. DiscoTope-3.0 is available at: https://services.healthtech.dtu.dk/service.php?DiscoTope-3.0.
Affiliation(s)
- Magnus Haraldson Høie
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
- Frederik Steensgaard Gade
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
- Julie Maria Johansen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
- Charlotte Würtzen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
- Ole Winther
  - Section for Cognitive Systems, DTU Compute, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
  - Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen, Denmark
  - Department of Biology, Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Morten Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
- Paolo Marcatili
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark

2
Bravi B. Development and use of machine learning algorithms in vaccine target selection. NPJ Vaccines 2024; 9:15. [PMID: 38242890 PMCID: PMC10798987 DOI: 10.1038/s41541-023-00795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML predictions and their translational application to vaccine design.
Affiliation(s)
- Barbara Bravi
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK

3
Wang Y, Tang H, Gao C, Ge M, Li Z, Dong Z, Zhao L. Flexibility-aware graph model for accurate epitope identification. Comput Biol Med 2022; 149:106064. [DOI: 10.1016/j.compbiomed.2022.106064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 08/05/2022] [Accepted: 08/27/2022] [Indexed: 11/25/2022]

4
Lu S, Li Y, Ma Q, Nan X, Zhang S. A Structure-Based B-cell Epitope Prediction Model Through Combing Local and Global Features. Front Immunol 2022; 13:890943. [PMID: 35844532 PMCID: PMC9283778 DOI: 10.3389/fimmu.2022.890943] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/07/2022] [Accepted: 05/23/2022] [Indexed: 11/24/2022] Open
Abstract
B-cell epitopes (BCEs) are a set of specific sites on the surface of an antigen that bind to an antibody produced by a B cell. The recognition of BCEs is a major challenge for drug design and vaccine development. Compared with experimental methods, computational approaches have strong potential for BCE prediction at much lower cost. However, most current methods focus on local information around the target residue without taking the global information of the whole antigen sequence into consideration. We propose a novel deep learning method that combines local and global features for BCE prediction. In our model, two parallel modules are built to extract local and global features from the antigen separately. For local features, we use Graph Convolutional Networks (GCNs) to capture information about the spatial neighbors of a target residue. For global features, Attention-Based Bidirectional Long Short-Term Memory (Att-BLSTM) networks are applied to extract information from the whole antigen sequence. The local and global features are then combined to predict BCEs. The experiments show that the proposed method achieves superior performance over state-of-the-art BCE prediction methods on benchmark datasets. We also compare performance with and without global features. The experimental results show that global features play an important role in BCE prediction. Our detailed case study on BCE prediction for the SARS-CoV-2 receptor binding domain confirms that our method is effective for predicting and clustering true BCEs.
Affiliation(s)
- Shuai Lu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- Yuguang Li
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- Qiang Ma
  - School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Xiaofei Nan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - *Correspondence: Xiaofei Nan; Shoutao Zhang
- Shoutao Zhang
  - School of Life Sciences, Zhengzhou University, Zhengzhou, China
  - Longhu Laboratory of Advanced Immunology, Zhengzhou, China
  - *Correspondence: Xiaofei Nan; Shoutao Zhang

5
Brassea-Estardante HA, Martínez-Cruz O, Cárdenas-López JL, García-Orozco KD, Ochoa-Leyva A, López-Zavala AA. Identification of arginine kinase as an allergen of brown crab, Callinectes bellicosus, and in silico analysis of IgE-binding epitopes. Mol Immunol 2022; 143:147-156. [DOI: 10.1016/j.molimm.2022.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 01/25/2022] [Accepted: 01/27/2022] [Indexed: 10/19/2022]

6
Qiao X, Qu L, Guo Y, Hoshino T. Secondary Structure and Conformational Stability of the Antigen Residues Making Contact with Antibodies. J Phys Chem B 2021; 125:11374-11385. [PMID: 34615354 DOI: 10.1021/acs.jpcb.1c05997] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]
Abstract
Antibodies are crucial biomolecules that bring high therapeutic efficacy in medicine and accurate molecular detection in diagnosis. Because understanding the antibody recognition mechanism is important, many studies have been devoted to analyzing the antigen-antibody interaction. However, most previous studies examined the characteristics of the antibody in the interaction. It is also informative to clarify which antigen residues contribute significantly to the binding. To characterize the molecular interaction of antigens, we computationally analyzed 350 antigen-antibody complex structures by molecular mechanics (MM) calculations and molecular dynamics (MD) simulations. Based on the MM calculations, the antigen residues contributing to the binding were extracted from all 350 complexes. The extracted residues are located at the antigen-antibody interface and are responsible for making contact with the antibody. The charged polar residues, Asp, Glu, Arg, and Lys, appeared noticeably often. In contrast, the populations of the hydrophobic residues, Leu, Val, and Ala, were relatively low. The appearance frequencies of the other amino acid residues were close to their abundance in general eukaryotic proteins. The binding score indicated that hydrophilic rather than hydrophobic interaction was dominant at the antigen-antibody contact. The positively charged residues, Arg and Lys, contributed remarkably to the binding compared to the negatively charged ones, Asp and Glu. Considerable contributions were also observed for the noncharged polar residues, Asn and Gln. The analysis of the secondary structures of the extracted antigen residues suggested that there was no marked difference in recognition by antibodies among helix, sheet, turn, and coil. A long helix of the antigen sometimes made contact with antibody complementarity-determining regions, and a large sheet also frequently covered the antibody heavy and light chains. The turn structure was the one most frequently observed at the contact with the antibody among the 350 complexes. Three typical complexes were selected for each of the four secondary structures. MD simulations were performed to examine the stability of the interfacial structures of the antigens for these 12 complex models. The alterations of secondary structures were monitored through the simulations. The structural fluctuations of the contact residues were low compared with the other domains of the antigen molecules. No drastic conversion was observed in any model during the 100 ns simulation. The motions of the interfacial antigen residues were small compared to the other residues on the protein surface. Therefore, diverse molecular conformations are possible for antibody recognition as long as the target areas are polar, nonflexible, and protruding on the protein surface.
Affiliation(s)
- Xinyue Qiao
  - Graduate School of Pharmaceutical Sciences, Chiba University, Inohana 1-8-1, Chuo-ku, Chiba 260-8675, Japan
- Liang Qu
  - Graduate School of Pharmaceutical Sciences, Chiba University, Inohana 1-8-1, Chuo-ku, Chiba 260-8675, Japan
- Yan Guo
  - Graduate School of Pharmaceutical Sciences, Chiba University, Inohana 1-8-1, Chuo-ku, Chiba 260-8675, Japan
- Tyuji Hoshino
  - Graduate School of Pharmaceutical Sciences, Chiba University, Inohana 1-8-1, Chuo-ku, Chiba 260-8675, Japan

7
Computational-Driven Epitope Verification and Affinity Maturation of TLR4-Targeting Antibodies. Int J Mol Sci 2021; 22:5989. [PMID: 34206009 PMCID: PMC8198660 DOI: 10.3390/ijms22115989] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/26/2021] [Accepted: 05/29/2021] [Indexed: 01/16/2023] Open
Abstract
Toll-like receptor (TLR) signaling plays a critical role in the induction and progression of autoimmune diseases such as rheumatoid arthritis, systemic lupus erythematosus, experimental autoimmune encephalitis, type 1 diabetes mellitus and neurodegenerative diseases. Deciphering antigen recognition by antibodies provides insights into the progression of immune responses and defines the mechanism of action. Multiple strategies, including phage display and hybridoma technologies, have been used to enhance the affinity of antibodies for their respective epitopes. Here, we investigate the TLR4 antibody-binding epitope using a computationally driven approach. We demonstrate that three important residues of the TLR4 antibody-binding epitope, i.e., Y328, N329, and K349, identified by in silico mutagenesis, affect not only the interaction and binding affinity of the antibody but also the structural integrity of TLR4. Furthermore, we predict a novel epitope at the TLR4-MD2 interface which can be targeted and explored for therapeutic antibodies and small molecules. This technique provides an in-depth insight into antibody-antigen interactions at the resolution and will be beneficial for the development of new monoclonal antibodies. Computational techniques, if coupled with experimental methods, will shorten the duration of rational design and development of antibody therapeutics.

8
Solihah B, Azhari A, Musdholifah A. Enhancement of conformational B-cell epitope prediction using CluSMOTE. PeerJ Comput Sci 2020; 6:e275. [PMID: 33816926 PMCID: PMC7924438 DOI: 10.7717/peerj-cs.275] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/12/2019] [Accepted: 04/15/2020] [Indexed: 06/12/2023]
Abstract
BACKGROUND: A conformational B-cell epitope is one of the main components of vaccine design. It contains separate segments in its sequence, which are spatially close in the antigen chain. The availability of Ag-Ab complex data in the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have been developed, including learning-based methods. However, the performance of these models is still not optimal. The main problem in learning-based prediction models is class imbalance. METHODS: This study proposes CluSMOTE, a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique. The approach is used to generate additional sample data to ensure that the conformational epitope dataset is balanced. The Hierarchical DBSCAN algorithm is used to identify clusters in the majority class. Randomly selected data are taken from each cluster, considering the oversampling degree, and combined with the minority class data. The balanced data are used as the training dataset to develop a conformational epitope prediction model. Furthermore, two binary classification methods, Support Vector Machine and Decision Tree, are separately used to develop the prediction model and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment focuses on determining the best parameters for CluSMOTE. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and second datasets represent general protein antigens and glycoprotein antigens, respectively. RESULTS: The experimental results show that the CluSMOTE Decision Tree outperformed the Support Vector Machine in terms of AUC and Gmean as performance measurements. The mean AUCs of the CluSMOTE Decision Tree on the Kringelum and SEPPA 3 test sets are 0.83 and 0.766, respectively. This shows that the CluSMOTE Decision Tree is better than other methods on general protein antigens, though comparable with SEPPA 3 on glycoprotein antigens.
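The abstract above reports Gmean alongside AUC because of class imbalance. As a minimal sketch, assuming the usual definition of G-mean as the geometric mean of sensitivity and specificity (the abstract does not spell out its formula), the following shows why plain accuracy can mislead when epitope residues are rare. The function names and all counts are hypothetical and are not taken from the paper.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / positives   # fraction of true epitope residues found
    specificity = tn / negatives   # fraction of non-epitope residues rejected
    return math.sqrt(sensitivity * specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy, shown only for contrast with G-mean."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced test set: 10 epitope residues, 100 non-epitope residues.
always_negative = dict(tp=0, fn=10, tn=100, fp=0)   # never predicts an epitope
balanced_model = dict(tp=8, fn=2, tn=90, fp=10)     # finds most epitopes

for name, counts in [("always_negative", always_negative),
                     ("balanced_model", balanced_model)]:
    print(name, round(accuracy(**counts), 3), round(g_mean(**counts), 3))
```

A classifier that never predicts an epitope still scores above 0.9 accuracy on this toy set but has a G-mean of zero, which is the kind of failure that resampling schemes such as the one described above aim to avoid.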
Affiliation(s)
- Binti Solihah
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
  - Department of Informatics Engineering, Universitas Trisakti, Grogol, Jakarta Barat, Indonesia
- Azhari Azhari
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Aina Musdholifah
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia

9
Ferdous S, Kelm S, Baker TS, Shi J, Martin AC. B-cell epitopes: Discontinuity and conformational analysis. Mol Immunol 2019; 114:643-650. [DOI: 10.1016/j.molimm.2019.09.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/07/2018] [Revised: 02/07/2019] [Accepted: 09/13/2019] [Indexed: 11/26/2022]

10
Zhao L, Wu S, Jiang J, Li W, Luo J, Li J. Novel overlapping subgraph clustering for the detection of antigen epitopes. Bioinformatics 2018; 34:2061-2068. [PMID: 29409062 DOI: 10.1093/bioinformatics/bty051] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/06/2017] [Accepted: 02/01/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation: Antigens that contain overlapping epitopes have been occasionally reported. As current algorithms mainly take a one-antigen-one-epitope approach to the prediction of epitopes, they are not capable of detecting these multiple and overlapping epitopes accurately, or even those multiple and separated epitopes existing in some other antigens. Results: We introduce a novel subgraph clustering algorithm for more accurate detection of epitopes. This algorithm takes graph partitions as seeds, and expands the seeds to merge overlapping subgraphs based on the term frequency-inverse document frequency (TF-IDF) featured similarity. Then, the merged subgraphs are each classified as an epitope or non-epitope. Tests of our algorithm were conducted on three newly collected datasets of antigens. In the first dataset, each antigen contains only a single epitope; in the second, each antigen contains only multiple and separated epitopes; and in the third, each antigen contains overlapping epitopes. The prediction performance of our algorithm is significantly better than the state-of-the-art methods. The lifts of the averaged f-scores on top of the best existing methods are 60, 75 and 22% for the single epitope detection, the multiple and separated epitopes detection, and the overlapping epitopes detection, respectively. Availability and implementation: The source code is available at github.com/lzhlab/glep/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Liang Zhao
  - Department of Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Hubei, China
  - Department of Computer Science, School of Computing and Electronic Information, Guangxi University, Nanning, China
- Shaogui Wu
  - Department of Computer Science, School of Computing and Electronic Information, Guangxi University, Nanning, China
- Jiawen Jiang
  - Department of Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Hubei, China
- Wencui Li
  - Department of Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Hubei, China
- Jie Luo
  - Department of Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Hubei, China
- Jinyan Li
  - Department of Data Science, Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Broadway, NSW 2007, Australia

11
Kozlova E, Viart B, de Avila R, Felicori L, Chavez-Olortegui C. Classification epitopes in groups based on their protein family. BMC Bioinformatics 2015; 16 Suppl 19:S7. [PMID: 26696329 PMCID: PMC4686779 DOI: 10.1186/1471-2105-16-s19-s7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/03/2023] Open
Abstract
Background: The humoral immune response is based on the interaction between antibodies and antigens for the clearance of pathogens and foreign molecules. The interaction between these proteins occurs at specific positions known as antigenic determinants or B-cell epitopes. The experimental identification of epitopes is costly and time consuming. Therefore, the use of in silico methods to help discover new epitopes is an appealing alternative, given the importance of biomedical applications such as vaccine design, disease diagnostics, anti-venoms and immunotherapeutics. However, prediction performance is not optimal, with accuracy around 70%. Further research could increase our understanding of the biochemical and structural properties that characterize a B-cell epitope. Results: We investigated whether linear epitopes from the same protein family share common properties. This hypothesis led us to analyze physico-chemical (PCP) and predicted secondary structure (PSS) features of a curated dataset of epitope sequences available in the literature belonging to two different groups of antigens (metalloproteinases and neurotoxins). Using data mining techniques, we discovered statistically significant parameters that allow us to distinguish neurotoxins from metalloproteinases and both from random sequences. After five-fold cross-validation, we found that PCP-based models obtained area under the curve (AUC) values and accuracy above 0.9 for regression, decision tree and support vector machine. Conclusions: We demonstrated that an antigen's family can be inferred from properties within a single group of linear epitopes (metalloproteinases or neurotoxins). We also discovered the characteristics that represent these two epitope groups, including their similarities to and differences from random peptides and their respective amino acid sequences. These findings open new perspectives for improving epitope prediction by considering the specific antigen's protein family. We expect that these findings will help to improve current computational mapping methods based on physico-chemical properties, given their potential application in epitope discovery.

12
LRC: A new algorithm for prediction of conformational B-cell epitopes using statistical approach and clustering method. J Immunol Methods 2015; 427:51-7. [PMID: 26455801 DOI: 10.1016/j.jim.2015.09.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/24/2015] [Revised: 09/03/2015] [Accepted: 09/25/2015] [Indexed: 11/22/2022]
Abstract
Identifying B-cell epitopes in an antigen is a challenging task in bioinformatics, with applications in vaccine design and drug development. Recently, several methods have been presented to predict epitopes. These methods use physicochemical or structural properties. In this paper, we propose a more appropriate epitope prediction method, LRC, that is based on a combination of physicochemical and structural properties. First, we construct a graph from the surface of the antigen; then, using logistic regression, we model the physicochemical and structural properties and weight each vertex of the graph. Finally, we use a clustering method, MCL, to cluster the graph. The effectiveness of the proposed method is benchmarked using several antibody-antigen PDB complexes. The results of the LRC algorithm are compared with other methods (DiscoTope, SEPPA and ElliPro) in terms of sensitivity, specificity and other well-known measures. Results indicate that applying the LRC algorithm improves the precision of epitope prediction in comparison with the mentioned methods. Our LRC program and supplementary material are freely available from http://bs.ipm.ir/softwares/LRC/.

13
Toward a Literature-Driven Definition of Big Data in Healthcare. Biomed Res Int 2015; 2015:639021. [PMID: 26137488 PMCID: PMC4468280 DOI: 10.1155/2015/639021] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 11/13/2014] [Accepted: 02/04/2015] [Indexed: 11/17/2022]
Abstract
Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals (n) and the number of variables (p) for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with Log(n∗p) ≥ 7. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Inversely, data can be reused without being necessarily big, for example, secondary use of Electronic Medical Records (EMR) data.
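The volume criterion in the abstract above, Log(n∗p) ≥ 7, can be checked in a few lines. This is a sketch that assumes the logarithm is base 10 (the abstract does not state the base); the function names and the example dataset sizes are hypothetical.

```python
import math


def log_np(n: int, p: int) -> float:
    """Return log10(n * p) for n statistical individuals and p variables.

    Base 10 is an assumption; the abstract only writes "Log".
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    # Summing the logs avoids forming a very large product.
    return math.log10(n) + math.log10(p)


def is_big_data(n: int, p: int, threshold: float = 7.0) -> bool:
    """Apply the volume criterion Log(n * p) >= threshold."""
    return log_np(n, p) >= threshold


# Hypothetical examples: a registry with 1,000,000 patients and 100 variables,
# and a small cohort with 1,000 patients and 50 variables.
print(is_big_data(1_000_000, 100))
print(is_big_data(1_000, 50))
```

Under this reading, the threshold corresponds to roughly ten million data cells (individuals times variables).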

14
Shen K, Shen L, Wang J, Jiang Z, Shen B. Understanding Amino Acid Mutations in Hepatitis B Virus Proteins for Rational Design of Vaccines and Drugs. Adv Protein Chem Struct Biol 2015; 99:131-53. [DOI: 10.1016/bs.apcsb.2015.03.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/22/2023]

15
Ren J, Liu Q, Ellis J, Li J. Tertiary structure-based prediction of conformational B-cell epitopes through B factors. Bioinformatics 2014; 30:i264-73. [PMID: 24931993 PMCID: PMC4058920 DOI: 10.1093/bioinformatics/btu281] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/15/2022]
Abstract
Motivation: A B-cell epitope is a small area on the surface of an antigen that binds to an antibody. Accurately locating epitopes is of critical importance for vaccine development. Compared with wet-lab methods, computational methods have strong potential for efficient and large-scale epitope prediction for antigen candidates at much lower cost. However, it is still not clear which features are good determinants for accurate epitope prediction, leading to the unsatisfactory performance of existing prediction methods. Method and results: We propose a much more accurate B-cell epitope prediction method. Our method uses a new feature, the B factor (obtained from X-ray crystallography), combined with other basic physicochemical, statistical, evolutionary and structural features of each residue. These basic features are extended by a sequence window and a structure window. All these features are then learned by a two-stage random forest model to identify clusters of antigenic residues and to remove isolated outliers. Tested on a dataset of 55 epitopes from 45 tertiary structures, we prove that our method significantly outperforms all three existing structure-based epitope predictors. Following comprehensive analysis, it is found that features such as B factor, relative accessible surface area and protrusion index play an important role in characterizing B-cell epitopes. Our detailed case studies on an HIV antigen and an influenza antigen confirm that our second stage learning is effective for clustering true antigenic residues and for eliminating self-made prediction errors introduced by the first-stage learning. Availability and implementation: Source codes are available on request. Contact: jinyan.li@uts.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing Ren
  - Advanced Analytics Institute and Centre for Health Technologies and Department of Molecular Science, University of Technology Sydney, Broadway, NSW 2007, Australia
- Qian Liu
  - Advanced Analytics Institute and Centre for Health Technologies and Department of Molecular Science, University of Technology Sydney, Broadway, NSW 2007, Australia
- John Ellis
  - Advanced Analytics Institute and Centre for Health Technologies and Department of Molecular Science, University of Technology Sydney, Broadway, NSW 2007, Australia
- Jinyan Li
  - Advanced Analytics Institute and Centre for Health Technologies and Department of Molecular Science, University of Technology Sydney, Broadway, NSW 2007, Australia

16
Zhang J, Zhao X, Sun P, Gao B, Ma Z. Conformational B-cell epitopes prediction from sequences using cost-sensitive ensemble classifiers and spatial clustering. Biomed Res Int 2014; 2014:689219. [PMID: 25045691 PMCID: PMC4083607 DOI: 10.1155/2014/689219] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 01/03/2014] [Revised: 05/02/2014] [Accepted: 05/10/2014] [Indexed: 12/20/2022]
Abstract
B-cell epitopes are regions of the antigen surface which can be recognized by certain antibodies and elicit the immune response. Identification of epitopes for a given antigen chain finds vital applications in vaccine and drug research. Experimental identification of B-cell epitopes is time-consuming and resource intensive, so it may benefit from computational approaches. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting the antigenic determinant residues, and then a spatial clustering algorithm is adopted to identify the potential epitopes. Firstly, we explore various discriminative features from primary sequences. Secondly, a cost-sensitive ensemble scheme is introduced to deal with the imbalanced learning problem. Thirdly, we adopt a spatial clustering algorithm to determine which residues may potentially form the epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitopes prediction), is proposed in this study. CBEP achieves good prediction performance, with mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using leave-one-out cross-validation (LOOCV). When compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP which implements the proposed method is available for academic use.
Affiliation(s)
- Jian Zhang
  - School of Computer Science and Information Technology, Northeast Normal University, Changchun 1300117, China
- Xiaowei Zhao
  - School of Computer Science and Information Technology, Northeast Normal University, Changchun 1300117, China
- Pingping Sun
  - School of Computer Science and Information Technology, Northeast Normal University, Changchun 1300117, China
  - The Engineering Laboratory for Drug-Gene and Protein Screening, Northeast Normal University, Changchun 1300117, China
- Bo Gao
  - School of Computer Science and Information Technology, Northeast Normal University, Changchun 1300117, China
- Zhiqiang Ma
  - School of Computer Science and Information Technology, Northeast Normal University, Changchun 1300117, China
|
17
|
Hoi SCH, Li Z, Wong L, Nguyen H, Li J. Coupling Graphs, Efficient Algorithms and B-Cell Epitope Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:7-16. [PMID: 26355502 DOI: 10.1109/tcbb.2013.136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Coupling graphs are newly introduced in this paper to meet many application needs, particularly in the field of bioinformatics. A coupling graph is a two-layer graph complex in which each node from one layer has at least one connection with the nodes in the other layer, and vice versa. The coupling graph model is sufficiently powerful to capture strong and inherent associations between subgraph pairs in complicated applications. This paper focuses on algorithms for mining frequent coupling subgraphs and on their application in bioinformatics. Although existing frequent subgraph mining algorithms are competent to identify frequent subgraphs from a graph database, they perform poorly on frequent coupling subgraph mining because they generate many irrelevant subgraphs. We propose a novel graph transformation technique to transform a coupling graph into a generic graph. Based on the transformed coupling graphs, existing graph mining methods are then utilized to discover frequent coupling subgraphs. We prove that the transformation is precise and complete and that the restoration is reversible. Experiments carried out on a database containing 10,511 coupling graphs show that our proposed algorithm substantially reduces the mining time in comparison with the existing subgraph mining algorithms. Moreover, we demonstrate the usefulness of frequent coupling subgraphs by applying our algorithm to make accurate predictions of epitopes in antibody-antigen binding.
|
18
|
Schönbach C, Tongsima S, Chan J, Brusic V, Tan TW, Ranagathan S. InCoB2012 Conference: from biological data to knowledge to technological breakthroughs. BMC Bioinformatics 2012; 13 Suppl 17:S1. [PMID: 23281929 PMCID: PMC3521245 DOI: 10.1186/1471-2105-13-s17-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022] Open
Abstract
Ten years ago, when the Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok, its theme was North-South Networking. At that time, InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile, InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006, InCoB has yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China.
Affiliation(s)
- Christian Schönbach
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka 820-8502, Japan
  - Biomedical Informatics Research and Development Center, Kyushu Institute of Technology, Fukuoka 820-8502, Japan
- Sissades Tongsima
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Thailand Science Park, Pathumthani 12120, Thailand
- Jonathan Chan
  - School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok 10140, Thailand
- Vladimir Brusic
  - Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA
- Tin Wee Tan
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Republic of Singapore
  - Computational Resource Centre (A*CRC), A*STAR, Singapore 138632, Republic of Singapore
- Shoba Ranagathan
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Republic of Singapore
  - Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence, Macquarie University, Sydney, NSW 2109, Australia
|