1
Oladipo EK, Ojo TO, Olufemi SE, Irewolede BA, Adediran DA, Abiala AG, Hezekiah OS, Idowu AF, Oladeji YG, Ikuomola MO, Olayinka AT, Akanbi GO, Idowu UA, Olubodun OA, Odunlami FD, Ogunniran JA, Akinro OP, Adegoke HM, Folakanmi EO, Usman TA, Oladokun EF, Oluwasanya GJ, Awobiyi HO, Oluwasegun JA, Akintibubo SA, Jimah EM. Proteome based analysis of circulating SARS-CoV-2 variants: approach to a universal vaccine candidate. Genes Genomics 2023; 45:1489-1508. [PMID: 37548884 DOI: 10.1007/s13258-023-01426-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/16/2022] [Accepted: 07/09/2023] [Indexed: 08/08/2023]
Abstract
The discovery of the first infectious variant in Wuhan, China, in December 2019 has raised global health concerns because of the spread of COVID-19 and subsequent variants. While the majority of patients experience flu-like symptoms such as cold and fever, a small percentage, particularly those with compromised immune systems, progress from mild illness to fatality. COVID-19 is caused by an RNA virus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Our approach used immunoinformatics to identify vaccine candidates with multiple epitopes and ligand-binding regions in reported SARS-CoV-2 variants. Through analysis of the spike glycoprotein, we identified dominant epitopes for T-cells and B-cells, resulting in a vaccine construct containing two helper T-cell epitopes, six cytotoxic T-cell epitopes, and four linear B-cell epitopes. Prior to conjugation with adjuvants and linkers, all epitopes were evaluated for antigenicity, toxicity, and allergenicity. Additionally, we assessed the complexes of the vaccine with Toll-like receptors 2, 3, and 4. The vaccine construct demonstrated antigenicity, non-toxicity, and non-allergenicity, thereby enabling the host to generate antibodies with favorable physicochemical characteristics. Furthermore, the 3D structure of the B-cell construct had a ProSA-web z-score of -1.71, indicating the reliability of the designed structure. Ramachandran plot analysis revealed that 99.6% of the amino acid residues in the vaccine subunit were located in the highly favored region, further establishing its strong candidacy as a vaccination option.
Affiliation(s)
- Elijah Kolawole Oladipo
  - Department of Microbiology, Laboratory of Molecular Biology, Immunology and Informatics, Adeleke University, Ede, Osun State, Nigeria
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
- Taiwo Ooreoluwa Ojo
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Biochemistry, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Seun Elijah Olufemi
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Biochemistry, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Daniel Adewole Adediran
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Biochemistry, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Asegunloluwa Grace Abiala
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Oluwaseun Samuel Hezekiah
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Akindele Felix Idowu
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Biochemistry, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Yinmi Gabriel Oladeji
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Microbiology, Obafemi Awolowo University, Ile Ife, Osun State, Nigeria
- Mary Omotoyinbo Ikuomola
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Adenike Titilayo Olayinka
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Medical Microbiology and Parasitology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Gideon Oluwamayowa Akanbi
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Pure and Applied Biology, Microbiology Unit, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Usman Abiodun Idowu
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Pure and Applied Biology, Microbiology Unit, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Odunola Abimbola Olubodun
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Folusho Daniel Odunlami
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- James Akinwumi Ogunniran
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Medical Microbiology and Parasitology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Omodamola Paulina Akinro
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Pure and Applied Biology, Microbiology Unit, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Hadijat Motunrayo Adegoke
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Computational Biophysical Chemistry Laboratory, Department of Pure and Applied Chemistry, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Elizabeth Oluwatoyin Folakanmi
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Elizabeth Folakemi Oladokun
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Pure and Applied Biology, Microbiology Unit, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Jerry Ayobami Oluwasegun
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Samuel Adebowale Akintibubo
  - Genomics Unit, Helix Biogen Institute, Ogbomoso, Oyo State, Nigeria
  - Department of Pure and Applied Biology, Microbiology Unit, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
2
Zhao X, Jin J, Xu R, Li S, Sun H, Wang X, Cichocki A. A Regional Smoothing Block Sparse Bayesian Learning Method With Temporal Correlation for Channel Selection in P300 Speller. Front Hum Neurosci 2022; 16:875851. [PMID: 35754766 PMCID: PMC9231363 DOI: 10.3389/fnhum.2022.875851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Accepted: 05/18/2022] [Indexed: 11/13/2022]
Abstract
P300-based brain-computer interfaces (BCIs) enable participants to communicate by decoding the electroencephalography (EEG) signal. Different regions of the brain correspond to various mental activities. Therefore, removing weakly task-relevant and noisy channels through channel selection is necessary when decoding a specific type of activity from EEG. Channel selection can improve the recognition accuracy and reduce the training time of the subsequent models. This study proposes a novel block sparse Bayesian-based channel selection method for the P300 speller. In this method, we introduce block sparse Bayesian learning (BSBL) into the channel selection of P300 BCI for the first time and propose a regional smoothing BSBL (RSBSBL) by combining the spatial distribution properties of EEG. The RSBSBL can determine the number of channels adaptively. To ensure practicality, we design an automatic selection iteration strategy model to reduce the time cost caused by the inverse operation of the large-size matrix. We verified the proposed method on two public P300 datasets and on our collected datasets. The experimental results show that the proposed method can remove the inferior channels and work with the classifier to obtain high classification accuracy. Hence, RSBSBL has tremendous potential for channel selection in P300 tasks.
Affiliation(s)
- Xueqing Zhao
  - The Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Jing Jin
  - The Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
  - Shenzhen Research Institute of East China University of Technology, Shenzhen, China
- Ren Xu
  - g.tec medical engineering GmbH, Graz, Austria
- Shurui Li
  - The Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Hao Sun
  - The Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Xingyu Wang
  - The Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Andrzej Cichocki
  - Skolkovo Institute of Science and Technology, Moscow, Russia
  - Systems Research Institute of Polish Academy of Science, Warsaw, Poland
  - Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
3
Lv Z, Wang P, Zou Q, Jiang Q. Identification of Sub-Golgi protein localization by use of deep representation learning features. Bioinformatics 2020; 36:5600-5609. [PMID: 33367627 PMCID: PMC8023683 DOI: 10.1093/bioinformatics/btaa1074] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/31/2020] [Revised: 12/10/2020] [Accepted: 12/14/2020] [Indexed: 12/11/2022]
Abstract
Motivation: The Golgi apparatus has a key functional role in protein biosynthesis within the eukaryotic cell, with malfunction resulting in various neurodegenerative diseases. For a better understanding of the Golgi apparatus, it is essential to identify sub-Golgi protein localization. Although some machine learning methods have been used to identify sub-Golgi localization proteins by sequence representation fusion, more accurate sub-Golgi protein identification remains challenging with existing methodology. Results: We developed a protein sub-Golgi localization identification protocol using deep representation learning features with 107 dimensions. With this protocol, we demonstrated that, instead of the multi-type protein sequence feature representation fusion used in previous state-of-the-art sub-Golgi protein localization classifiers, it is sufficient to exploit only one type of feature representation for more accurate identification of sub-Golgi proteins. Compared with independent testing results for benchmark datasets, our protocol is able to perform generally, reliably and robustly for sub-Golgi protein localization prediction. Availability and implementation: A user-friendly webserver is freely accessible at http://isGP-DRLF.aibiochem.net and the prediction code is accessible at https://github.com/zhibinlv/isGP-DRLF. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhibin Lv
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Pingping Wang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Qinghua Jiang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
4
Oladipo EK, Ajayi AF, Ariyo OE, Onile SO, Jimah EM, Ezediuno LO, Adebayo OI, Adebayo ET, Odeyemi AN, Oyeleke MO, Oyewole MP, Oguntomi AS, Akindiya OE, Olamoyegun BO, Aremu VO, Arowosaye AO, Aboderin DO, Bello HB, Senbadejo TY, Awoyelu EH, Oladipo AA, Oladipo BB, Ajayi LO, Majolagbe ON, Oyawoye OM, Oloke JK. Exploration of surface glycoprotein to design multi-epitope vaccine for the prevention of Covid-19. INFORMATICS IN MEDICINE UNLOCKED 2020; 21:100438. [PMID: 33043110 PMCID: PMC7533051 DOI: 10.1016/j.imu.2020.100438] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/30/2020] [Revised: 09/11/2020] [Accepted: 09/27/2020] [Indexed: 01/07/2023]
Abstract
Stimulation and generation of a T and B cell-mediated long-term immune response are essential for curbing a deadly virus such as SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). The immunoinformatics approach to vaccine design takes advantage of antigenic and non-allergenic epitopes present on the spike glycoprotein of SARS-CoV-2 to elicit immune responses. T-cell and B-cell epitopes were predicted, and the selected residues were screened for allergenicity, antigenicity and toxicity and then joined by appropriate linkers to form a multi-epitope subunit vaccine. The physicochemical properties of the vaccine construct were analyzed, and the molecular weight, molecular formula, theoretical isoelectric point value, half-life, solubility score, instability index, aliphatic index and GRAVY were predicted. The vaccine structure was constructed, refined, validated, and disulfide engineered to get the best model. Molecular binding simulation and molecular dynamics simulation were carried out to predict the stability and binding affinity of the vaccine construct with TLRs. Codon acclimatization and in silico cloning were performed to confirm the vaccine expression and potency. The results indicated that this novel vaccine candidate is non-toxic, capable of initiating the immunogenic response, and will not induce an allergic reaction. The highest binding energy was observed with TLR4 (Toll-like receptor 4) (-1398.1), and the lowest with TLR2 (-1479.6). A steady rise in the Th (T-helper) cell population with memory development was noticed, and IFN-g (interferon gamma) was provoked after simulation. At this point, the vaccine candidate awaits animal trials to validate its efficacy and safety for use in the prevention of novel COVID-19 (coronavirus disease 2019) infections.
Affiliation(s)
- Elijah Kolawole Oladipo
  - Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Ayodeji Folorunsho Ajayi
  - Reproduction and Bioinformatics Unit, Department of Medical Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Olumuyiwa Elijah Ariyo
  - Department of Medicine, Infectious Diseases and Tropical Medicine Unit, Federal Teaching Hospital, Ido-Ekiti, Ekiti State, Nigeria
- Esther Moradeyo Jimah
  - Department of Medical Microbiology and Parasitology, University of Ilorin, Kwara State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Louis Odinakaose Ezediuno
  - Department of Microbiology and Parasitology, University of Ilorin, Kwara State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Oluwadunsin Iyanuoluwa Adebayo
  - Reproduction and Bioinformatics Unit, Department of Medical Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Emmanuel Tayo Adebayo
  - Reproduction and Bioinformatics Unit, Department of Medical Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Aduragbemi Noah Odeyemi
  - Reproduction and Bioinformatics Unit, Department of Medical Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Marvellous Oluwaseun Oyeleke
  - Reproduction and Bioinformatics Unit, Department of Medical Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Olawumi Elizabeth Akindiya
  - Microbiology Programme, Department of Biological Science, Olusegun Agagu University of Science and Technology, Okitipupa, Ondo State, Nigeria
- Victoria Oyetayo Aremu
  - Reproduction and Bioinformatics Unit, Department of Medical Physiology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
  - Genomics Unit, Helix Biogen Consult, Ogbomoso, Oyo state, Nigeria
- Abiola O Arowosaye
  - Department of Virology, University of Ibadan, Ibadan, Oyo State, Nigeria
- Elukunbi Hilda Awoyelu
  - Department of Natural Sciences, Precious Conerstone University, Ibadan, Oyo State, Nigeria
- Adio Abayomi Oladipo
  - Department of Haematology and Blood Grouping Serology, Obafemi Awolowo Teaching Hospital Complex, Ile-Ife Wesley Guild Hospital Wing, Osun State, Nigeria
- Bukola Bisola Oladipo
  - Department of Clinical Nursing, Bowen University Teaching Hospital, Ogbomoso, Oyo State, Nigeria
- Olusola Nathaniel Majolagbe
  - Department of Pure and Applied Biology, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria
- Olubukola Monisola Oyawoye
  - Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun State, Nigeria
- Julius Kola Oloke
  - Department of Natural Sciences, Precious Conerstone University, Ibadan, Oyo State, Nigeria
5
Zeng R, Liao M. Developing a Multi-Layer Deep Learning Based Predictive Model to Identify DNA N4-Methylcytosine Modifications. Front Bioeng Biotechnol 2020; 8:274. [PMID: 32373597 PMCID: PMC7186498 DOI: 10.3389/fbioe.2020.00274] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/15/2020] [Accepted: 03/16/2020] [Indexed: 12/21/2022]
Abstract
DNA N4-methylcytosine modification (4mC) plays an essential role in a variety of biological processes. Therefore, accurate identification of the 4mC distribution at genome scale is important for systematically understanding its biological functions. In this study, we present Deep4mcPred, a multi-layer deep learning based predictive model to identify DNA N4-methylcytosine modifications. In this predictor, we integrate, for the first time, a residual network and a recurrent neural network to build a multi-layer deep learning predictive system. Compared with existing predictors that use traditional machine learning, our proposed method has two advantages. First, our deep learning framework does not need the features to be specified when training the predictive model. It can automatically learn high-level features and capture the characteristic specificity of 4mC sites, which helps distinguish true 4mC sites from non-4mC sites. Second, our deep learning method outperforms the traditional machine learning predictors in a benchmarking comparison, demonstrating that the proposed Deep4mcPred is more effective in DNA 4mC site prediction. Moreover, via experimental comparison, we found that the attention mechanism introduced into the deep learning framework is useful for capturing the critical features. Additionally, we developed a webserver implementing the proposed method for academic use by the research community, which is now available at http://server.malab.cn/Deep4mcPred.
Affiliation(s)
- Rao Zeng
  - Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
- Minghong Liao
  - Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
6
Zhang W, Jing K, Huang F, Chen Y, Li B, Li J, Gong J. SFLLN: A sparse feature learning ensemble method with linear neighborhood regularization for predicting drug–drug interactions. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.017] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/16/2022]
7
Gao YC, Zhou XH, Zhang W. An Ensemble Strategy to Predict Prognosis in Ovarian Cancer Based on Gene Modules. Front Genet 2019; 10:366. [PMID: 31068972 PMCID: PMC6491874 DOI: 10.3389/fgene.2019.00366] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/11/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]
Abstract
Due to the high heterogeneity and complexity of cancer, it is still a challenge to predict the prognosis of cancer patients. In this work, we used a clustering algorithm to divide patients into different subtypes in order to reduce the heterogeneity of the cancer patients in each subtype. We hypothesized that the gene co-expression network may reveal relationships among genes, that some communities in the network could influence the prognosis of cancer patients, and that all the prognosis-related communities together could fully reveal the prognosis of cancer patients. To predict the prognosis for cancer patients in each subtype, we adopted an ensemble classifier based on the gene co-expression network of the corresponding subtype. Using the gene expression data of ovarian cancer patients in TCGA (The Cancer Genome Atlas), three subtypes were identified. Survival analysis showed that patients in different subtypes had different survival risks. Three ensemble classifiers were constructed for each subtype. Leave-one-out and independent validation showed that our method outperformed control and literature methods. Furthermore, the function annotation of the communities in each subtype showed that some communities were cancer-related. Finally, we found that the current drug targets can partially support our method.
Affiliation(s)
- Xiong-Hui Zhou
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Wen Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
8
Tang G, Shi J, Wu W, Yue X, Zhang W. Sequence-based bacterial small RNAs prediction using ensemble learning strategies. BMC Bioinformatics 2018; 19:503. [PMID: 30577759 PMCID: PMC6302447 DOI: 10.1186/s12859-018-2535-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract
Background: Bacterial small non-coding RNAs (sRNAs) have emerged as important elements in diverse physiological processes, including growth, development, cell proliferation, differentiation, metabolic reactions and carbon metabolism, and have attracted great attention. Accurate prediction of sRNAs is important and challenging, and helps to explore the functions and mechanisms of sRNAs. Results: In this paper, we utilize a variety of sRNA sequence-derived features to develop ensemble learning methods for sRNA prediction. First, we compile a balanced dataset and four imbalanced datasets. Then, we investigate various sRNA sequence-derived features, such as spectrum profile, mismatch profile, reverse complement k-mer and pseudo nucleotide composition. Finally, we consider two ensemble learning strategies that integrate all features to build ensemble learning models for sRNA prediction. One is the weighted average ensemble method (WAEM), which uses the linear weighted sum of outputs from the individual feature-based predictors to predict sRNAs. The other is the neural network ensemble method (NNEM), which trains a deep neural network by combining diverse features. In the computational experiments, we evaluate our methods on these five datasets using 5-fold cross validation. WAEM and NNEM produce better results than existing state-of-the-art sRNA prediction methods. Conclusions: WAEM and NNEM have great potential for sRNA prediction, and are helpful for understanding the biological mechanisms of bacteria.
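The weighted average ensemble idea described in this abstract (a linear weighted sum of the outputs of individual feature-based predictors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_average_ensemble`, the normalization of weights to sum to 1, and the example scores and weights are all assumptions made for illustration, and the abstract does not say how the weights are chosen.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    scores: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-predictor scores into one score per sequence.

    scores[i][j] is the output of predictor i for sequence j.
    weights[i] is the non-negative weight of predictor i.
    Normalizing the weights to sum to 1 is an assumption of this sketch.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    if not scores:
        raise ValueError("need at least one predictor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_sequences = len(scores[0])
    if any(len(row) != n_sequences for row in scores):
        raise ValueError("every predictor must score every sequence")

    normalized = [w / total for w in weights]
    # Linear weighted sum over predictors, computed per sequence.
    return [
        sum(w * row[j] for w, row in zip(normalized, scores))
        for j in range(n_sequences)
    ]


# Hypothetical outputs of three feature-based predictors for two sequences.
example_scores = [
    [0.9, 0.2],  # e.g. a predictor built on one sequence-derived feature
    [0.7, 0.4],  # a second feature-based predictor
    [0.8, 0.1],  # a third feature-based predictor
]
example_weights = [0.5, 0.3, 0.2]  # illustrative values only

combined = weighted_average_ensemble(example_scores, example_weights)
# A sequence is called an sRNA when its combined score passes a threshold;
# the 0.5 cutoff here is likewise an assumption.
predicted_srna = [score >= 0.5 for score in combined]
```

In practice the weights would be tuned on training data, for instance by cross validation, so that more reliable feature-based predictors contribute more to the combined score.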
Affiliation(s)
- Guifeng Tang
  - School of Computer Science, Wuhan University, Wuhan, 430072, China
- Jingwen Shi
  - School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
- Wenjian Wu
  - Electronic Information School, Wuhan University, Wuhan, 430072, China
- Xiang Yue
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
9
Zhang W, Zhu X, Fu Y, Tsuji J, Weng Z. Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods. BMC Bioinformatics 2017; 18:464. [PMID: 29219070 PMCID: PMC5773893 DOI: 10.1186/s12859-017-1875-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
Background: Alternative splicing is a critical process in the coding of a single gene, which removes introns and joins exons, and splicing branchpoints are indicators of alternative splicing. Wet experiments have identified a great number of human splicing branchpoints, but many branchpoints are still unknown. In order to guide wet experiments, we develop computational methods to predict human splicing branchpoints. Results: Considering that an intron may have multiple branchpoints, we formulate branchpoint prediction as a multi-label learning problem, and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of intron sequence-derived features, such as sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods: partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis, and use them as the basic classification engines. Third, we propose two ensemble learning schemes which integrate different features and different classifiers to build ensemble learning systems for branchpoint prediction. One is the genetic algorithm-based weighted average ensemble method; the other is the logistic regression-based ensemble method. Conclusions: In the computational experiments, the two ensemble learning methods outperform benchmark branchpoint prediction methods, and can produce high-accuracy results on the benchmark dataset.
Affiliation(s)
- Wen Zhang
  - School of Computer, Wuhan University, Wuhan, 430072, China
- Xiaopeng Zhu
  - School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
- Yu Fu
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01605, USA
- Junko Tsuji
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01605, USA
- Zhiping Weng
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01605, USA
10
McCarthy BA, Yancopoulos S, Tipping M, Yan XJ, Wang XP, Bennett F, Li W, Lesser M, Paul S, Boyle E, Moreno C, Catera R, Messmer BT, Cutrona G, Ferrarini M, Kolitz JE, Allen SL, Rai KR, Rawstron AC, Chiorazzi N. A seven-gene expression panel distinguishing clonal expansions of pre-leukemic and chronic lymphocytic leukemia B cells from normal B lymphocytes. Immunol Res 2016; 63:90-100. [PMID: 26318878 DOI: 10.1007/s12026-015-8688-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 02/02/2023]
Abstract
Chronic lymphocytic leukemia (CLL) is a clonal disease of B lymphocytes manifesting as an absolute lymphocytosis in the blood. However, not all lymphocytoses are leukemic. In addition, first-degree relatives of CLL patients have an ~15 % chance of developing a precursor condition to CLL termed monoclonal B cell lymphocytosis (MBL), and distinguishing CLL and MBL B lymphocytes from normal B cell expansions can be a challenge. Therefore, we selected FMOD, CKAP4, PIK3C2B, LEF1, PFTK1, BCL-2, and GPM6a from a set of genes significantly differentially expressed in microarray analyses that compared CLL cells with normal B lymphocytes and used these to determine whether we could discriminate CLL and MBL cells from B cells of healthy controls. Analysis with receiver operating characteristics and Bayesian relevance determination demonstrated good concordance with all panel genes. Using a random forest classifier, the seven-gene panel reliably distinguished normal polyclonal B cell populations from expression patterns occurring in pre-CLL and CLL B cell populations with an error rate of 2 %. Using Bayesian learning, the expression levels of only two genes, FMOD and PIK3C2B, correctly distinguished 100 % of CLL and MBL cases from normal polyclonal and mono/oligoclonal B lymphocytes. Thus, this study sets forth effective computational approaches that distinguish MBL/CLL from normal B lymphocytes. The findings also support the concept that MBL is a CLL precursor.
Collapse
Affiliation(s)
- Brian A McCarthy
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Xiao-Jie Yan
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Xue Ping Wang
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Fiona Bennett
- Haematological Malignancy Diagnostic Service, Leeds Teaching Hospitals, Leeds, LS2 9JT, UK
- Wentian Li
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Martin Lesser
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Santanu Paul
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Erin Boyle
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Carolina Moreno
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Rosa Catera
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA
- Bradley T Messmer
- Moores Cancer Center, University of California, San Diego, San Diego, CA, 92093, USA
- Giovanna Cutrona
- U.O. Molecular Pathology, IRCCS Azienda Ospedaliera Universitaria San Martino - Istituto Nazionale per la Ricerca sul Cancro, Genoa, Italy
- Manlio Ferrarini
- IRCCS Azienda Ospedaliera Universitaria San Martino - Istituto Nazionale per la Ricerca sul Cancro, Genoa, Italy
- Jonathan E Kolitz
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA; Departments of Molecular Medicine and Medicine, Hofstra North Shore-LIJ School of Medicine, Hempstead, NY, 11549-1000, USA
- Steven L Allen
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA; Departments of Molecular Medicine and Medicine, Hofstra North Shore-LIJ School of Medicine, Hempstead, NY, 11549-1000, USA
- Kanti R Rai
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA; Departments of Molecular Medicine and Medicine, Hofstra North Shore-LIJ School of Medicine, Hempstead, NY, 11549-1000, USA
- Andrew C Rawstron
- Haematological Malignancy Diagnostic Service, Leeds Teaching Hospitals, Leeds, LS2 9JT, UK
- Nicholas Chiorazzi
- The Feinstein Institute for Medical Research, Manhasset, NY, 11030, USA; Departments of Molecular Medicine and Medicine, Hofstra North Shore-LIJ School of Medicine, Hempstead, NY, 11549-1000, USA
|
11
|
Luo F, Gao Y, Zhu Y, Liu J. Integrating peptides' sequence and energy of contact residues information improves prediction of peptide and HLA-I binding with unknown alleles. BMC Bioinformatics 2013; 14 Suppl 8:S1. [PMID: 23815611 PMCID: PMC3654895 DOI: 10.1186/1471-2105-14-s8-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022] Open
Abstract
Background: Human leukocyte antigen (HLA) class I molecules are encoded by a large, highly polymorphic gene family, and the number of registered HLA-I molecules has exceeded 3000. Slight differences in the amino acid sequences of HLAs cause them to bind different sets of peptides. Over the past decades, many methods have been proposed to predict binding between peptides and HLA-I molecules, and they have achieved good performance. However, most of the experimental data they use covers only a small number of HLA alleles, so they tend to reach high prediction accuracy only on data from similar alleles. Because the peptide and the HLA together determine binding, the contributions of both must be considered at the same time. Results: We propose an artificial neural network method that uses features of the peptide sequence and the energy of contact residues to predict peptide binding to HLA-I, even when the HLA alleles are unknown. We validated the method in two experiments, one allele-specific and one super-type. In the first, we collected 14 HLA-A and 14 HLA-B molecules from the Bjoern Peters dataset and compared our method with ARB, SMM, NetMHC and 16 other online methods. Our method achieved the best average AUC (area under the ROC curve), 0.909. In the second, we used leave-one-out cross-validation on MHC-peptide binding data from different alleles that share a common super-type. Compared with gold-standard methods such as NetMHC and NetMHCpan, our method again achieved the best average AUC, 0.847. Conclusions: Our method achieves satisfactory results. Whether it is tested on HLA-I with a single definite gene or with a super-type gene locus, it achieves better classification accuracy, and it still outperforms the other methods in the comparison when the training set is small. We therefore conclude that combining peptide information, the interaction information of HLA amino acid residues, and contact energy improves prediction of peptide-HLA-I binding even when no prior experimental dataset is available for HLAs with various alleles.
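The super-type experiment described in this abstract holds out data by allele. The following is a minimal sketch of how such leave-one-allele-out folds could be built, assuming each record carries an allele label. The record layout, the function name `leave_one_allele_out`, the placeholder peptides, and the binding labels are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict


def leave_one_allele_out(records):
    """Yield (held_out_allele, train, test), holding out one allele per fold."""
    by_allele = defaultdict(list)
    for rec in records:
        by_allele[rec["allele"]].append(rec)
    for held_out in sorted(by_allele):
        test = by_allele[held_out]
        # Train on every record whose allele differs from the held-out one.
        train = [
            rec
            for allele, recs in by_allele.items()
            if allele != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Placeholder records: peptides and binding labels are invented for illustration.
records = [
    {"allele": "HLA-A*02:01", "peptide": "AAAAAAAAL", "binds": 1},
    {"allele": "HLA-A*02:01", "peptide": "GGGGGGGGG", "binds": 0},
    {"allele": "HLA-A*02:02", "peptide": "AAAAAAAAV", "binds": 1},
    {"allele": "HLA-A*02:03", "peptide": "DDDDDDDDD", "binds": 0},
]

folds = list(leave_one_allele_out(records))
```

Each fold trains on the remaining alleles and tests on the held-out one, which mimics predicting binding for an allele with no training data of its own.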
Affiliation(s)
- Fei Luo
- School of Computer, Wuhan University, Wuhan, Hubei, China
|
12
|
Liao WWP, Arthur JW. Predicting peptide binding to Major Histocompatibility Complex molecules. Autoimmun Rev 2011; 10:469-73. [PMID: 21333759 DOI: 10.1016/j.autrev.2011.02.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 01/28/2011] [Accepted: 02/09/2011] [Indexed: 12/29/2022]
Abstract
The Major Histocompatibility Complex (MHC) constitutes an important part of the human immune system. During infection, pathogenic proteins are processed into peptide fragments by the antigen processing machinery. These peptides bind to MHC molecules and the MHC-peptide complex is then transported to the cell membrane where it elicits an immune response via T-cell binding. Understanding the molecular mechanism of this process will greatly assist in determining the aetiology of various diseases and in the design of effective drugs. One of the most challenging aspects of this area of research is understanding the specificity and sensitivity of the binding process. An empirical approach to the problem is unfeasible as there are over 512 billion potential binding peptides for each MHC molecule. Computational approaches offer the promise of predicting peptide binding, thus dramatically reducing the number of peptides proceeding to experimental verification. Various bioinformatic approaches have been developed to predict whether or not a particular peptide will bind to a particular MHC allele. Currently, peptide binding prediction methods can be categorised into three major groups: motif- and scoring matrix-based methods, artificial intelligence- (AI-) based methods, and structure-based methods. The first two are sequence-based approaches and are generally based on common sequence motifs in peptides known to bind to MHC molecules. The structure-based approach concerns the structural features and the distribution of energy between the binding peptide and the MHC molecule. Although knowledge of the molecular structure of the MHC molecules is expected to lead to better predictions of peptide binding, the development of structure-based methods has been relatively slow compared to sequence-based methods. Comparisons of various methods showed that the best sequence-based methods significantly outperform structure-based methods. 
The performance of structure-based methods may improve as more structures and binding data become available for the many alleles that still lack them, especially class II molecules. On the other hand, the large number of verification methods and indicators used in structure-based studies hinders critical evaluation of the methods. Adopting commonly used assessment procedures would allow the relative performance of structure-based methods to be compared directly with that of other methods. This review provides an overview of current methods for predicting peptide binding to the MHC, with a focus on structure-based methods, and explores the potential for future development in this area.
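The abstract does not say how the figure of 512 billion potential peptides is derived. The number matches the count of distinct 9-residue peptides over the 20 standard amino acids, so the short check below assumes that reading. The peptide length of 9 and the variable names are assumptions for illustration.

```python
STANDARD_AMINO_ACIDS = 20  # size of the standard amino acid alphabet
PEPTIDE_LENGTH = 9  # assumed: the abstract does not state the length used

# Each of the 9 positions can hold any of the 20 residues independently.
n_nonamers = STANDARD_AMINO_ACIDS ** PEPTIDE_LENGTH
```

Under this assumption the search space is 20^9 = 512,000,000,000 peptides per MHC molecule, which is why exhaustive experimental screening is impractical.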
Affiliation(s)
- Webber W P Liao
- Discipline of Medicine, Central Clinical School, University of Sydney, NSW, 2006, Australia
|
13
|
Zhang W, Liu J, Niu Y. Quantitative prediction of MHC-II binding affinity using particle swarm optimization. Artif Intell Med 2010; 50:127-32. [PMID: 20541921 DOI: 10.1016/j.artmed.2010.05.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/21/2009] [Revised: 03/31/2010] [Accepted: 05/12/2010] [Indexed: 01/13/2023]
Abstract
OBJECTIVE Helper T-cell epitopes (Th epitopes) are the basic units that activate the helper T-cell immune response, and they are useful for understanding immune mechanisms and developing vaccines. Binding of a peptide to a major histocompatibility complex class II (MHC-II) molecule is a prerequisite for the helper T-cell immune response, and binding peptides are usually recognized as Th epitopes; Th epitopes can therefore be identified by predicting MHC-II binding peptides. Recently, researchers have become more interested in predicting binding affinities between MHC-II molecules and peptides than in classifying peptides as binders or non-binders. METHODOLOGY Motivated by the collective search strategy of the particle swarm optimization (PSO) algorithm, we developed a method that predicts peptide binding affinity directly. PSO was used to search for the optimal position-specific scoring matrices (PSSM) from experimentally derived allele-related peptides, and the prediction models were then constructed from these matrices. We also evaluated several factors that influence binding affinity, including peptide length and flanking residue length, and incorporated them into our models. RESULTS The models were evaluated on three MHC-II alleles from the AntiJen database and 14 MHC-II alleles from the IEDB database. Compared with popular existing quantitative methods such as MHCPred, SVRMHC, ARB and SMM-align, our method performs better in terms of correlation coefficient (r) and area under the ROC curve (AUC). Incorporating global length information further improved the models, which achieved an average AUC of 0.7534 and an average r of 0.4707. CONCLUSIONS Quantitative prediction of MHC-II binding affinity can be modeled as an optimization problem. Our PSO-based method can find the optimal PSSM, which is then used to identify the binding cores and score the binding affinities of peptides. The experimental results show that the method is promising for predicting MHC-II binding affinity.
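The step in which a PSSM is used to identify the binding core and score a peptide can be sketched as a sliding-window search. This is only an illustration of PSSM scoring: the PSO search that fits the matrix is omitted, and the matrix weights, the example peptide, and the function names `score_window` and `best_binding_core` are hypothetical rather than taken from the paper.

```python
def score_window(window, pssm):
    """Sum the position-specific weights for one candidate core."""
    # Residues without an entry at a position contribute 0.0.
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))


def best_binding_core(peptide, pssm):
    """Return (core, start, score) for the highest-scoring window."""
    core_len = len(pssm)
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the PSSM core length")
    candidates = [
        (score_window(peptide[s:s + core_len], pssm), s)
        for s in range(len(peptide) - core_len + 1)
    ]
    # max() keeps the first window when several share the top score.
    best_score, start = max(candidates, key=lambda c: c[0])
    return peptide[start:start + core_len], start, best_score


# Hypothetical 9-position matrix: only a few anchor positions carry weight.
toy_pssm = [
    {"F": 2.0, "Y": 1.5}, {}, {}, {"D": 1.0}, {},
    {"T": 1.0}, {}, {}, {"L": 1.5},
]

core, start, score = best_binding_core("GSFKADATAALKQGS", toy_pssm)
```

In this toy case the window starting at index 2 places a favored residue at each weighted position, so it is selected as the core; the residues on either side of it are the flanking residues.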
Affiliation(s)
- Wen Zhang
- School of Computer Science, Wuhan University, Wuhan 430072, People's Republic of China.
|
14
|
Weaver JM, Sant AJ. Understanding the focused CD4 T cell response to antigen and pathogenic organisms. Immunol Res 2009; 45:123-43. [PMID: 19198764 DOI: 10.1007/s12026-009-8095-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
Abstract
Immunodominance is a term that reflects the final, very limited peptide specificity of T cells that are elicited during an immune response. Recent experiments in our laboratory compel us to propose a new paradigm for the control of immunodominance in CD4 T cell responses, stating that immunodominance is peptide-intrinsic and is dictated by the off-rate of peptides from MHC class II molecules. Our studies have revealed that persistence of peptide:class II complexes both predicts and controls CD4 T cell immunodominance and that this parameter can be rationally manipulated to either promote or eliminate immune responses. Mechanistically, we have determined that DM editing in APC is a key event that is influenced by the kinetic stability of class II:peptide complexes and that differential persistence of complexes also impacts the expansion phase of the immune response. These studies have important implications for rational vaccine design and for understanding the immunological mechanisms that limit the specificity of CD4 T cell responses.
Affiliation(s)
- Jason M Weaver
- David H. Smith Center for Vaccine Biology and Immunology, AaB Institute of Biomedical Sciences, Department of Microbiology and Immunology, University of Rochester, NY 14642, USA
|
15
|
Lin HH, Zhang GL, Tongchusak S, Reinherz EL, Brusic V. Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research. BMC Bioinformatics 2008; 9 Suppl 12:S22. [PMID: 19091022 PMCID: PMC2638162 DOI: 10.1186/1471-2105-9-s12-s22] [Citation(s) in RCA: 158] [Impact Index Per Article: 9.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Initiation and regulation of immune responses in humans involves recognition of peptides presented by human leukocyte antigen class II (HLA-II) molecules. These peptides (HLA-II T-cell epitopes) are increasingly important as research targets for the development of vaccines and immunotherapies. HLA-II peptide binding studies involve multiple overlapping peptides spanning individual antigens, as well as complete viral proteomes. Antigen variation in pathogens and tumor antigens, and extensive polymorphism of HLA molecules increase the number of targets for screening studies. Experimental screening methods are expensive and time consuming and reagents are not readily available for many of the HLA class II molecules. Computational prediction methods complement experimental studies, minimize the number of validation experiments, and significantly speed up the epitope mapping process. We collected test data from four independent studies that involved 721 peptide binding assays. Full overlapping studies of four antigens identified binding affinity of 103 peptides to seven common HLA-DR molecules (DRB1*0101, 0301, 0401, 0701, 1101, 1301, and 1501). We used these data to analyze performance of 21 HLA-II binding prediction servers accessible through the WWW. RESULTS Because not all servers have predictors for all tested HLA-II molecules, we assessed a total of 113 predictors. The length of test peptides ranged from 15 to 19 amino acids. We tried three prediction strategies - the best 9-mer within the longer peptide, the average of best three 9-mer predictions, and the average of all 9-mer predictions within the longer peptide. The best strategy was the identification of a single best 9-mer within the longer peptide. Overall, measured by the receiver operating characteristic method (AROC), 17 predictors showed good (AROC > 0.8), 41 showed marginal (AROC > 0.7), and 55 showed poor performance (AROC < 0.7). 
Good performance predictors included HLA-DRB1*0101 (seven), 1101 (six), 0401 (three), and 0701 (one). The best individual predictor was NETMHCIIPAN, closely followed by PROPRED, IEDB (Consensus), and MULTIPRED (SVM). None of the individual predictors was shown to be suitable for prediction of promiscuous peptides. Current predictive capabilities allow prediction of only 50% of actual T-cell epitopes using practical thresholds. CONCLUSION The available HLA-II servers do not match prediction capabilities of HLA-I predictors. Currently available HLA-II prediction servers offer only a limited prediction accuracy and the development of improved predictors is needed for large-scale studies, such as proteome-wide epitope mapping. The requirements for accuracy of HLA-II binding predictions are stringent because of the substantial effect of false positives.
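The three prediction strategies and the AROC performance bands described in this abstract can be sketched as follows. The per-window predictor `toy_predictor`, the example peptide, and the function names are hypothetical stand-ins for a real prediction server. The abstract does not say how values of exactly 0.7 or 0.8 are classified, so assigning them to the lower band is an assumption.

```python
def nonamer_scores(peptide, predict_9mer):
    """Score every overlapping 9-mer within a longer peptide."""
    return [predict_9mer(peptide[i:i + 9]) for i in range(len(peptide) - 8)]


def aggregate(scores):
    """Apply the three strategies to a list of per-9-mer scores."""
    ranked = sorted(scores, reverse=True)
    top3 = ranked[:3]
    return {
        "best_9mer": ranked[0],
        "mean_top3": sum(top3) / len(top3),
        "mean_all": sum(scores) / len(scores),
    }


def aroc_category(aroc):
    """Map an AROC value to the bands used in the abstract."""
    # Assumption: boundary values (exactly 0.8 or 0.7) fall in the lower band.
    if aroc > 0.8:
        return "good"
    if aroc > 0.7:
        return "marginal"
    return "poor"


def toy_predictor(window):
    """Hypothetical score: fraction of hydrophobic residues in the window."""
    return sum(aa in "AVILMFWY" for aa in window) / 9


window_scores = nonamer_scores("ACDEFGHIKLMNPQR", toy_predictor)
summary = aggregate([0.2, 0.9, 0.5, 0.7])
```

A 15-residue peptide yields seven overlapping 9-mers. The abstract reports that taking the single best 9-mer was the most effective of the three strategies.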
Affiliation(s)
- Hong Huang Lin
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
|