1
Quddusi DM, Hiremath SA, Bajcinca N. Mutation prediction in the SARS-CoV-2 genome using attention-based neural machine translation. Mathematical Biosciences and Engineering: MBE 2024; 21:5996-6018. [PMID: 38872567 DOI: 10.3934/mbe.2024264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been evolving rapidly since causing havoc worldwide in 2020. Since then, it has been very hard to contain the virus owing to its frequently mutating nature. Changes in its genome lead to viral evolution, rendering it more resistant to existing vaccines and drugs. Predicting viral mutations beforehand will help in gearing up against more infectious and virulent versions of the virus, in turn decreasing the damage caused by them. In this paper, we have proposed different NMT (neural machine translation) architectures based on RNNs (recurrent neural networks) to predict mutations in selected SARS-CoV-2 non-structural proteins (NSPs), i.e., NSP1, NSP3, NSP5, NSP8, NSP9, NSP13, and NSP15. First, we created and pre-processed the pairs of sequences from two languages using k-means clustering and nearest neighbors for training a neural machine translation model. We also provided insights for training NMTs on long biological sequences. In addition, we evaluated and benchmarked our models to demonstrate their efficiency and reliability.
Affiliation(s)
- Darrak Moin Quddusi
  - Chair of Mechatronics in the Faculty of Mechanical and Process Engineering, Rheinland-Pfalz Technical University of Kaiserslautern-Landau, Kaiserslautern 67663, Germany
- Sandesh Athni Hiremath
  - Chair of Mechatronics in the Faculty of Mechanical and Process Engineering, Rheinland-Pfalz Technical University of Kaiserslautern-Landau, Kaiserslautern 67663, Germany
- Naim Bajcinca
  - Chair of Mechatronics in the Faculty of Mechanical and Process Engineering, Rheinland-Pfalz Technical University of Kaiserslautern-Landau, Kaiserslautern 67663, Germany
2
Yin R, Gutierrez A, Kobren SN, Avillach P. VarPPUD: Variant post prioritization developed for undiagnosed genetic disorders. medRxiv: The Preprint Server for Health Sciences 2024:2024.04.15.24305876. [PMID: 38699371 PMCID: PMC11065012 DOI: 10.1101/2024.04.15.24305876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Rare and ultra-rare genetic conditions are estimated to impact nearly 1 in 17 people worldwide, yet accurately pinpointing the diagnostic variants underlying each of these conditions remains a formidable challenge. Because comprehensive, in vivo functional assessment of all possible genetic variants is infeasible, clinicians instead consider in silico variant pathogenicity predictions to distinguish plausibly disease-causing from benign variants across the genome. However, in the most difficult undiagnosed cases, such as those accepted to the Undiagnosed Diseases Network (UDN), existing pathogenicity predictions cannot reliably discern true etiological variant(s) from other deleterious candidate variants that were prioritized through N-of-1 efforts. Pinpointing the disease-causing variant from a pool of plausible candidates remains a largely manual effort requiring extensive clinical workups, functional and experimental assays, and eventual identification of genotype- and phenotype-matched individuals. Here, we introduce VarPPUD, a tool trained on prioritized variants from UDN cases, that leverages gene-, amino acid-, and nucleotide-level features to discern pathogenic variants from other deleterious variants that are unlikely to be confirmed as disease relevant. VarPPUD achieves a cross-validated accuracy of 79.3% and precision of 77.5% on a held-out subset of uniquely challenging UDN cases, respectively representing an average 18.6% and 23.4% improvement over nine traditional pathogenicity prediction approaches on this task. We validate VarPPUD's ability to discriminate likely from unlikely pathogenic variants on synthetic, GAN-generated candidate variants as well. Finally, we show how VarPPUD can be probed to evaluate each input feature's importance and contribution toward prediction, an essential step toward understanding the distinct characteristics of newly uncovered disease-causing variants.
Significance Statement: Patients with chronic, undiagnosed and underdiagnosed genetic conditions often endure expensive and excruciating years-long diagnostic odysseys without clear results. In many instances, clinical genome sequencing of patients and their family members fails to reveal known disease-causing variants, although compelling variants of uncertain significance are frequently encountered. Existing computational tools struggle to reliably differentiate truly disease-causing variants from other plausible candidate variants within these prioritized sets. Consequently, the confirmation of disease-causing variants often necessitates extensive experimental follow-up, including studies in model organisms and identification of other similarly presenting genotype-matched individuals, a process that can extend for several years. Here, we present VarPPUD, a tool trained specifically to distinguish likely from unlikely to be confirmed pathogenic variants that were prioritized across cases in the Undiagnosed Diseases Network. By evaluating the importance and impact of different input feature values on prediction, we gain deeper insights into the distinctive attributes of difficult-to-identify diagnostic variants. For patients who remain undiagnosed following comprehensive whole genome sequencing, our new method VarPPUD may reveal pathogenic variants amid a pool of candidate variants, thereby advancing diagnostic efforts where progress has otherwise stalled.
3
Saha G, Sawmya S, Saha A, Akil MA, Tasnim S, Rahman MS, Rahman MS. PRIEST: predicting viral mutations with immune escape capability of SARS-CoV-2 using temporal evolutionary information. Brief Bioinform 2024; 25:bbae218. [PMID: 38742520 PMCID: PMC11091746 DOI: 10.1093/bib/bbae218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/04/2024] [Accepted: 04/06/2024] [Indexed: 05/16/2024] Open
Abstract
The dynamic evolution of the severe acute respiratory syndrome coronavirus 2 virus is primarily driven by mutations in its genetic sequence, culminating in the emergence of variants with increased capability to evade host immune responses. Accurate prediction of such mutations is fundamental in mitigating pandemic spread and developing effective control measures. This study introduces a robust and interpretable deep-learning approach called PRIEST. This innovative model leverages time-series viral sequences to foresee potential viral mutations. Our comprehensive experimental evaluations underscore PRIEST's proficiency in accurately predicting immune-evading mutations. Our work represents a substantial step in utilizing deep-learning methodologies for anticipatory viral mutation analysis and pandemic response.
Affiliation(s)
- Gourab Saha
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Shashata Sawmya
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Arpita Saha
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md Ajwad Akil
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Sadia Tasnim
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md Saifur Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- M Sohel Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
4
Garjani A, Chegini AM, Salehi M, Tabibzadeh A, Yousefi P, Razizadeh MH, Esghaei M, Esghaei M, Rohban MH. Forecasting influenza hemagglutinin mutations through the lens of anomaly detection. Sci Rep 2023; 13:14944. [PMID: 37696867 PMCID: PMC10495359 DOI: 10.1038/s41598-023-42089-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 09/05/2023] [Indexed: 09/13/2023] Open
Abstract
The influenza virus hemagglutinin plays an important part in the attachment of the virus to host cells. The hemagglutinin proteins are one of the regions of the virus with a high potential for mutations. Due to the importance of predicting mutations in producing effective and low-cost vaccines, solutions that attempt to approach this problem have recently gained significant attention. A historical record of mutations has been used to train predictive models in such solutions. However, the imbalance between mutations and preserved proteins is a major challenge for the development of such models and needs to be addressed. Here, we propose to tackle this challenge through anomaly detection (AD). AD is a well-established field in Machine Learning (ML) that tries to distinguish unseen anomalies from normal patterns using only normal training samples. By considering mutations as anomalous behavior, we can benefit from the rich set of solutions that have recently emerged in this field. Such methods also fit the problem setup of extreme imbalance between the number of unmutated vs. mutated training samples. Motivated by this formulation, our method tries to find a compact representation for unmutated samples while forcing anomalies to be separated from the normal ones. This helps the model to learn, as far as possible, a shared unique representation of the normal training samples, which improves the discernibility and detectability of mutated samples from the unmutated ones at test time. We conduct a large number of experiments on four publicly available datasets, consisting of three different hemagglutinin protein datasets and one SARS-CoV-2 dataset, and show the effectiveness of our method through different standard criteria.
Affiliation(s)
- Ali Garjani
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Mohammadreza Salehi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Alireza Tabibzadeh
  - Department of Virology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Parastoo Yousefi
  - Department of Virology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Moein Esghaei
  - Cognitive Neuroscience Laboratory, German Primate Center, Leibniz Institute for Primate Research, Goettingen, Germany
- Maryam Esghaei
  - Department of Virology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
5
Ding P, Zeng M, Yin R. Editorial: Computational methods to analyze RNA data for human diseases. Front Genet 2023; 14:1270334. [PMID: 37674479 PMCID: PMC10478215 DOI: 10.3389/fgene.2023.1270334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023] Open
Affiliation(s)
- Pingjian Ding
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Rui Yin
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
6
Peng F, Xia Y, Li W. Prediction of Antigenic Distance in Influenza A Using Attribute Network Embedding. Viruses 2023; 15:1478. [PMID: 37515165 PMCID: PMC10385503 DOI: 10.3390/v15071478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/23/2023] [Accepted: 06/28/2023] [Indexed: 07/30/2023] Open
Abstract
Owing to the rapid changes in the antigenicity of influenza viruses, it is difficult for humans to obtain lasting immunity through antiviral therapy. Hence, tracking the dynamic changes in the antigenicity of influenza viruses can provide a basis for vaccines and drug treatments to cope with the spread of influenza viruses. In this paper, we developed a novel quantitative prediction method to predict the antigenic distance between virus strains using attribute network embedding techniques. An antigenic network is built to model and combine the genetic and antigenic characteristics of the influenza A virus H3N2, using the continuous distributed representation of the virus strain protein sequence (ProtVec) as a node attribute and the antigenic distance between virus strains as an edge weight. The results show a strong positive correlation between supplementing genetic features and antigenic distance prediction accuracy. Further analysis indicates that our prediction model can comprehensively and accurately track the differences in antigenic distances between vaccines and influenza virus strains, and it outperforms existing methods in predicting antigenic distances between strains.
Affiliation(s)
- Fujun Peng
  - School of Information Science and Engineering, Yunnan University, Kunming 650500, China
- Yuanling Xia
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650500, China
- Weihua Li
  - School of Information Science and Engineering, Yunnan University, Kunming 650500, China
7
Yin R, Luo Z, Zhuang P, Zeng M, Li M, Lin Z, Kwoh CK. ViPal: A framework for virulence prediction of influenza viruses with prior viral knowledge using genomic sequences. J Biomed Inform 2023; 142:104388. [PMID: 37178781 PMCID: PMC10602211 DOI: 10.1016/j.jbi.2023.104388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 04/30/2023] [Accepted: 05/07/2023] [Indexed: 05/15/2023]
Abstract
Influenza viruses pose great threats to public health and cause enormous economic losses every year. Previous work has revealed the viral factors associated with the virulence of influenza viruses in mammals. However, existing work rarely takes prior viral knowledge, represented by heterogeneous categorical and discrete information, into account when exploring virus virulence. Making full use of this domain knowledge in virulence studies is challenging but beneficial. This paper proposes a general framework named ViPal for virulence prediction in mice that incorporates discrete prior viral mutation and reassortment information based on all eight influenza segments. The posterior regularization technique is leveraged to transform prior viral knowledge into constraint features, which are then integrated into the machine learning models. Experimental results on influenza genomic datasets validate that our proposed framework can improve virulence prediction performance over baselines. The comparison between ViPal and other existing methods shows the computational efficiency of our framework with comparable or superior performance. Moreover, the interpretable analysis through SHAP (SHapley Additive exPlanations) identifies the scores of constraint features contributing to the prediction. We hope this framework could provide assistance for the accurate detection of influenza virulence and facilitate flu surveillance.
Affiliation(s)
- Rui Yin
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA; School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore.
- Zihan Luo
  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
- Pei Zhuang
  - Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Zhuoyi Lin
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
8
Zhang J, Zhou P, Zheng Y, Wu H. Predicting influenza with pandemic-awareness via Dynamic Virtual Graph Significance Networks. Comput Biol Med 2023; 158:106807. [PMID: 37001208 DOI: 10.1016/j.compbiomed.2023.106807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/20/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023]
Abstract
Every year, influenza spreads worldwide and burdens people's health substantially. We need a reliable model to help hospitals, pharmaceutical companies, and governments better prepare for influenza outbreaks in a timely manner. However, the domain knowledge for such public health events, such as the variable influenza seasonality and occasional pandemics, poses significant challenges in predicting influenza outbreaks. The existing methods use current and historical values in a user-defined time window as input to predict future values but do not consider situations outside the window. To address these limitations, we proposed Dynamic Virtual Graph Significance Networks (DVGSN). The graph-based algorithm can learn, in a supervised and dynamic manner, the implied knowledge from similar "infection situations" in all the historical timepoints without the limitation of a time window. Furthermore, representation learning on the dynamic virtual graph can tackle the varied seasonality with pandemic-awareness without requiring domain knowledge input. The extensive experiments on real-world influenza data demonstrate that DVGSN significantly outperforms the state-of-the-art methods. To the best of our knowledge, this is the first attempt to learn a dynamic virtual graph in a supervised manner for time-series prediction tasks. Moreover, the proposed method is highly interpretable, which makes it more acceptable in the fields of public health, life sciences, and so on. Our source code and dataset are available at https://github.com/aI-area/DVGSN.
9
Li M, Zhao B, Yin R, Lu C, Guo F, Zeng M. GraphLncLoc: long non-coding RNA subcellular localization prediction using graph convolutional networks based on sequence to graph transformation. Brief Bioinform 2023; 24:6955268. [PMID: 36545797 DOI: 10.1093/bib/bbac565] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/15/2022] [Revised: 11/04/2022] [Accepted: 11/20/2022] [Indexed: 12/24/2022] Open
Abstract
The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding lncRNA functions. Most existing lncRNA subcellular localization prediction methods use k-mer frequency features to encode lncRNA sequences. However, k-mer frequency features lose sequence order information and fail to capture sequence patterns and motifs of different lengths. In this paper, we proposed GraphLncLoc, a graph convolutional network-based deep learning model, for predicting lncRNA subcellular localization. Unlike previous studies encoding lncRNA sequences by using k-mer frequency features, GraphLncLoc transforms lncRNA sequences into de Bruijn graphs, which transforms the sequence classification problem into a graph classification problem. To extract the high-level features from the de Bruijn graph, GraphLncLoc employs graph convolutional networks to learn latent representations. Then, the high-level feature vectors derived from the de Bruijn graph are fed into a fully connected layer to perform the prediction task. Extensive experiments show that GraphLncLoc achieves better performance than traditional machine learning models and existing predictors. In addition, our analyses show that transforming sequences into graphs yields more distinguishable features and is more robust than k-mer frequency features. The case study shows that GraphLncLoc can uncover important motifs for nucleus subcellular localization. The GraphLncLoc web server is available at http://csuligroup.com:8000/GraphLncLoc/.
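The contrast this abstract draws between k-mer frequency features and a de Bruijn graph can be illustrated with a minimal sketch. This is not the GraphLncLoc implementation: the function names, the choice of k-mers as nodes with edges between consecutive k-mers, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers; the order in which they occur is discarded."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def de_bruijn_edges(seq, k):
    """Count directed edges between consecutive k-mers (assumed: nodes are k-mers)."""
    return Counter(
        (seq[i:i + k], seq[i + 1:i + k + 1]) for i in range(len(seq) - k)
    )

# Two hypothetical toy sequences with identical 2-mer counts.
seq_a, seq_b = "AGACA", "ACAGA"
same_counts = kmer_counts(seq_a, 2) == kmer_counts(seq_b, 2)
same_graph = de_bruijn_edges(seq_a, 2) == de_bruijn_edges(seq_b, 2)
print(same_counts, same_graph)  # prints: True False
```

The two toy sequences are indistinguishable by their 2-mer frequencies, yet their edge sets differ, which is one way to see how a graph encoding can retain order information that frequency vectors lose.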
Affiliation(s)
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Baoying Zhao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Rui Yin
  - Department of Biomedical Informatics, Harvard Medical School, Boston 021382, USA
- Chengqian Lu
  - School of Computer Science, Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, Xiangtan, China
- Fei Guo
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Zeng
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
10
Zhou B, Zhou H, Zhang X, Xu X, Chai Y, Zheng Z, Kot AC, Zhou Z. TEMPO: A transformer-based mutation prediction framework for SARS-CoV-2 evolution. Comput Biol Med 2023; 152:106264. [PMID: 36535209 PMCID: PMC9747230 DOI: 10.1016/j.compbiomed.2022.106264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 10/16/2022] [Accepted: 10/30/2022] [Indexed: 12/15/2022]
Abstract
The wide spread of SARS-CoV-2 presents a significant threat to human society, as well as public health and economic development. Extensive efforts have been undertaken to battle the pandemic, but effective approaches such as vaccination can be weakened by continuous mutations, which has drawn considerable attention to mutation prediction. However, most previous studies pay little attention to phylogenetics. In this paper, we propose a novel and effective model, TEMPO, for predicting mutations in SARS-CoV-2 evolution. Specifically, we design a phylogenetic tree-based sampling method to generate sequence evolution data. Then, a transformer-based model is presented for the site mutation prediction after learning the high-level representation of these sequence data. We conduct experiments to verify the effectiveness of TEMPO, leveraging a large-scale SARS-CoV-2 dataset. Experimental results show that TEMPO is effective for mutation prediction of SARS-CoV-2 evolution and outperforms several state-of-the-art baseline methods. We further perform mutation prediction experiments on other infectious viruses to explore the feasibility and robustness of TEMPO, and experimental results verify its superiority. The codes and datasets are freely available at https://github.com/ZJUDataIntelligence/TEMPO.
Affiliation(s)
- Binbin Zhou
  - Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; Industry Brain Institute, Zhejiang University City College, Hangzhou, 310015, China.
- Hang Zhou
  - Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China.
- Xue Zhang
  - Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- Xiaobin Xu
  - Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- Yi Chai
  - ZJU-UoE Institute, Zhejiang University, Haining, 314400, China.
- Zengwei Zheng
  - Department of Computer Science and Computing, Zhejiang University City College, No. 48 Huzhou Street, Hangzhou, 310015, China; Industry Brain Institute, Zhejiang University City College, Hangzhou, 310015, China.
- Alex Chichung Kot
  - School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore.
- Zhan Zhou
  - Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China; Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China.
11
Rashid S, Ng TA, Kwoh CK. Jupytope: computational extraction of structural properties of viral epitopes. Brief Bioinform 2022; 23:6696137. [PMID: 36094101 DOI: 10.1093/bib/bbac362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 07/29/2022] [Accepted: 08/02/2022] [Indexed: 12/14/2022] Open
Abstract
Epitope residues located on viral surface proteins are of immense interest in immunology and related applications such as vaccine development, disease diagnosis and drug design. Most tools rely on sequence-based statistical comparisons, such as information entropy of residue positions in aligned columns, to infer the location and properties of epitope sites. To facilitate cross-structural comparisons of epitopes on viral surface proteins, a Python-based extraction tool implemented as a Jupyter notebook is presented (Jupytope). Given a viral antigen structure of interest, a list of known epitope sites and a reference structure, the corresponding epitope structural properties can quickly be obtained. The tool integrates Biopython modules for commonly used software such as NACCESS, DSSP as well as residue depth and outputs a list of structure-derived properties such as dihedral angles, solvent accessibility, residue depth and secondary structure that can be saved in several convenient data formats. To ensure correct spatial alignment, Jupytope takes a list of given epitope sites and their corresponding reference structure and aligns them before extracting the desired properties. Examples are demonstrated for epitopes of influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral strains. The extracted properties assist detection of two influenza subtypes and show potential in distinguishing between four major clades of SARS-CoV-2, as compared with randomized labels. The tool will facilitate analytical and predictive works on viral epitopes through the extracted structural information. Jupytope and extracted datasets are available at https://github.com/shamimarashid/Jupytope.
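The sequence-based comparison mentioned in this abstract, information entropy of residue positions in aligned columns, can be sketched as follows. This is an illustrative sketch and not part of Jupytope: the function names and the toy alignment are hypothetical, and a gap character, if present, would simply be counted as one more symbol.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    total = len(column)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(column).values()
    )

def alignment_entropies(aligned):
    """Entropy of every column of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned)]

# Hypothetical toy alignment: the first column is fully conserved.
aligned = ["MKTA", "MKSA", "MRTA", "MKTG"]
entropies = alignment_entropies(aligned)
print([round(h, 3) for h in entropies])
```

A fully conserved column has entropy 0, and the value grows as the column becomes more variable, which is why such scores are commonly used to flag variable sites.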
Affiliation(s)
- Shamima Rashid
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
- Teng Ann Ng
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
12
Yin R, Thwin NN, Zhuang P, Lin Z, Kwoh CK. IAV-CNN: A 2D Convolutional Neural Network Model to Predict Antigenic Variants of Influenza A Virus. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3497-3506. [PMID: 34469306 DOI: 10.1109/tcbb.2021.3108971] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
The rapid evolution of influenza viruses constantly leads to the emergence of novel influenza strains that are capable of escaping from population immunity. The timely determination of antigenic variants is critical to vaccine design. Empirical experimental methods like hemagglutination inhibition (HI) assays are time-consuming and labor-intensive, requiring live viruses. Recently, many computational models have been developed to predict antigenic variants without explicitly modeling the interdependencies between the channels of feature maps. Moreover, influenza sequences with a similar distribution of residues will have high degrees of similarity, which affects the prediction outcome. Consequently, it is challenging but vital to determine the importance of different residue sites and enhance the predictive performance of influenza antigenicity. We have proposed a 2D convolutional neural network (CNN) model to infer influenza antigenic variants (IAV-CNN). Specifically, we apply a new distributed representation of amino acids, named ProtVec, that can be applied to a variety of downstream proteomic machine learning tasks. After splitting and embedding the influenza strains, a 2D squeeze-and-excitation CNN architecture is constructed that enables networks to focus on informative residue features by fusing both spatial and channel-wise information with local receptive fields at each layer. Experimental results on three influenza datasets show that IAV-CNN achieves state-of-the-art performance by combining the new distributed representation with our proposed architecture. It outperforms both traditional machine learning algorithms with the same feature representations and the majority of existing models on the independent test data. Therefore, we believe that our model can serve as a reliable and robust tool for the prediction of antigenic variants.
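The splitting step that precedes a ProtVec-style embedding can be sketched as below. The sketch follows the general ProtVec idea of cutting a protein sequence into three lists of non-overlapping 3-grams, one per reading offset; whether IAV-CNN splits strains in exactly this way is an assumption, and the function name and the toy peptide are hypothetical. The lookup of a pretrained vector for each 3-gram is left out.

```python
def shifted_ngrams(seq, n=3):
    """Return n lists of non-overlapping n-grams, one list per reading offset."""
    return [
        [seq[i:i + n] for i in range(offset, len(seq) - n + 1, n)]
        for offset in range(n)
    ]

# Hypothetical 9-residue toy peptide.
for offset, grams in enumerate(shifted_ngrams("MKTIIALSY")):
    print(offset, grams)
# prints:
# 0 ['MKT', 'IIA', 'LSY']
# 1 ['KTI', 'IAL']
# 2 ['TII', 'ALS']
```

Each 3-gram would then be mapped to its embedding vector, so that a strain becomes a numeric array suitable as CNN input.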
13
Mao T, Yan D, Zhou M, Qiu T, Cao Z. Possibility of estimating future mutants for influenza: Comparison between previous prediction and subsequent years observation. Front Microbiol 2022; 13:1031672. [PMID: 36274717 PMCID: PMC9581178 DOI: 10.3389/fmicb.2022.1031672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Tiantian Mao
- Department of Gastroenterology, Shanghai Tenth People’s Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Deyu Yan
- Department of Gastroenterology, Shanghai Tenth People’s Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Mengdi Zhou
- Department of Gastroenterology, Shanghai Tenth People’s Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Tianyi Qiu
- Institute of Clinical Science, Zhongshan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
- *Correspondence: Tianyi Qiu,
| | - Zhiwei Cao
- Department of Gastroenterology, Shanghai Tenth People’s Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- School of Life Sciences, Fudan University, Shanghai, China
- Zhiwei Cao,
|
14
|
He X, Shi S, Geng X, Xu L. Information-aware attention dynamic synergetic network for multivariate time series long-term forecasting. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
15
|
Yin R, Zhu X, Zeng M, Wu P, Li M, Kwoh CK. A framework for predicting variable-length epitopes of human-adapted viruses using machine learning methods. Brief Bioinform 2022; 23:6645487. [PMID: 35849093 DOI: 10.1093/bib/bbac281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 06/16/2022] [Accepted: 06/17/2022] [Indexed: 11/14/2022] Open
Abstract
The coronavirus disease 2019 pandemic has alerted people to the threat posed by viruses. Vaccination is the most effective way to prevent the disease from spreading. The interaction between antibodies and antigens clears infectious organisms from the host. Identifying B-cell epitopes is critical in vaccine design, development of disease diagnostics and antibody production. However, traditional experimental methods to determine epitopes are time-consuming and expensive, and the predictive performance of existing in silico methods is not satisfactory. This paper develops a general framework to predict variable-length linear B-cell epitopes specific for human-adapted viruses with machine learning approaches based on the ProtVec representation of peptides and physicochemical properties of amino acids. QR decomposition is incorporated during the embedding process, which enables our models to handle variable-length sequences. Experimental results on large immune epitope datasets validate that our proposed model's performance is superior to the state-of-the-art methods in terms of AUROC (0.827) and AUPR (0.831) on the testing set. Moreover, sequence analysis also provides the viral category for the corresponding predicted epitopes with high precision. Therefore, this framework is shown to reliably identify linear B-cell epitopes of human-adapted viruses given protein sequences and could provide assistance for potential future pandemics and epidemics.
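For readers less familiar with the AUROC figure quoted above, the following minimal sketch computes it as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The labels and scores are made up for illustration and are not data from the study; the function name `auroc` is likewise hypothetical.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    O(P * N) in the number of positives and negatives, which is fine for
    a small illustration but slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = epitope, 0 = non-epitope.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.4, 0.1]
print(auroc(example_labels, example_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.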
Affiliation(s)
- Rui Yin
- Department of Biomedical Informatics, Harvard Medical School, Boston, USA
| | - Xianghe Zhu
- Department of Statistics, University of Oxford, Oxford, UK
| | - Min Zeng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Pengfei Wu
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
|
16
|
Abbas ME, Chengzhang Z, Fathalla A, Xiao Y. End-to-end antigenic variant generation for H1N1 influenza HA protein using sequence to sequence models. PLoS One 2022; 17:e0266198. [PMID: 35344562 PMCID: PMC8959165 DOI: 10.1371/journal.pone.0266198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/31/2021] [Accepted: 03/16/2022] [Indexed: 11/23/2022] Open
Abstract
The growing risk of new variants of the influenza A virus is a significant threat to public health. The risk posed by new variants can be lethal, as witnessed in 2009. Even though the prediction of influenza antigenicity has progressed rapidly, few studies have employed deep learning methodologies. The most recent literature has mostly relied on classification techniques, and a model that generates the HA protein of the antigenic variant has not been developed. Although the antigenic pair of influenza A virus can be determined in a laboratory setup, the process requires a tremendous amount of time and labor. Antigenic shift and drift, which are caused by changes in the surface protein, help the influenza A virus evade immunity. The high frequency of minor changes in the surface protein poses a challenge to identifying the antigenic variant of an emerging virus. These changes slow down vaccine selection and the manufacturing process. In this vein, the proposed model could help save the time and effort needed to identify the antigenic pair of the influenza virus. The proposed model uses an end-to-end learning methodology relying on a deep sequence-to-sequence architecture to generate the antigenic variant of a given influenza A virus using the surface protein. Employing the BLEU score to evaluate the generated HA protein of the antigenic variant of influenza A virus against the actual variant, the proposed model achieved a mean accuracy of 97.57%.
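The abstract does not say how the BLEU score was configured. As a rough sketch, assuming each amino-acid residue is treated as one token and a single reference sequence is used, a sentence-level BLEU with uniform weights up to 4-grams and the standard brevity penalty could be computed as below. The two sequences are hypothetical fragments, not data from the paper, and no smoothing is applied.

```python
import math
from collections import Counter


def ngram_counts(tokens: str, n: int) -> Counter:
    """Count all contiguous n-grams of a residue string."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU of one candidate against one reference.

    Assumption: residues are tokens; the paper's tokenization may differ.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


# Hypothetical fragments differing by one substitution (I -> V at position 5).
generated = "MKTIIALSYIF"
actual = "MKTIVALSYIF"
score = bleu(generated, actual)
print(round(score, 4))
```

Because a single substitution breaks every n-gram that spans it, BLEU penalizes point mutations more heavily than a per-residue identity score would.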
Affiliation(s)
- Mohamed Elsayed Abbas
- School of Computer Science and Engineering, Central South University, Changsha, China
- Mobile Health Ministry of Education-China Mobile Joint Laboratory, Changsha, China
- * E-mail:
| | - Zhu Chengzhang
- School of Computer Science and Engineering, Central South University, Changsha, China
- The College of Literature and Journalism, Central South University, Changsha, China
- Mobile Health Ministry of Education-China Mobile Joint Laboratory, Changsha, China
| | - Ahmed Fathalla
- Department of Mathematics, Faculty of Science, Suez Canal University, Ismailia, Egypt
| | - Yalong Xiao
- School of Computer Science and Engineering, Central South University, Changsha, China
- The College of Literature and Journalism, Central South University, Changsha, China
|
17
|
Wang H, Zang Y, Zhao Y, Hao D, Kang Y, Zhang J, Zhang Z, Zhang L, Yang Z, Zhang S. Sequence Matching between Hemagglutinin and Neuraminidase through Sequence Analysis Using Machine Learning. Viruses 2022; 14:v14030469. [PMID: 35336876 PMCID: PMC8950662 DOI: 10.3390/v14030469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 02/15/2022] [Accepted: 02/17/2022] [Indexed: 01/27/2023] Open
Abstract
To date, many experiments have revealed that the functional balance between hemagglutinin (HA) and neuraminidase (NA) plays a crucial role in viral mobility, production, and transmission. However, whether and how HA and NA maintain balance at the sequence level needs further investigation. Here, we applied principal component analysis and hierarchical clustering analysis on thousands of HA and NA sequences of A/H1N1 and A/H3N2. We discovered significant coevolution between HA and NA at the sequence level, which is closely related to the type of host species and virus epidemic years. Furthermore, we propose a sequence-to-sequence transformer model (S2STM), which mainly consists of an encoder and a decoder that adopts a multi-head attention mechanism for establishing the mapping relationship between HA and NA sequences. The training results reveal that the S2STM can effectively realize the “translation” from HA to NA or vice versa, thereby building a relationship network between them. Our work combines unsupervised and supervised machine learning methods to identify the sequence matching between HA and NA, which will advance our understanding of IAVs’ evolution and also provide a novel idea for sequence analysis methods.
Affiliation(s)
- Zhiwei Yang
- Correspondence: (Z.Y.); (S.Z.); Tel.: +86-029-8266-8634 (Z.Y.); +86-029-8266-0915 (S.Z.)
| | - Shengli Zhang
- Correspondence: (Z.Y.); (S.Z.); Tel.: +86-029-8266-8634 (Z.Y.); +86-029-8266-0915 (S.Z.)
|
18
|
Gan SKE, Phua SX, Yeo JY. Sagacious epitope selection for vaccines, and both antibody-based therapeutics and diagnostics: tips from virology and oncology. Antib Ther 2022; 5:63-72. [PMID: 35372784 PMCID: PMC8972324 DOI: 10.1093/abt/tbac005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/03/2021] [Revised: 01/24/2022] [Accepted: 02/12/2022] [Indexed: 11/12/2022] Open
Abstract
The target of an antibody plays a significant role in the success of antibody-based therapeutics and diagnostics, and vaccine development. This importance is focused on the target binding site—epitope, where epitope selection as a part of design thinking beyond traditional antigen selection using whole cell or whole protein immunization can positively impact success. With purified recombinant protein production and peptide synthesis to display limited/selected epitopes, intrinsic factors that can affect the functioning of resulting antibodies can be more easily selected for. Many of these factors stem from the location of the epitope that can impact accessibility of the antibody to the epitope at a cellular or molecular level, direct inhibition of target antigen activity, conservation of function despite escape mutations, and even non-competitive inhibition sites. By incorporating novel computational methods for predicting antigen changes to model-informed drug discovery and development, superior vaccines and antibody-based therapeutics or diagnostics can be easily designed to mitigate failures. With detailed examples, this review highlights the new opportunities, factors and methods of predicting antigenic changes for consideration in sagacious epitope selection.
Affiliation(s)
- Samuel Ken-En Gan
- Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
- APD SKEG Pte Ltd, Singapore 439444, Singapore
| | - Ser-Xian Phua
- Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
| | - Joshua Yi Yeo
- Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
|
19
|
Yeo JY, Gan SKE. Peering into Avian Influenza A(H5N8) for a Framework towards Pandemic Preparedness. Viruses 2021; 13:2276. [PMID: 34835082 PMCID: PMC8622263 DOI: 10.3390/v13112276] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/27/2021] [Revised: 10/20/2021] [Accepted: 11/12/2021] [Indexed: 12/13/2022] Open
Abstract
2014 marked the first emergence of avian influenza A(H5N8) in Jeonbuk Province, South Korea, which then quickly spread worldwide. In the midst of the 2020-2021 H5N8 outbreak, it spread to domestic poultry and wild waterfowl shorebirds, leading to the first human infection in Astrakhan Oblast, Russia. Although the infection was clinically asymptomatic and without direct human-to-human transmission, the World Health Organization stressed the need for continued risk assessment given the tendency of influenza to reassort and generate novel strains. Given its promiscuity and ease of crossing to humans, the urgency to understand the mechanisms of possible species jumping to avert disastrous pandemics is increasing. By addressing the epidemiology of H5N8, its mechanisms of species jumping and its implications, mutational and reassortment libraries can potentially be built, allowing them to be tested on various models complemented with deep-sequencing and automation. With knowledge of mutational patterns, cellular pathways, drug resistance mechanisms and effects of host proteins, we can be better prepared against H5N8 and other influenza A viruses.
Affiliation(s)
- Joshua Yi Yeo
- Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore;
| | - Samuel Ken-En Gan
- Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore;
- APD SKEG Pte Ltd., Singapore 439444, Singapore
|
20
|
Yin R, Luo Z, Zhuang P, Lin Z, Kwoh CK. VirPreNet: a weighted ensemble convolutional neural network for the virulence prediction of influenza A virus using all eight segments. Bioinformatics 2021; 37:737-743. [PMID: 33241321 DOI: 10.1093/bioinformatics/btaa901] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/09/2020] [Revised: 09/29/2020] [Accepted: 10/06/2020] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION Influenza viruses persistently threaten public health, causing annual epidemics and sporadic pandemics. Because of rapid mutations, the evolution of influenza viruses remains the main obstacle to the effectiveness of antiviral treatments. Previous work has investigated the determinants of virulence of the influenza A virus. To further facilitate flu surveillance, explicit detection of influenza virulence is crucial to protect public health from potential future pandemics. RESULTS In this article, we propose a weighted ensemble convolutional neural network (CNN) for the virulence prediction of influenza A viruses, named VirPreNet, that uses all eight segments. Firstly, mouse lethal dose 50 is used to label the virulence of infections into two classes, namely avirulent and virulent. A numerical representation of amino acids named ProtVec is applied to the eight segments in a distributed manner to encode the biological sequences. After splitting and embedding the influenza strains, the ensemble CNN is constructed as the base model on the influenza dataset of each segment, which serves as the main part of VirPreNet. Then, in a subsequent linear layer, the initial predictive outcomes are integrated and assigned different weights for the final prediction. The experimental results on the collected influenza dataset indicate that VirPreNet achieves state-of-the-art performance by combining ProtVec with our proposed architecture. It outperforms baseline methods on the independent testing data. Moreover, our proposed model reveals the importance of the PB2 and HA segments for virulence prediction. We believe that our model may provide new insights into the investigation of influenza virulence. AVAILABILITY AND IMPLEMENTATION Codes and data to generate the VirPreNet are publicly available at https://github.com/Rayin-saber/VirPreNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rui Yin
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Zihan Luo
- School of Electronic Information and Communication, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Pei Zhuang
- School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Zhuoyi Lin
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
|
21
|
Dong Y, Yao YD. IoT Platform for COVID-19 Prevention and Control: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2021; 9:49929-49941. [PMID: 34812390 PMCID: PMC8545211 DOI: 10.1109/access.2021.3068276] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/25/2021] [Accepted: 03/09/2021] [Indexed: 05/18/2023]
Abstract
As a result of the worldwide transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), coronavirus disease 2019 (COVID-19) has evolved into an unprecedented pandemic. Currently, with unavailable pharmaceutical treatments and low vaccination rates, this novel coronavirus has a great impact on public health, human society, and the global economy, which is likely to last for many years. One of the lessons learned from the COVID-19 pandemic is that a long-term system with non-pharmaceutical interventions for preventing and controlling new infectious diseases should be implemented. An Internet of things (IoT) platform is well suited to achieving this goal, due to its ubiquitous sensing ability and seamless connectivity. IoT technology is changing our lives through smart healthcare, smart homes, and smart cities, which aim to build a more convenient and intelligent community. This paper presents how the IoT could be incorporated into the epidemic prevention and control system. Specifically, we demonstrate a potential fog-cloud combined IoT platform that can be used in systematic and intelligent COVID-19 prevention and control, which involves five interventions: COVID-19 Symptom Diagnosis, Quarantine Monitoring, Contact Tracing & Social Distancing, COVID-19 Outbreak Forecasting, and SARS-CoV-2 Mutation Tracking. We investigate and review the state-of-the-art literature on these five interventions to present the capabilities of IoT in countering the current COVID-19 pandemic or future infectious disease epidemics.
Affiliation(s)
- Yudi Dong
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA
| | - Yu-Dong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA
|
22
|
Long Y, Luo J. Association Mining to Identify Microbe Drug Interactions Based on Heterogeneous Network Embedding Representation. IEEE J Biomed Health Inform 2021; 25:266-275. [PMID: 32750918 DOI: 10.1109/jbhi.2020.2998906] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 11/07/2022]
Abstract
Accurately identifying microbe-drug associations plays a critical role in drug development and precision medicine. Considering that the conventional wet-lab method is time-consuming, labor-intensive and expensive, a computational approach is an alternative choice. The increasing availability of biological data provides a great opportunity to systematically understand complex interaction mechanisms between microbes and drugs. However, few computational methods have been developed for microbe-drug prediction. In this work, we leverage multiple sources of biomedical data to construct a heterogeneous network for microbes and drugs, including drug-drug interactions, microbe-microbe interactions and microbe-drug associations. We then propose a novel Heterogeneous Network Embedding Representation framework for Microbe-Drug Association prediction, named HNERMDA, by combining metapath2vec with bipartite network recommendation. In this framework, we introduce metapath2vec, a heterogeneous network representation learning method, to learn low-dimensional embedding representations for microbes and drugs. Following that, we further design a bias bipartite network projection recommendation algorithm to improve prediction accuracy. Comprehensive experiments on two datasets, named MDAD and aBiofilm, demonstrated that our model consistently outperformed five baseline methods in three types of cross-validation. A case study on two popular drugs (i.e., Ciprofloxacin and Pefloxacin) further validated the effectiveness of our HNERMDA model in inferring potential target microbes for drugs.
|
23
|
Ali M, Khan DM, Aamir M, Khalil U, Khan Z. Forecasting COVID-19 in Pakistan. PLoS One 2020; 15:e0242762. [PMID: 33253248 PMCID: PMC7703963 DOI: 10.1371/journal.pone.0242762] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/17/2020] [Accepted: 11/10/2020] [Indexed: 11/18/2022] Open
Abstract
OBJECTIVES Forecasting epidemics like COVID-19 is of crucial importance: it helps not only governments but also medical practitioners to know the future trajectory of the spread, which might help them with the best possible treatments, precautionary measures and protections. In this study, the popular autoregressive integrated moving average (ARIMA) model is used to forecast the cumulative number of confirmed cases, recovered cases, and deaths in Pakistan from COVID-19 from June 25, 2020 to July 04, 2020 (a 10-day-ahead forecast). METHODS To meet the desired objectives, data for this study were taken from the website of the Ministry of National Health Service of Pakistan for February 27, 2020 to June 24, 2020. Two different ARIMA models are used to obtain the 10-day-ahead point and 95% interval forecasts of the cumulative confirmed cases, recovered cases, and deaths. The statistical software RStudio, with the "forecast", "ggplot2", "tseries", and "seasonal" packages, was used for data analysis. RESULTS The forecasted cumulative confirmed cases, recovered cases, and deaths up to July 04, 2020 are 231239 with a 95% prediction interval of (219648, 242832), 111616 with a prediction interval of (101063, 122168), and 5043 with a 95% prediction interval of (4791, 5295), respectively. The root mean square error (RMSE) and mean absolute error (MAE) are used to measure model accuracy. The analysis results show that the ARIMA and seasonal ARIMA models are better than the other time series models in terms of forecasting accuracy and are hence recommended for forecasting epidemics like COVID-19. CONCLUSION It is concluded from this study that the forecasting accuracy of ARIMA models in terms of RMSE and MAE is better than that of the other time series models, and they could therefore be considered a good forecasting tool for the spread, recoveries, and deaths from the current outbreak of COVID-19. Besides, this study can also help decision-makers in developing short-term strategies with regard to the current number of disease occurrences until an appropriate medication is developed.
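The two accuracy measures named in the abstract are simple to compute. The sketch below shows one way to do so in Python (the study itself used R); the numbers are invented for illustration and are not the Pakistani case counts, and the function names are hypothetical.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error: penalizes large errors more than small ones."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error: the average magnitude of the forecast errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented observations and forecasts, for illustration only.
observed = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]
print(rmse(observed, forecast), mae(observed, forecast))
```

When comparing candidate forecasting models on the same held-out period, the model with the lower RMSE and MAE is the more accurate one by these criteria.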
Affiliation(s)
- Muhammad Ali
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
| | - Dost Muhammad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
| | - Muhammad Aamir
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
| | - Umair Khalil
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
| | - Zardad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, KP, Pakistan
|
24
|
Daniyal M, Ogundokun RO, Abid K, Khan MD, Ogundokun OE. Predictive modeling of COVID-19 death cases in Pakistan. Infect Dis Model 2020; 5:897-904. [PMID: 33195884 PMCID: PMC7647892 DOI: 10.1016/j.idm.2020.10.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/21/2020] [Revised: 10/15/2020] [Accepted: 10/31/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The world is presently facing the challenges posed by COVID-19 (2019-nCoV), especially in the public health sector, and these challenges are dangerous to both health and life. The disease results in an acute respiratory infection that may result in pain and death. In Pakistan, the disease curve shows a vertical trend, with almost 256K established cases of the disease and 6035 documented deaths as of August 5, 2020. OBJECTIVE The primary purpose of this study is to provide a statistical model to predict the trend of COVID-19 death cases in Pakistan. The age and gender of COVID-19 victims were represented using a descriptive study. METHODOLOGY Three regression models (linear, logarithmic, and quadratic) were employed in this study for the modelling of COVID-19 death cases in Pakistan. These three models were compared based on the R2, adjusted R2, AIC, and BIC criteria. The data used for the modelling were obtained from the National Institute of Health of Pakistan for February 26, 2020 to August 5, 2020. CONCLUSION The finding of the prediction modelling is that the rate of mortality would decrease by the end of October. The total number of deaths will reach its maximum point; then, it will gradually decrease. This indicates that the curve of total deaths will continue to be flat, i.e., it will become constant, which is also the upper bound of the underlying function of absolute death.
Affiliation(s)
- Muhammad Daniyal
- Department of Statistics, Islamia University of Bahawalpur, Pakistan
| | | | - Khadijah Abid
- Research Evaluation Unit, College of Physicians & Surgeons, Pakistan
| | | | - Opeyemi Eyitayo Ogundokun
- Directoriate Department, Audit Section, Agricultural and Rural Management Training Institute, Ilorin, Nigeria
|
25
|
Forghani M, Khachay M. Convolutional Neural Network Based Approach to in Silico Non-Anticipating Prediction of Antigenic Distance for Influenza Virus. Viruses 2020; 12:E1019. [PMID: 32932748 PMCID: PMC7551508 DOI: 10.3390/v12091019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/28/2020] [Revised: 09/06/2020] [Accepted: 09/08/2020] [Indexed: 12/18/2022] Open
Abstract
Evaluating the degree of antigenic similarity between strains of the influenza virus is highly important for vaccine production. The conventional method used to measure this degree relies on hemagglutination inhibition (HI) immunological assays. Namely, the antigenic distance between two strains is calculated on the basis of HI assays. Usually, such distances are visualized by using some kind of antigenic cartography method. The known drawback of the HI assay is that it is rather time-consuming and expensive. In this paper, we propose a novel approach for antigenic distance approximation based on deep learning in the feature spaces induced by hemagglutinin protein sequences and Convolutional Neural Networks (CNNs). To apply a CNN to compare the protein sequences, we use an encoding based on the physical and chemical characteristics of amino acids. By varying the (hyper)parameters of the CNN architecture design, we find the most robust network. Further, we provide insight into the relationship between approximated antigenic distance and antigenicity by evaluating the network on the HI assay database for the H1N1 subtype. The results indicate that the best-trained network gives a high-precision approximation of the ground-truth antigenic distances, and can be used as a good exploratory tool in practical tasks.
|
26
|
Adly AS, Adly AS, Adly MS. Approaches Based on Artificial Intelligence and the Internet of Intelligent Things to Prevent the Spread of COVID-19: Scoping Review. J Med Internet Res 2020; 22:e19104. [PMID: 32584780 PMCID: PMC7423390 DOI: 10.2196/19104] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 04/03/2020] [Revised: 06/24/2020] [Accepted: 06/25/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Artificial intelligence (AI) and the Internet of Intelligent Things (IIoT) are promising technologies to prevent the concerningly rapid spread of coronavirus disease (COVID-19) and to maximize safety during the pandemic. With the exponential increase in the number of COVID-19 patients, it is highly possible that physicians and health care workers will not be able to treat all cases. Thus, computer scientists can contribute to the fight against COVID-19 by introducing more intelligent solutions to achieve rapid control of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes the disease. OBJECTIVE The objectives of this review were to analyze the current literature, discuss the applicability of reported ideas for using AI to prevent and control COVID-19, and build a comprehensive view of how current systems may be useful in particular areas. This may be of great help to many health care administrators, computer scientists, and policy makers worldwide. METHODS We conducted an electronic search of articles in the MEDLINE, Google Scholar, Embase, and Web of Knowledge databases to formulate a comprehensive review that summarizes different categories of the most recently reported AI-based approaches to prevent and control the spread of COVID-19. RESULTS Our search identified the 10 most recent AI approaches that were suggested to provide the best solutions for maximizing safety and preventing the spread of COVID-19. These approaches included detection of suspected cases, large-scale screening, monitoring, interactions with experimental therapies, pneumonia screening, use of the IIoT for data and information gathering and integration, resource allocation, predictions, modeling and simulation, and robotics for medical quarantine. 
CONCLUSIONS We found few or almost no studies regarding the use of AI to examine COVID-19 interactions with experimental therapies, the use of AI for resource allocation to COVID-19 patients, or the use of AI and the IIoT for COVID-19 data and information gathering/integration. Moreover, the adoption of other approaches, including use of AI for COVID-19 prediction, use of AI for COVID-19 modeling and simulation, and use of AI robotics for medical quarantine, should be further emphasized by researchers because these important approaches lack sufficient numbers of studies. Therefore, we recommend that computer scientists focus on these approaches, which are still not being adequately addressed.
Affiliation(s)
- Aya Sedky Adly
- Faculty of Computers and Artificial Intelligence, Helwan University, Cairo, Egypt
| | - Afnan Sedky Adly
- Faculty of Physical Therapy, Cardiovascular-Respiratory Disorders and Geriatrics, Laser Applications in Physical Medicine, Cairo University, Cairo, Egypt
- Faculty of Physical Therapy, Internal Medicine, Beni-Suef University, Beni-Suef, Egypt
| | - Mahmoud Sedky Adly
- Faculty of Oral and Dental Medicine, Cairo University, Cairo, Egypt
- Royal College of Surgeons of Edinburgh, Scotland, United Kingdom
| |
Collapse
|
27
|
COVID-19: A Comparison of Time Series Methods to Forecast Percentage of Active Cases per Population. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10113880] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 11/16/2022]
Abstract
The ongoing COVID-19 pandemic has caused worldwide socioeconomic unrest, forcing governments to introduce extreme measures to reduce its spread. Being able to accurately forecast when the outbreak will hit its peak would significantly diminish the impact of the disease, as it would allow governments to alter their policy accordingly and plan ahead for the preventive steps needed such as public health messaging, raising awareness of citizens and increasing the capacity of the health system. This study investigated the accuracy of a variety of time series modeling approaches for coronavirus outbreak detection in ten different countries with the highest number of confirmed cases as of 4 May 2020. For each of these countries, six different time series approaches were developed and compared using two publicly available datasets regarding the progression of the virus in each country and the population of each country, respectively. The results demonstrate that, given data produced using actual testing for a small portion of the population, machine learning time series methods can learn and scale to accurately estimate the percentage of the total population that will become affected in the future.
|