1
Chokkakula S, Oh S, Choi WS, Kim CI, Jeong JH, Kim BK, Park JH, Min SC, Kim EG, Baek YH, Choi YK, Song MS. Mammalian adaptation risk in HPAI H5N8: a comprehensive model bridging experimental data with mathematical insights. Emerg Microbes Infect 2024; 13:2339949. [PMID: 38572657 PMCID: PMC11022924 DOI: 10.1080/22221751.2024.2339949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 04/03/2024] [Indexed: 04/05/2024]
Abstract
Understanding the mammalian pathogenesis and interspecies transmission of HPAI H5N8 virus hinges on mapping its adaptive markers. We used deep sequencing to track these markers over five passages in murine lung tissue. Subsequently, we evaluated the growth, selection, and RNA load of eight recombinant viruses with mammalian adaptive markers. By leveraging an integrated non-linear regression model, we quantitatively determined the influence of these markers on growth, adaptation, and RNA expression in mammalian hosts. Furthermore, our findings revealed that the interplay of these markers can lead to synergistic, additive, or antagonistic effects when combined. The elucidation distance method then transformed these results into distinct values, facilitating the derivation of a risk score for each marker. In vivo tests affirmed the accuracy of the scores. As more mutations were incorporated, the overall risk score of the virus increased, and the optimal interplay between markers became essential for risk augmentation. Our study provides a robust model to assess risk from adaptive markers of HPAI H5N8, guiding strategies against future influenza threats.
Affiliation(s)
- Santosh Chokkakula
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Sol Oh
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Won-Suk Choi
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Chang Il Kim
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Ju Hwan Jeong
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Beom Kyu Kim
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Ji-Hyun Park
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Seong Cheol Min
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Eung-Gook Kim
- Department of Biochemistry, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Yun Hee Baek
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
- Young Ki Choi
- Center for Study of Emerging and Re-emerging Viruses, Korea Virus Research Institute, Institute for Basic Science (IBS), Daejeon, Republic of Korea
- Min-Suk Song
- Department of Microbiology, College of Medicine and Medical Research Institute, Chungbuk National University, Cheongju, Republic of Korea
2
Alberts F, Berke O, Maboni G, Petukhova T, Poljak Z. Utilizing machine learning and hemagglutinin sequences to identify likely hosts of influenza H3Nx viruses. Prev Vet Med 2024; 233:106351. [PMID: 39353303 DOI: 10.1016/j.prevetmed.2024.106351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 08/16/2024] [Accepted: 09/25/2024] [Indexed: 10/04/2024]
Abstract
Influenza is a disease that represents both a public health and agricultural risk with pandemic potential. Among the subtypes of influenza A virus, H3 influenza virus can infect many avian and mammalian species and is therefore a virus of interest to human and veterinary public health. The primary goal of this study was to train and validate classifiers for the identification of the most likely host species using the hemagglutinin gene segment of H3 viruses. A five-step process was implemented, which included training four machine learning classifiers, testing the classifiers on the validation dataset, and further exploration of the best-performing model on three additional datasets. The gradient boosting machine classifier showed the highest host-classification accuracy with a 98.0 % (95 % CI [97.01, 98.73]) correct classification rate on an independent validation dataset. The classifications were further analyzed using the predicted probability score which highlighted sequences of particular interest. These sequences were both correctly and incorrectly classified sequences that showed considerable predicted probability for multiple hosts. This showed the potential of using these classifiers for rapid sequence classification and highlighting sequences of interest. Additionally, the classifiers were tested on a separate swine dataset composed of H3N2 sequences from 1998 to 2003 from the United States of America, and a separate canine dataset composed of canine H3N2 sequences of avian origin. These two datasets were utilized to look at the applications of predicted probability and host convergence over time. Lastly, the classifiers were used on an independent dataset of environmental sequences to explore the host identification of environmental sequences. The results of these classifiers show the potential for machine learning to be used as a host identification technique for viruses of unknown origin on a species-specific level.
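The idea of using predicted probability scores to highlight sequences of interest can be illustrated with a minimal sketch. The function name `flag_ambiguous` and the 0.2 cutoff are hypothetical choices made for illustration; the abstract does not state what counts as "considerable" probability for a second host.

```python
def flag_ambiguous(probabilities, threshold=0.2):
    """Return the top predicted host and any other hosts whose predicted
    probability is at least `threshold`.

    `probabilities` maps host label -> predicted probability for one sequence.
    The threshold is an illustrative assumption, not a value from the study.
    """
    # Sort hosts from most to least probable.
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    top_host, _ = ranked[0]
    # Any remaining host above the cutoff marks the sequence as ambiguous.
    runner_ups = [host for host, p in ranked[1:] if p >= threshold]
    return top_host, runner_ups


# A confidently classified sequence versus one split between two hosts.
print(flag_ambiguous({"avian": 0.97, "swine": 0.02, "human": 0.01}))
print(flag_ambiguous({"swine": 0.55, "human": 0.40, "avian": 0.05}))
```

A sequence with a non-empty runner-up list would be a candidate for closer inspection, whether or not its top label is correct.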
Affiliation(s)
- Famke Alberts
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
- Olaf Berke
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Advancing Responsible and Ethical Artificial Intelligence, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
- Grazieli Maboni
- Athens Veterinary Diagnostic Laboratory, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, 501 D.W.Brooks Drive Athens, GA, USA.
- Tatiana Petukhova
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
- Zvonimir Poljak
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
3
Zhang Y, Eskridge KM, Zhang S, Lu G. Identifying host-specific amino acid signatures for influenza A viruses using an adjusted entropy measure. BMC Bioinformatics 2022; 23:333. [PMID: 35962315 PMCID: PMC9372975 DOI: 10.1186/s12859-022-04885-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 08/02/2022] [Indexed: 11/29/2022]
Abstract
Background Influenza A viruses (IAV) exhibit vast genetic mutability and have great zoonotic potential to infect avian and mammalian hosts and are known to be responsible for a number of pandemics. A key computational issue in influenza prevention and control is the identification of molecular signatures with cross-species transmission potential. We propose an adjusted entropy-based host-specific signature identification method that uses a similarity coefficient to incorporate the amino acid substitution information and improve the identification performance. Mutations in the polymerase genes (e.g., PB2) are known to play a major role in avian influenza virus adaptation to mammalian hosts. We thus focus on the analysis of PB2 protein sequences and identify host specific PB2 amino acid signatures. Results Validation with a set of H5N1 PB2 sequences from 1996 to 2006 results in adjusted entropy having a 40% false negative discovery rate compared to a 60% false negative rate using unadjusted entropy. Simulations across different levels of sequence divergence show a false negative rate of no higher than 10% while unadjusted entropy ranged from 9 to 100%. In addition, under all levels of divergence adjusted entropy never had a false positive rate higher than 9%. Adjusted entropy also identifies important mutations in H1N1pdm PB2 previously identified in the literature that explain changes in divergence between 2008 and 2009 which unadjusted entropy could not identify. Conclusions Based on these results, adjusted entropy provides a reliable and widely applicable host signature identification approach useful for IAV monitoring and vaccine development.
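For orientation, the unadjusted baseline mentioned in this abstract can be sketched as plain Shannon entropy computed per alignment column. This is only the standard unadjusted measure; the similarity-coefficient adjustment the authors propose is not specified in the abstract and is not reproduced here. The function names and the toy alignment are illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def site_entropies(sequences):
    """Per-position entropy for a list of equal-length, pre-aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([s[i] for s in sequences]) for i in range(length)]


# Toy alignment: position 0 is conserved, position 1 is split evenly between K and R.
print(site_entropies(["AK", "AK", "AR", "AR"]))
```

A host-specific signature search would typically compare such per-site values within and between host groups; how that comparison is made is where the adjusted and unadjusted methods differ.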
Affiliation(s)
- Yixiang Zhang
- Department of Statistics, University of Nebraska - Lincoln, Lincoln, NE, USA
- Kent M Eskridge
- Department of Statistics, University of Nebraska - Lincoln, Lincoln, NE, USA.
- Shunpu Zhang
- Department of Statistics, University of Central Florida, Orlando, USA
- Guoqing Lu
- Department of Biology, University of Nebraska - Omaha, Omaha, NE, USA
4
Evolution of the North American Lineage H7 Avian Influenza Viruses in Association with H7 Virus's Introduction to Poultry. J Virol 2022; 96:e0027822. [PMID: 35862690 PMCID: PMC9327676 DOI: 10.1128/jvi.00278-22] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/09/2023]
Abstract
The incursions of H7 subtype low-pathogenicity avian influenza virus (LPAIV) from wild birds into poultry and its mutations to highly pathogenic avian influenza virus (HPAIV) have been an ongoing concern in North America. Since 2000, 10 phylogenetically distinct H7 virus outbreaks from wild birds have been detected in poultry, six of which mutated to HPAIV. To study the molecular evolution of the H7 viruses that occurs when changing hosts from wild birds to poultry, we performed analyses of the North American H7 hemagglutinin (HA) genes to identify amino acid changes as the virus circulated in wild birds from 2000 to 2019. Then, we analyzed recurring HA amino acid changes and gene constellations of the viruses that spread from wild birds to poultry. We found six HA amino acid changes occurring during wild bird circulation and 10 recurring changes after the spread to poultry. Eight of the changes were in and around the HA antigenic sites, three of which were supported by positive selection. Viruses from each H7 outbreak had a unique genotype, with no specific genetic group associated with poultry outbreaks or mutation to HPAIV. However, the genotypes of the H7 viruses in poultry outbreaks tended to contain minor genetic groups less observed in wild bird H7 viruses, suggesting either a biased sampling of wild bird AIVs or a tendency of having reassortment with minor genetic groups prior to the virus's introduction to poultry. IMPORTANCE Wild bird-origin H7 subtype avian influenza viruses are a constant threat to commercial poultry, both directly by the disease they cause and indirectly through trade restrictions that can be imposed when the virus is detected in poultry. It is important to understand the genetic basis of why the North American lineage H7 viruses have repeatedly crossed the species barrier from wild birds to poultry. 
We examined the amino acid changes in the H7 viruses associated with poultry outbreaks and tried to determine gene reassortment related to poultry adaptation and mutations to HPAIV. The findings in this study increase the understanding of the evolutionary pathways of wild bird AIV before infecting poultry and the HA changes associated with adaptation of the virus in poultry.
5
Kou Z, Fan X, Li J, Shao Z, Qiang X. Using amino acid features to identify the pathogenicity of influenza B virus. Infect Dis Poverty 2022; 11:50. [PMID: 35509019 PMCID: PMC9066401 DOI: 10.1186/s40249-022-00974-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 04/16/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Influenza B virus can cause epidemics with high pathogenicity, so it poses a serious threat to public health. A feature representation algorithm is proposed in this paper to identify the pathogenicity phenotype of influenza B virus. METHODS The dataset included all 11 influenza virus proteins encoded in eight genome segments of 1724 strains. Two types of features were hierarchically used to build the prediction model. Amino acid features were directly delivered from 67 feature descriptors and input into the random forest classifier to output informative features about the class label and probabilistic prediction. The sequential forward search strategy was used to optimize the informative features. The final features for each strain had low dimensions and included knowledge from different perspectives, which were used to build the machine learning model for pathogenicity identification. RESULTS The 40 signature positions were achieved by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were directly generated from the 67 random forest models, the dimensions for class and probabilistic features were optimized as 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed the original informative features and amino acid features from individual descriptors. The sequential forward search strategy had better performance than the classical ensemble method. CONCLUSIONS The optimized informative features had the best performance and were used to build a predictive model so as to identify the phenotype of influenza B virus with high pathogenicity and provide early risk warning for disease control.
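The Matthews correlation coefficient reported above can be computed from a binary confusion matrix with the standard formula. The sketch below assumes that the abstract's percentage values (for example 88.4%) correspond to 0.884 on the usual scale from -1 to 1; the function name is an illustrative choice.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for the
    otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))  # perfect agreement
print(matthews_corrcoef(tp=5, tn=5, fp=5, fn=5))    # no better than chance
```

Unlike accuracy, the coefficient stays informative when the two classes are imbalanced, which is one reason it is often reported alongside accuracy.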
Affiliation(s)
- Zheng Kou
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- Xinyue Fan
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Junjie Li
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Zehui Shao
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Xiaoli Qiang
- School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China.
6
Predicting Cross-Species Infection of Swine Influenza Virus with Representation Learning of Amino Acid Features. Comput Math Methods Med 2021; 2021:6985008. [PMID: 34671417 PMCID: PMC8523279 DOI: 10.1155/2021/6985008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2021] [Revised: 09/27/2021] [Accepted: 09/28/2021] [Indexed: 11/17/2022]
Abstract
Swine influenza viruses (SIVs) can unforeseeably cross the species barriers and directly infect humans, which pose huge challenges for public health and trigger pandemic risk at irregular intervals. Computational tools are needed to predict infection phenotype and early pandemic risk of SIVs. For this purpose, we propose a feature representation algorithm to predict cross-species infection of SIVs. We built a high-quality dataset of 1902 viruses. A feature representation learning scheme was applied to learn feature representations from 64 well-trained random forest models with multiple feature descriptors of mutant amino acid in the viral proteins, including compositional information, position-specific information, and physicochemical properties. Class and probabilistic information were integrated into the feature representations, and redundant features were removed by feature space optimization. High performance was achieved using 20 informative features and 22 probabilistic information. The proposed method will facilitate SIV characterization of transmission phenotype.
7
Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021; 10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: There is great interest in understanding the viral genomic predictors of phenotypic traits that allow influenza A viruses to adapt to or become more virulent in different hosts. Machine learning techniques have demonstrated promise in addressing this critical need for other pathogens because the underlying algorithms are especially well equipped to uncover complex patterns in large datasets and produce generalizable predictions for new data. As the body of research where these techniques are applied for influenza A virus phenotype prediction continues to grow, it is useful to consider the strengths and weaknesses of these approaches to understand what has prevented these models from seeing widespread use by surveillance laboratories and to identify gaps that are underexplored with this technology. Methods and Results: We present a systematic review of English literature published through 15 April 2021 of studies employing machine learning methods to generate predictions of influenza A virus phenotypes from genomic or proteomic input. Forty-nine studies were included in this review, spanning the topics of host discrimination, human adaptability, subtype and clade assignment, pandemic lineage assignment, characteristics of infection, and antiviral drug resistance. Conclusions: Our findings suggest that biases in model design and a dearth of wet laboratory follow-up may explain why these models often go underused. We, therefore, offer guidance to overcome these limitations, aid in improving predictive models of previously studied influenza A virus phenotypes, and extend those models to unexplored phenotypes in the ultimate pursuit of tools to enable the characterization of virus isolates across surveillance laboratories.
Affiliation(s)
- Laura K Borkenhagen
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
- Martin W Allen
- Department of Computer Science, School of Engineering, Tufts University, Medford, MA, USA
- Jonathan A Runstadler
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
8
Kim J, Lee K, Rupasinghe R, Rezaei S, Martínez-López B, Liu X. Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene. Front Vet Sci 2021; 8:683134. [PMID: 34368274 PMCID: PMC8345883 DOI: 10.3389/fvets.2021.683134] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022]
Abstract
Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. Our study used amino acid sequences of the ORF5 gene in 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label field PRRSV strains as one of four clades: Lineage 5 or three clades in Lineage 1. We measured the accuracy and time consumption of classification using the four ML approaches for different sizes of gene sequences. We found that all four ML algorithms classified a large number of field strains in a very short time (<2.5 s) with very high accuracy (>0.99 area under the receiver operating characteristic curve). Furthermore, the random forest approach detected a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings provide insight for developing a rapid and accurate classification model using genetic information, which also enables handling large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
Affiliation(s)
- Jeonghoon Kim
- Department of Mathematics, University of California, Davis, Davis, CA, United States
- Kyuyoung Lee
- Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
- Ruwini Rupasinghe
- Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
- Shahbaz Rezaei
- Department of Computer Science, University of California, Davis, Davis, CA, United States
- Beatriz Martínez-López
- Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
- Xin Liu
- Department of Computer Science, University of California, Davis, Davis, CA, United States
9
Li J, Zhang S, Li B, Hu Y, Kang XP, Wu XY, Huang MT, Li YC, Zhao ZP, Qin CF, Jiang T. Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions. Mol Biol Evol 2021; 37:1224-1236. [PMID: 31750915 PMCID: PMC7086167 DOI: 10.1093/molbev/msz276] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]
Abstract
Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). The timing of and the potential IAVs involved in the next pandemic are currently unpredictable. We aim to build machine learning (ML) models to predict human-adaptive IAV nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed for their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. Then, the human adaptation of IAV sequences was predicted based on the characterized (d)nts. Respectively, 9, 12, 11, 13, 10 and 9 human-adaptive (d)nts were optimized for the six segments. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The results of the confusion matrix and the area under the receiver operating characteristic curve indicated a high performance of the ML models to predict human adaptation of IAVs. Our model performed well in predicting the human adaptation of the swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction using large IAV genomic data sets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, it allows the prediction of pandemic influenza.
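The 12 codon position-based mononucleotide features described above (four bases at each of three codon positions) can be sketched as follows. The feature naming (`A1`, `C2`, and so on), the handling of trailing bases, and the treatment of U as T are assumptions made for illustration, not details taken from the paper; the 48 dinucleotide features are not covered.

```python
def codon_position_nt_freqs(cds):
    """Frequency of each base (A, C, G, T) at codon positions 1, 2 and 3.

    `cds` is an in-frame coding sequence. Trailing bases that do not complete
    a codon are ignored, and U is treated as T.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop any incomplete final codon
    features = {}
    for pos in range(3):
        bases = cds[pos:usable:3]  # every base at this codon position
        for nt in "ACGT":
            key = f"{nt}{pos + 1}"
            features[key] = bases.count(nt) / len(bases) if bases else 0.0
    return features


# Two codons, ATG and GCC: each codon position holds two different bases.
freqs = codon_position_nt_freqs("ATGGCC")
print(freqs["A1"], freqs["T2"], freqs["C3"])
```

Each sequence then becomes a fixed-length numeric vector, which is the form PCA and the other ML models mentioned in the abstract operate on.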
Affiliation(s)
- Jing Li
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Sen Zhang
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Bo Li
- Department of Clinical Laboratory, the Fifth Medical Centre of Chinese PLA General Hospital, Beijing, China
- Yi Hu
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Xiao-Ping Kang
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Xiao-Yan Wu
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Meng-Ting Huang
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China; Graduate School, Anhui Medical University, Hefei, China
- Yu-Chang Li
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Zhong-Peng Zhao
- Department of Infection and Immunology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Cheng-Feng Qin
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Tao Jiang
- Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China; Graduate School, Anhui Medical University, Hefei, China
10
Host-Virus Interaction: How Host Cells Defend against Influenza A Virus Infection. Viruses 2020; 12:v12040376. [PMID: 32235330 PMCID: PMC7232439 DOI: 10.3390/v12040376] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/28/2020] [Revised: 03/19/2020] [Accepted: 03/25/2020] [Indexed: 02/07/2023]
Abstract
Influenza A viruses (IAVs) are highly contagious pathogens that infect humans and numerous animals. The viruses cause millions of infections and thousands of deaths every year, making IAVs a continual threat to global health. Upon IAV infection, the host innate immune system is triggered and activated to restrict virus replication and clear pathogens. Subsequently, host adaptive immunity is involved in specific virus clearance. On the other hand, to achieve a successful infection, IAVs also apply multiple strategies to avoid being detected and eliminated by host immunity. In the current review, we present a general description of recent work regarding the different host cells and molecules that facilitate antiviral defenses against IAV infection and how IAVs antagonize host immune responses.
11
Qiang XL, Xu P, Fang G, Liu WB, Kou Z. Using the spike protein feature to predict infection risk and monitor the evolutionary dynamic of coronavirus. Infect Dis Poverty 2020; 9:33. [PMID: 32209118 PMCID: PMC7093988 DOI: 10.1186/s40249-020-00649-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 02/06/2020] [Accepted: 03/16/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND Coronavirus can cross the species barrier and infect humans with a severe respiratory syndrome. SARS-CoV-2, with a potential bat origin, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronavirus for early warning. METHODS The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) Database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were regarded as positive samples, whereas 2159 non-human-origin viruses were regarded as negative. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and G-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature with the best performance was identified by the multidimensional scaling method, which was used to explore the pattern of human coronavirus. RESULTS The 10-fold cross-validation results showed that good performance was achieved with the use of the GGAP (g = 3) feature. The predictive model achieved a maximum ACC of 98.18% coupled with a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters for human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2) were found. The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses have the same human receptor (angiotensin converting enzyme II). The big gap in the distance curve suggests that the origin of SARS-CoV-2 is not clear and that further surveillance in the field should be carried out continuously. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and public health is challenged as usual. 
CONCLUSIONS The optimal feature (GGAP, g = 3) performed well in terms of predicting infection risk and could be used to explore the evolutionary dynamic in a simple, fast and large-scale manner. The study may be beneficial for the surveillance of the genome mutation of coronavirus in the field.
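The GGAP (g = 3) feature named above can be sketched using a common definition of g-gap dipeptide composition: the normalized frequency of residue pairs separated by g intervening positions, giving a 400-dimensional vector. The normalization, the skipping of non-standard residues, and the function name are assumptions for illustration; the paper's exact encoding may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def ggap_composition(seq, g=3):
    """g-gap dipeptide composition of a protein sequence.

    Counts pairs (seq[i], seq[i + g + 1]) and divides by the number of counted
    pairs. Pairs containing a non-standard residue are skipped.
    """
    counts = dict.fromkeys(PAIRS, 0)
    n_pairs = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            n_pairs += 1
    if n_pairs == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / n_pairs for p in PAIRS]


# "ACDEFA" with g = 3 yields exactly two pairs: A..F and C..A.
vector = ggap_composition("ACDEFA", g=3)
print(len(vector), vector[PAIRS.index("AF")], vector[PAIRS.index("CA")])
```

With g = 0 the same function reduces to ordinary adjacent dipeptide composition, which makes the gap parameter easy to sweep when comparing encodings.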
Affiliation(s)
- Xiao-Li Qiang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Peng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Gang Fang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Wen-Bin Liu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Zheng Kou
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, 510006, China.
12
|
Liang X, Zhu W, Lv Z, Zou Q. Molecular Computing and Bioinformatics. Molecules 2019; 24:E2358. [PMID: 31247973 PMCID: PMC6651761 DOI: 10.3390/molecules24132358] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/24/2019] [Accepted: 06/25/2019] [Indexed: 02/06/2023]
Abstract
Molecular computing and bioinformatics are two important interdisciplinary sciences that study molecules and computers. Molecular computing is a branch of computing that uses DNA, biochemistry, and molecular biology hardware instead of traditional silicon-based computer technologies. Research and development in this area concerns the theory, experiments, and applications of molecular computing. The core advantage of molecular computing is its potential to pack vastly more circuitry onto a microchip than silicon will ever be capable of, and to do it cheaply. Molecules are only a few nanometers in size, making it possible to manufacture chips that contain billions, even trillions, of switches and components. To develop molecular computers, computer scientists must draw on expertise in subjects not usually associated with their field, including organic chemistry, molecular biology, bioengineering, and smart materials. Bioinformatics works in the opposite direction: bioinformatics researchers develop novel algorithms or software tools for computing or predicting molecular structure or function. Molecular computing and bioinformatics focus on the same object and are closely related, but work in different directions.
Affiliation(s)
- Xin Liang
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
- Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
|
13
|
Abstract
BACKGROUND Avian influenza viruses can directly cross species barriers and infect humans with high fatality. Because these viruses are antigenically novel to human hosts, they pose a serious public health challenge. The pandemic risk of avian influenza viruses should be analyzed, and a prediction model should be constructed for virology applications. RESULTS The 178 signature positions in 11 viral proteins were first screened as features using the scores of five amino acid factors and their random forest rankings. The support vector machine algorithm achieved good performance. The most important amino acid factor (Factor 5) and the minimal range of signature positions (63 amino acid residues) were also explored. Moreover, human-origin avian influenza viruses with three or four genome segments from human viruses had a high probability of pandemic risk. CONCLUSION Using machine learning methods, the present paper scores amino acid mutations and predicts pandemic risk with good performance. Although the long evolutionary distances between avian and human viruses suggest that avian influenza viruses in nature still need time to become fixed in human hosts, it is notable that the H7N9 and H9N2 avian viruses carry high pandemic risk.
Affiliation(s)
- Xiaoli Qiang
- Institute of Computing Science and Technology, Guangzhou University, 230 Wai Huan Xi Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006 People’s Republic of China
- Zheng Kou
- Institute of Computing Science and Technology, Guangzhou University, 230 Wai Huan Xi Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006 People’s Republic of China
|
14
|
Mishra B, Kumar N, Mukhtar MS. Systems Biology and Machine Learning in Plant-Pathogen Interactions. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2019; 32:45-55. [PMID: 30418085 DOI: 10.1094/mpmi-08-18-0221-fi] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 05/02/2023]
Abstract
Systems biology is an inclusive approach to studying static and dynamic emergent properties on a global scale by integrating multiomics datasets to establish qualitative and quantitative associations among multiple biological components. With an abundance of improved high-throughput -omics datasets, network-based analyses and machine learning technologies are playing a pivotal role in the comprehensive understanding of biological systems. Network topological features reveal the most important nodes within a network and prioritize significant molecular components for diverse biological networks, including coexpression, protein-protein interaction, and gene regulatory networks. Machine learning techniques provide enormous predictive power through specific feature extraction from biological data. Deep learning, a subtype of machine learning, has plausible future applications because it does not need a domain expert for feature extraction. Inspired by diverse domains of biology, we here review classic systems biology techniques applied in plant immunity thus far. We also discuss additional advanced approaches in both graph theory and machine learning, which may provide new insights for understanding plant-microbe interactions. Finally, we propose a hybrid approach in plant immune systems that harnesses the power of both network biology and machine learning, with the potential to be applicable to both model systems and agronomically important crop plants.
Affiliation(s)
- M Shahid Mukhtar
- Department of Biology and Nutrition Obesity Research Center, University of Alabama at Birmingham, 1300 University Blvd., Birmingham 35294, U.S.A.
|