Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Qiang X, Kou Z, Fang G, Wang Y. Scoring Amino Acid Mutations to Predict Avian-to-Human Transmission of Avian Influenza Viruses. Molecules 2018;23:E1584. [PMID: 29966263 DOI: 10.3390/molecules23071584] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/17/2018] [Revised: 06/13/2018] [Accepted: 06/19/2018] [Indexed: 11/17/2022]

Cited by Other Article(s)

Chokkakula S, Oh S, Choi WS, Kim CI, Jeong JH, Kim BK, Park JH, Min SC, Kim EG, Baek YH, Choi YK, Song MS. Mammalian adaptation risk in HPAI H5N8: a comprehensive model bridging experimental data with mathematical insights. Emerg Microbes Infect 2024;13:2339949. [PMID: 38572657 PMCID: PMC11022924 DOI: 10.1080/22221751.2024.2339949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 04/03/2024] [Indexed: 04/05/2024]

Alberts F, Berke O, Maboni G, Petukhova T, Poljak Z. Utilizing machine learning and hemagglutinin sequences to identify likely hosts of influenza H3Nx viruses. Prev Vet Med 2024;233:106351. [PMID: 39353303 DOI: 10.1016/j.prevetmed.2024.106351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 08/16/2024] [Accepted: 09/25/2024] [Indexed: 10/04/2024]

Abstract

Influenza is a disease that represents both a public health and agricultural risk with pandemic potential. Among the subtypes of influenza A virus, H3 influenza virus can infect many avian and mammalian species and is therefore a virus of interest to human and veterinary public health. The primary goal of this study was to train and validate classifiers for the identification of the most likely host species using the hemagglutinin gene segment of H3 viruses. A five-step process was implemented, which included training four machine learning classifiers, testing the classifiers on the validation dataset, and further exploration of the best-performing model on three additional datasets. The gradient boosting machine classifier showed the highest host-classification accuracy with a 98.0 % (95 % CI [97.01, 98.73]) correct classification rate on an independent validation dataset. The classifications were further analyzed using the predicted probability score which highlighted sequences of particular interest. These sequences were both correctly and incorrectly classified sequences that showed considerable predicted probability for multiple hosts. This showed the potential of using these classifiers for rapid sequence classification and highlighting sequences of interest. Additionally, the classifiers were tested on a separate swine dataset composed of H3N2 sequences from 1998 to 2003 from the United States of America, and a separate canine dataset composed of canine H3N2 sequences of avian origin. These two datasets were utilized to look at the applications of predicted probability and host convergence over time. Lastly, the classifiers were used on an independent dataset of environmental sequences to explore the host identification of environmental sequences. The results of these classifiers show the potential for machine learning to be used as a host identification technique for viruses of unknown origin on a species-specific level.

Zhang Y, Eskridge KM, Zhang S, Lu G. Identifying host-specific amino acid signatures for influenza A viruses using an adjusted entropy measure. BMC Bioinformatics 2022;23:333. [PMID: 35962315 PMCID: PMC9372975 DOI: 10.1186/s12859-022-04885-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 08/02/2022] [Indexed: 11/29/2022]

Evolution of the North American Lineage H7 Avian Influenza Viruses in Association with H7 Virus's Introduction to Poultry. J Virol 2022;96:e0027822. [PMID: 35862690 PMCID: PMC9327676 DOI: 10.1128/jvi.00278-22] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/09/2023]

Abstract

The incursions of H7 subtype low-pathogenicity avian influenza virus (LPAIV) from wild birds into poultry and its mutations to highly pathogenic avian influenza virus (HPAIV) have been an ongoing concern in North America. Since 2000, 10 phylogenetically distinct H7 virus outbreaks from wild birds have been detected in poultry, six of which mutated to HPAIV. To study the molecular evolution of the H7 viruses that occurs when changing hosts from wild birds to poultry, we performed analyses of the North American H7 hemagglutinin (HA) genes to identify amino acid changes as the virus circulated in wild birds from 2000 to 2019. Then, we analyzed recurring HA amino acid changes and gene constellations of the viruses that spread from wild birds to poultry. We found six HA amino acid changes occurring during wild bird circulation and 10 recurring changes after the spread to poultry. Eight of the changes were in and around the HA antigenic sites, three of which were supported by positive selection. Viruses from each H7 outbreak had a unique genotype, with no specific genetic group associated with poultry outbreaks or mutation to HPAIV. However, the genotypes of the H7 viruses in poultry outbreaks tended to contain minor genetic groups less observed in wild bird H7 viruses, suggesting either a biased sampling of wild bird AIVs or a tendency of having reassortment with minor genetic groups prior to the virus's introduction to poultry.

IMPORTANCE

Wild bird-origin H7 subtype avian influenza viruses are a constant threat to commercial poultry, both directly by the disease they cause and indirectly through trade restrictions that can be imposed when the virus is detected in poultry. It is important to understand the genetic basis of why the North American lineage H7 viruses have repeatedly crossed the species barrier from wild birds to poultry. We examined the amino acid changes in the H7 viruses associated with poultry outbreaks and tried to determine gene reassortment related to poultry adaptation and mutations to HPAIV. The findings in this study increase the understanding of the evolutionary pathways of wild bird AIV before infecting poultry and the HA changes associated with adaptation of the virus in poultry.

Kou Z, Fan X, Li J, Shao Z, Qiang X. Using amino acid features to identify the pathogenicity of influenza B virus. Infect Dis Poverty 2022;11:50. [PMID: 35509019 PMCID: PMC9066401 DOI: 10.1186/s40249-022-00974-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 04/16/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Influenza B virus can cause epidemics with high pathogenicity, so it poses a serious threat to public health. A feature representation algorithm is proposed in this paper to identify the pathogenicity phenotype of influenza B virus.

METHODS

The dataset included all 11 influenza virus proteins encoded in eight genome segments of 1724 strains. Two types of features were used hierarchically to build the prediction model. Amino acid features were derived directly from 67 feature descriptors and input into the random forest classifier to output informative features about the class label and probabilistic prediction. The sequential forward search strategy was used to optimize the informative features. The final features for each strain had low dimensions and included knowledge from different perspectives, and they were used to build the machine learning model for pathogenicity identification.

RESULTS

The 40 signature positions were obtained by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were directly generated from the 67 random forest models, the dimensions for class and probabilistic features were optimized to 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed the original informative features and amino acid features from individual descriptors. The sequential forward search strategy performed better than the classical ensemble method.

CONCLUSIONS

The optimized informative features had the best performance and were used to build a predictive model so as to identify the phenotype of influenza B virus with high pathogenicity and provide early risk warning for disease control.
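The entropy screening mentioned in the Results can be illustrated with a minimal sketch. The abstract does not state the logarithm base, the alignment procedure, or the exact entropy definition, so the following assumes plain Shannon entropy with the natural logarithm over pre-aligned, equal-length sequences. The function names `column_entropy` and `rank_positions` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log, an assumption) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def rank_positions(aligned_seqs, top_k):
    """Return the top_k (position, entropy) pairs, with 1-based positions, highest entropy first."""
    length = len(aligned_seqs[0])
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        scores.append((i + 1, column_entropy(column)))
    # Positions with more residue variability across strains score higher.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Toy alignment, invented purely for illustration.
    toy = ["AKT", "ART", "AKT", "ARS"]
    print(rank_positions(toy, top_k=2))
```

A fully conserved column scores zero, and a column split evenly between two residues scores ln 2, so ranking by this value surfaces the most variable positions as candidate signature sites.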

Predicting Cross-Species Infection of Swine Influenza Virus with Representation Learning of Amino Acid Features. Comput Math Methods Med 2021;2021:6985008. [PMID: 34671417 PMCID: PMC8523279 DOI: 10.1155/2021/6985008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2021] [Revised: 09/27/2021] [Accepted: 09/28/2021] [Indexed: 11/17/2022]

Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021;10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]

Kim J, Lee K, Rupasinghe R, Rezaei S, Martínez-López B, Liu X. Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene. Front Vet Sci 2021;8:683134. [PMID: 34368274 PMCID: PMC8345883 DOI: 10.3389/fvets.2021.683134] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022]

Abstract

Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. Our study used amino acid sequences of the ORF5 gene in 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label field PRRSV strains as one of four clades: Lineage 5 or three clades in Lineage 1. We measured the accuracy and time consumption of classification using the four ML approaches with different sizes of gene sequences. We found that all four ML algorithms classify a large number of field strains in a very short time (<2.5 s) with very high accuracy (>0.99 area under the receiver operating characteristic curve). Furthermore, the random forest approach detects a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings will provide insight for developing a rapid and accurate classification model using genetic information, which also enables us to handle large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.

Li J, Zhang S, Li B, Hu Y, Kang XP, Wu XY, Huang MT, Li YC, Zhao ZP, Qin CF, Jiang T. Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions. Mol Biol Evol 2021;37:1224-1236. [PMID: 31750915 PMCID: PMC7086167 DOI: 10.1093/molbev/msz276] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]

Host-Virus Interaction: How Host Cells Defend against Influenza A Virus Infection. Viruses 2020;12:v12040376. [PMID: 32235330 PMCID: PMC7232439 DOI: 10.3390/v12040376] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/28/2020] [Revised: 03/19/2020] [Accepted: 03/25/2020] [Indexed: 02/07/2023]

Qiang XL, Xu P, Fang G, Liu WB, Kou Z. Using the spike protein feature to predict infection risk and monitor the evolutionary dynamic of coronavirus. Infect Dis Poverty 2020;9:33. [PMID: 32209118 PMCID: PMC7093988 DOI: 10.1186/s40249-020-00649-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 02/06/2020] [Accepted: 03/16/2020] [Indexed: 01/08/2023]

Abstract

BACKGROUND

Coronaviruses can cross the species barrier and infect humans, causing severe respiratory syndrome. SARS-CoV-2, which potentially originated in bats, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronaviruses for early warning.

METHODS

The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were regarded as positive samples, whereas 2159 non-human-origin viruses were regarded as negative. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and G-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature with the best performance was identified and then used with the multidimensional scaling method to explore the pattern of human coronaviruses.

RESULTS

The 10-fold cross-validation results showed that good performance was achieved with the GGAP (g = 3) feature. The predictive model achieved a maximum accuracy (ACC) of 98.18% together with a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters for human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2) were found. The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses use the same human receptor (angiotensin-converting enzyme II). The big gap in the distance curve suggests that the origin of SARS-CoV-2 is not clear and that further surveillance in the field should continue. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and continue to challenge public health.

CONCLUSIONS

The optimal feature (GGAP, g = 3) performed well in terms of predicting infection risk and could be used to explore the evolutionary dynamic in a simple, fast and large-scale manner. The study may be beneficial for the surveillance of the genome mutation of coronavirus in the field.
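The GGAP feature named in the Methods and Results can be sketched as follows. This is a minimal illustration of the commonly used definition of g-gap dipeptide composition (the frequency of each residue pair separated by g intervening positions), not the authors' implementation. The function name `ggap_composition`, the skipping of non-standard residues, and the normalization by the number of valid pairs are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def ggap_composition(seq, g):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies for one sequence."""
    n_windows = len(seq) - g - 1
    if n_windows <= 0:
        raise ValueError("sequence too short for the requested gap")
    counts = dict.fromkeys(PAIRS, 0)
    valid = 0
    for i in range(n_windows):
        # Pair the residue at i with the residue g positions further along.
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard symbols (assumption)
            counts[pair] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / valid for p in PAIRS]


if __name__ == "__main__":
    # Toy fragment, invented for illustration; g = 3 matches the setting reported above.
    vec = ggap_composition("MFVFLVLLPLVSSQ", g=3)
    print(len(vec), round(sum(vec), 6))
```

With g = 0 this reduces to the ordinary adjacent dipeptide composition; larger g values capture correlations between residues that are further apart in the sequence.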

Liang X, Zhu W, Lv Z, Zou Q. Molecular Computing and Bioinformatics. Molecules 2019;24:E2358. [PMID: 31247973 PMCID: PMC6651761 DOI: 10.3390/molecules24132358] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/24/2019] [Accepted: 06/25/2019] [Indexed: 02/06/2023]

Qiang X, Kou Z. Scoring amino acid mutation to predict pandemic risk of avian influenza virus. BMC Bioinformatics 2019;20:288. [PMID: 31182019 PMCID: PMC6557742 DOI: 10.1186/s12859-019-2770-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/23/2022]

Mishra B, Kumar N, Mukhtar MS. Systems Biology and Machine Learning in Plant-Pathogen Interactions. Mol Plant Microbe Interact 2019;32:45-55. [PMID: 30418085 DOI: 10.1094/mpmi-08-18-0221-fi] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 05/02/2023]