1
|
Zhang S, Li YD, Cai YR, Kang XP, Feng Y, Li YC, Chen YH, Li J, Bao LL, Jiang T. Compositional features analysis by machine learning in genome represents linear adaptation of monkeypox virus. Front Genet 2024; 15:1361952. [PMID: 38495668 PMCID: PMC10940399 DOI: 10.3389/fgene.2024.1361952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/21/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: The global headlines have been dominated by the sudden and widespread outbreak of monkeypox, a rare and endemic zoonotic disease caused by the monkeypox virus (MPXV). Genomic composition-based machine learning (ML) methods have recently shown promise in identifying host adaptability and evolutionary patterns of viruses. Our study aimed to analyze the genomic characteristics and evolutionary patterns of MPXV using ML methods. Methods: The open reading frame (ORF) regions of full-length MPXV genomes were filtered and 165 ORFs were selected as clusters with the highest homology. Unsupervised machine learning methods, namely t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and hierarchical clustering, were applied to observe the dinucleotide composition representation (DCR) characteristics of the selected ORF clusters. Results: MPXV sequences post-2022 showed an obvious linear adaptive evolution, indicating that the virus has become more adapted to the human host after accumulating mutations. For further accurate analysis, the ORF regions with larger variations were filtered out based on the ranking of homology difference to narrow down the key ORF clusters, which led to the same conclusion of linear adaptability. Key differential protein structures were then predicted by AlphaFold 2, which suggested that differences in the main domains might be one of the internal reasons for linear adaptive evolution. Discussion: Understanding the process of linear adaptation is critical in the constant evolutionary struggle between viruses and their hosts, and plays a significant role in crafting effective measures to tackle viral diseases. The present study therefore provides valuable insights into the evolutionary patterns of MPXV in 2022 from the perspective of genomic composition characteristics analyzed through ML methods.
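As an illustration of the kind of compositional feature this abstract refers to, the following is a minimal Python sketch that turns a DNA sequence into a 16-dimensional vector of overlapping dinucleotide frequencies. It is an assumption-laden simplification: the function name `dinucleotide_composition` and the example sequence are hypothetical, and the DCR feature used by the authors may be defined differently (for example, with additional positional information). Vectors like this could then be passed to PCA, t-SNE, or hierarchical clustering.

```python
from collections import Counter
from itertools import product

# The 16 possible dinucleotides over the unambiguous DNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the frequencies of the 16 overlapping dinucleotides in seq.

    Hypothetical helper for illustration; not the authors' exact DCR definition.
    """
    seq = seq.upper()
    # Count every overlapping window of length 2.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only windows made of A, C, G, T contribute; ambiguous bases such as N are ignored.
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


# Example with a made-up 10-base sequence.
vec = dinucleotide_composition("ATGCGCGATA")
print(dict(zip(DINUCLEOTIDES, vec)))
```

Normalizing by the number of valid windows makes ORFs of different lengths comparable, which matters when sequences are clustered by composition rather than by alignment.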
Affiliation(s)
- Sen Zhang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Ya-Dan Li
  - College of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Yu-Rong Cai
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
  - College of the First Clinical Medical, Inner Mongolia Medical University, Hohhot, China
- Xiao-Ping Kang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Ye Feng
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Yu-Chang Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Yue-Hong Chen
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Jing Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
  - College of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Li-Li Bao
  - College of Basic Medical Sciences, Inner Mongolia Medical University, Hohhot, China
- Tao Jiang
  - College of Basic Medical Sciences, Anhui Medical University, Hefei, China
|
2
|
Petrone ME, Parry R, Mifsud JCO, Van Brussel K, Vorhees I, Richards ZT, Holmes EC. Evidence for an ancient aquatic origin of the RNA viral order Articulavirales. Proc Natl Acad Sci U S A 2023; 120:e2310529120. [PMID: 37906647 PMCID: PMC10636315 DOI: 10.1073/pnas.2310529120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/22/2023] [Accepted: 10/03/2023] [Indexed: 11/02/2023] Open
Abstract
The emergence of previously unknown disease-causing viruses in mammals is in part the result of a long-term evolutionary process. Reconstructing the deep phylogenetic histories of viruses helps identify major evolutionary transitions and contextualizes the emergence of viruses in new hosts. We used a combination of total RNA sequencing and transcriptome data mining to extend the diversity and evolutionary history of the RNA virus order Articulavirales, which includes the influenza viruses. We identified instances of Articulavirales in the invertebrate phylum Cnidaria (including corals), constituting a novel and divergent family that we provisionally named the "Cnidenomoviridae." We further extended the evolutionary history of the influenza virus lineage by identifying four divergent, fish-associated influenza-like viruses, thereby supporting the hypothesis that fish were among the first hosts of influenza viruses. In addition, we substantially expanded the phylogenetic diversity of quaranjaviruses and proposed that this genus be reclassified as a family, the "Quaranjaviridae." Within this putative family, we identified a novel arachnid-infecting genus, provisionally named "Cheliceravirus." Notably, we observed a close phylogenetic relationship between the Crustacea- and Chelicerata-infecting "Quaranjaviridae" that is inconsistent with virus-host codivergence. Together, these data suggest that the Articulavirales has evolved over at least 600 million years, first emerging in aquatic animals. Importantly, the evolution of the Articulavirales was likely shaped by multiple aquatic-terrestrial transitions and substantial host jumps, some of which are still observable today.
Affiliation(s)
- Mary E. Petrone
  - Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney, NSW2006, Australia
  - Laboratory of Data Discovery for Health Limited, Hong Kong Special Administrative Region, China
- Rhys Parry
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD4067, Australia
- Jonathon C. O. Mifsud
  - Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney, NSW2006, Australia
- Kate Van Brussel
  - Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney, NSW2006, Australia
- Ian Vorhees
  - James A. Baker Institute for Animal Health, Department of Microbiology and Immunology, College of Veterinary Medicine, Cornell University, Ithaca, NY14850
- Zoe T. Richards
  - Coral Conservation and Research Group, Trace and Environmental DNA Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, WA6102, Australia
  - Collections and Research, Western Australian Museum, Welshpool, WA6106, Australia
- Edward C. Holmes
  - Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney, NSW2006, Australia
  - Laboratory of Data Discovery for Health Limited, Hong Kong Special Administrative Region, China
|
3
|
Tseng KK, Koehler H, Becker DJ, Gibb R, Carlson CJ, Fernandez MDP, Seifert SN. Viral genomic features predict orthopoxvirus reservoir hosts. bioRxiv 2023:2023.10.26.564211. [PMID: 37961540 PMCID: PMC10634857 DOI: 10.1101/2023.10.26.564211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Orthopoxviruses (OPVs), including the causative agents of smallpox and mpox, have led to devastating outbreaks in human populations worldwide. However, the discontinuation of smallpox vaccination, which also provides cross-protection against related OPVs, has diminished global immunity to OPVs more broadly. We apply machine learning models incorporating both host ecological and viral genomic features to predict likely reservoirs of OPVs. We demonstrate that incorporating viral genomic features in addition to host ecological traits enhanced the accuracy of potential OPV host predictions, highlighting the importance of host-virus molecular interactions in predicting potential host species. We identify hotspots for geographic regions rich with potential OPV hosts in parts of southeast Asia, equatorial Africa, and the Amazon, revealing high overlap between regions predicted to have a high number of potential OPV host species and those with the lowest smallpox vaccination coverage, indicating a heightened risk for the emergence or establishment of zoonotic OPVs. Our findings can be used to target wildlife surveillance, particularly related to concerns about mpox establishment beyond its historical range.
Affiliation(s)
- Katie K. Tseng
  - Paul G. Allen School for Global Health, Washington State University, Pullman, WA, USA
- Heather Koehler
  - School of Molecular Biosciences, Washington State University, Pullman, WA, USA
- Daniel J. Becker
  - Department of Biology, School of Biological Sciences, University of Oklahoma, Norman, OK, USA
- Rory Gibb
  - Centre for Biodiversity and Environment Research, Department of Genetics, Evolution and Environment, University College London, London, UK
  - People & Nature Lab, UCL East, University College London, Stratford, London, UK
- Colin J. Carlson
  - Center for Global Health Science and Security, Georgetown University, Washington, DC, USA
- Stephanie N. Seifert
  - Paul G. Allen School for Global Health, Washington State University, Pullman, WA, USA
|
4
|
Mollentze N, Streicker DG. Predicting zoonotic potential of viruses: where are we? Curr Opin Virol 2023; 61:101346. [PMID: 37515983 DOI: 10.1016/j.coviro.2023.101346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 06/28/2023] [Accepted: 06/30/2023] [Indexed: 07/31/2023]
Abstract
The prospect of identifying high-risk viruses and designing interventions to pre-empt their emergence into human populations is enticing, but controversial, particularly when used to justify large-scale virus discovery initiatives. We review the current state of these efforts, identifying three broad classes of predictive models that have differences in data inputs that define their potential utility for triaging newly discovered viruses for further investigation. Prospects for model predictions of public health risk to guide preparedness depend not only on computational improvements to algorithms, but also on more efficient data generation in laboratory, field and clinical settings. Beyond public health applications, efforts to predict zoonoses provide unique research value by creating generalisable understanding of the ecological and evolutionary factors that promote viral emergence.
Affiliation(s)
- Nardus Mollentze
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow G12 8QQ, United Kingdom
  - MRC-University of Glasgow Centre for Virus Research, G61 1QH, United Kingdom
- Daniel G Streicker
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow G12 8QQ, United Kingdom
  - MRC-University of Glasgow Centre for Virus Research, G61 1QH, United Kingdom
|
5
|
Jiang S, Zhang S, Kang X, Feng Y, Li Y, Nie M, Li Y, Chen Y, Zhao S, Jiang T, Li J. Risk Assessment of the Possible Intermediate Host Role of Pigs for Coronaviruses with a Deep Learning Predictor. Viruses 2023; 15:1556. [PMID: 37515242 PMCID: PMC10384923 DOI: 10.3390/v15071556] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Revised: 07/13/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
Swine coronaviruses (CoVs) have been found to cause infection in humans, suggesting that Suiformes might be potential intermediate hosts in CoV transmission from their natural hosts to humans. The present study aims to establish convolutional neural network (CNN) models to predict host adaptation of swine CoVs. Each ORF1ab and Spike sequence was decomposed into its dinucleotide composition representation (DCR) and other traits. The relationship between CoVs from different adaptive hosts was analyzed by unsupervised learning, and CNN models based on the DCR of ORF1ab and Spike were built to predict the host adaptation of swine CoVs. The rationality of the models was verified with phylogenetic analysis. Unsupervised learning showed that different swine CoVs are adapted to multiple hosts. According to the adaptation prediction of the CNN models, swine acute diarrhea syndrome CoV (SADS-CoV) and porcine epidemic diarrhea virus (PEDV) are adapted to Chiroptera, swine transmissible gastroenteritis virus (TGEV) is adapted to Carnivora, porcine hemagglutinating encephalomyelitis virus (PHEV) might be adapted to Primate, Rodent, and Lagomorpha, and porcine deltacoronavirus (PDCoV) might be adapted to Chiroptera, Artiodactyla, and Carnivora. In summary, the DCR trait has been confirmed to be representative of the CoV genome, and the DCR-based deep learning model works well to assess the adaptation of swine CoVs to other mammals. Suiformes might be intermediate hosts for human CoVs and other mammalian CoVs. The present study provides a novel approach to assess the risk of adaptation and transmission of swine CoVs to humans and other mammals.
Affiliation(s)
- Shuyang Jiang
  - College of Mathematics, Jilin University, Changchun, Jilin 130012, China
- Sen Zhang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Xiaoping Kang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Ye Feng
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Yadan Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Maoshun Nie
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Yuchang Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Yuehong Chen
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Shishun Zhao
  - College of Mathematics, Jilin University, Changchun, Jilin 130012, China
- Tao Jiang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
- Jing Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, AMMS, Beijing 100071, China
|
6
|
Tran H, Friendship R, Poljak Z. Classification of group A rotavirus VP7 and VP4 genotypes using random forest. Front Genet 2023; 14:1029185. [PMID: 37323680 PMCID: PMC10267748 DOI: 10.3389/fgene.2023.1029185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 05/15/2023] [Indexed: 06/17/2023] Open
Abstract
Introduction: Group A rotaviruses are major pathogens causing severe diarrhea in young children and neonates of many different species of animals worldwide, and group A rotavirus sequence data are becoming increasingly available over time. Different methods exist that allow for rotavirus genotyping, but machine learning methods have yet to be explored. Usage of machine learning algorithms such as random forest alongside alignment-based methodology may allow for both efficient and accurate classification of circulating rotavirus genotypes through the dual classification system. Methods: Random forest models were trained on positional features obtained from pairwise and multiple sequence alignment and cross-validated using either three repeats of 10-fold cross-validation or leave-one-out cross-validation. Models were then validated on unseen data from the testing datasets to observe real-world performance. Results: All models were found to perform strongly in classification of VP7 and VP4 genotypes with high overall accuracy and kappa values during model training (0.975-0.992, 0.970-0.989) and during model testing (0.972-0.996, 0.969-0.996), respectively. Models trained on multiple sequence alignment generally had slightly higher overall accuracy and kappa values than models trained on pairwise sequence alignment. In contrast, pairwise sequence alignment models were found to be generally faster than multiple sequence alignment models in computational speed when models do not need to be retrained. Models that used three repeats of 10-fold cross-validation were also found to be much faster in computational speed than models that used leave-one-out cross-validation, with no noticeable difference in overall accuracy and kappa values between the cross-validation methods. Discussion: Overall, random forest models showed strong performance in the classification of both group A rotavirus VP7 and VP4 genotypes. Application of these models as classifiers will allow for rapid and accurate classification of the increasing amounts of rotavirus sequence data that are becoming available.
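The two metrics reported in this abstract, overall accuracy and Cohen's kappa, can be computed from paired true and predicted labels. The sketch below is a plain-Python illustration under stated assumptions: the function names and the toy genotype labels are hypothetical and are not taken from the study, which does not describe its implementation here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with made-up genotype labels (not data from the study).
y_true = ["G1", "G1", "G2", "G2", "G3", "G3"]
y_pred = ["G1", "G1", "G2", "G3", "G3", "G3"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when a few genotypes dominate the dataset.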
|
7
|
Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023] Open
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Affiliation(s)
- Hitoshi Iuchi
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Junna Kawasaki
  - Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kento Kubo
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
- Koki Hokao
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Gentaro Yokoyama
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Akiko Ichinose
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Kanta Suga
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
|
8
|
Artificial Intelligence Models for Zoonotic Pathogens: A Survey. Microorganisms 2022; 10:1911. [PMID: 36296187 PMCID: PMC9607465 DOI: 10.3390/microorganisms10101911] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/16/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 11/22/2022] Open
Abstract
Zoonotic diseases, or zoonoses, are infections due to the natural transmission of pathogens between species (animals and humans). More than 70% of emerging infectious diseases are attributed to animal origin. Artificial Intelligence (AI) models have been used for studying zoonotic pathogens and the factors that contribute to their spread. The aim of this literature survey is to synthesize and analyze machine learning and deep learning approaches applied to the study of zoonotic diseases, in order to understand predictive models that help researchers identify risk factors and develop mitigation strategies. Based on our survey findings, machine learning and deep learning are commonly used for the prediction of both foodborne and zoonotic pathogens as well as the factors associated with the presence of the pathogens.
|
9
|
Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022; 38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/08/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Ferdous Nasri
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Melania Nowicka
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Bernhard Y Renard
  - Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
|
10
|
Amerifar S, Norouzi M, Ghandi M. A tool for feature extraction from biological sequences. Brief Bioinform 2022; 23:6563937. [PMID: 35383372 DOI: 10.1093/bib/bbac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 11/12/2022] Open
Abstract
With the advances in sequencing technologies, a huge amount of biological data is extracted nowadays. Analyzing this amount of data is beyond the ability of human beings, creating a splendid opportunity for machine learning methods to grow. The methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task, including iLearnPlus, a Python-based tool which supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool supports not only all features in iLearnPlus but also 30 additional features which exist in the literature. Moreover, our tool is based on the R language, which offers bioinformaticians an alternative way to transform sequences into feature vectors. We have compared the conversion time of our tool with that of iLearnPlus: we transform the sequences much faster. We convert small nucleotide sequences a median of 2.8X faster, while we outperform iLearnPlus by a median of 6.3X for large sequences. Finally, for amino acid sequences, our tool achieves a median speedup of 23.9X.
Affiliation(s)
- Sare Amerifar
  - Bioinformatics, Tarbiat Modares University, Jalal Al Ahmad, 14115-111, Tehran, Iran
- Mahammad Norouzi
  - Computer Science, Technical University of Darmstadt, Hochschulstr. 1, 64293, Hesse, Germany
- Mahmoud Ghandi
  - Bioinformatics, Monte Rosa Therapeutics, Summer Street, 02210, Boston, United States
|
11
|
Fagre AC, Cohen LE, Eskew EA, Farrell M, Glennon E, Joseph MB, Frank HK, Ryan SJ, Carlson CJ, Albery GF. Assessing the risk of human-to-wildlife pathogen transmission for conservation and public health. Ecol Lett 2022; 25:1534-1549. [PMID: 35318793 PMCID: PMC9313783 DOI: 10.1111/ele.14003] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 02/15/2022] [Revised: 02/22/2022] [Accepted: 03/02/2022] [Indexed: 12/16/2022]
Abstract
The SARS‐CoV‐2 pandemic has led to increased concern over transmission of pathogens from humans to animals, and its potential to threaten conservation and public health. To assess this threat, we reviewed published evidence of human‐to‐wildlife transmission events, with a focus on how such events could threaten animal and human health. We identified 97 verified examples, involving a wide range of pathogens; however, reported hosts were mostly non‐human primates or large, long‐lived captive animals. Relatively few documented examples resulted in morbidity and mortality, and very few led to maintenance of a human pathogen in a new reservoir or subsequent “secondary spillover” back into humans. We discuss limitations in the literature surrounding these phenomena, including strong evidence of sampling bias towards non‐human primates and human‐proximate mammals and the possibility of systematic bias against reporting human parasites in wildlife, both of which limit our ability to assess the risk of human‐to‐wildlife pathogen transmission. We outline how researchers can collect experimental and observational evidence that will expand our capacity for risk assessment for human‐to‐wildlife pathogen transmission.
Affiliation(s)
- Anna C Fagre
  - Department of Microbiology, Immunology, and Pathology, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, Colorado, USA
  - Bat Health Foundation, Fort Collins, Colorado, USA
- Lily E Cohen
  - Icahn School of Medicine at Mount Sinai, New York, New York City, USA
- Evan A Eskew
  - Department of Biology, Pacific Lutheran University, Tacoma, Washington, USA
- Max Farrell
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Emma Glennon
  - Disease Dynamics Unit, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Maxwell B Joseph
  - Earth Lab, University of Colorado Boulder, Boulder, Colorado, USA
- Hannah K Frank
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, Louisiana, USA
- Sadie J Ryan
  - Quantitative Disease Ecology and Conservation (QDEC) Lab Group, Department of Geography, University of Florida, Gainesville, Florida, USA
  - Emerging Pathogens Institute, University of Florida, Gainesville, Florida, USA
  - School of Life Sciences, University of KwaZulu-Natal, Durban, South Africa
- Colin J Carlson
  - Center for Global Health Science and Security, Georgetown University Medical Center, Washington, District of Columbia, USA
  - Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, District of Columbia, USA
- Gregory F Albery
  - Department of Biology, Georgetown University, Washington, District of Columbia, USA
|
12
|
Yerukala Sathipati S, Shukla SK, Ho SY. Tracking the amino acid changes of spike proteins across diverse host species of severe acute respiratory syndrome coronavirus 2. iScience 2022; 25:103560. [PMID: 34877480 PMCID: PMC8638202 DOI: 10.1016/j.isci.2021.103560] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/28/2021] [Revised: 11/02/2021] [Accepted: 11/30/2021] [Indexed: 12/14/2022] Open
Abstract
Knowledge of the host-specific properties of the spike protein is of crucial importance to understand the adaptability of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) to infect multiple species and alter transmissibility, particularly in humans. Here, we propose a spike protein predictor, SPIKES, incorporating an inheritable bi-objective combinatorial genetic algorithm to identify the biochemical properties of spike proteins and determine their specificity to human hosts. SPIKES identified 20 informative physicochemical properties of the spike protein, including information measures for alpha helix and relative mutability, and amino acid and dipeptide compositions, which have shown compositional differences at the amino acid sequence level between human and diverse animal coronaviruses. We suggest that alterations of these amino acids between human and animal coronaviruses may provide insights into the development and transmission of SARS-CoV-2 in humans and other species and support the discovery of targeted antiviral therapies. Highlights: Differences exist in the amino acids within the S protein of diverse host species CoVs. We developed SPIKES to identify informative properties of S protein. SARS-CoV-2 variants have amino acid changes that alter infection and transmission. SPIKES identified changes in S protein properties from animal to human host CoVs.
Affiliation(s)
- Srinivasulu Yerukala Sathipati
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
  - Corresponding author
- Sanjay K. Shukla
  - Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
- Shinn-Ying Ho
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Center for intelligent Drug Systems and Smart Bio-Devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu, Taiwan
|
13
|
Huang S, Farrell M, Stephens PR. Infectious disease macroecology: parasite diversity and dynamics across the globe. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200350. [PMID: 34538145 PMCID: PMC8450632 DOI: 10.1098/rstb.2020.0350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/20/2021] [Indexed: 11/12/2022] Open
Affiliation(s)
- Shan Huang
  - Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt am Main, Germany
- Maxwell Farrell
  - Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Patrick R. Stephens
  - Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia, Athens, GA, USA
|