1
Hamzaoui Z, Ferjani S, Medini I, Charaa L, Landolsi I, Ben Ali R, Khaled W, Chammam S, Abid S, Kanzari L, Ferjani A, Fakhfakh A, Kebaier D, Bouslah Z, Ben Sassi M, Trabelsi S, Boutiba-Ben Boubaker I. Genomic surveillance of SARS-CoV-2 in North Africa: 4 years of GISAID data sharing. IJID Reg 2024; 11:100356. [PMID: 38655560 PMCID: PMC11035039 DOI: 10.1016/j.ijregi.2024.100356] [Received: 02/05/2024] [Revised: 03/13/2024] [Accepted: 03/14/2024] [Indexed: 04/26/2024]
Abstract
Objectives: This study aimed to construct geographically, temporally, and epidemiologically representative data sets for SARS-CoV-2 in North Africa, focusing on Variants of Concern (VOCs), Variants of Interest (VOIs), and Variants Under Monitoring (VUMs).
Methods: SARS-CoV-2 genomic sequences and metadata from the EpiCoV database, accessed via the Global Initiative on Sharing All Influenza Data (GISAID) platform, were analyzed. The analysis covered cases, deaths, demographics, patient status, sequencing technologies, and variants.
Results: An analysis of 10,783 viral genomic sequences from six North African countries showed that SARS-CoV-2 sampling methods lack standardization, with a majority of countries lacking clear strategies. Over 59% of the analyzed genomes, as submitted to GISAID, lack essential clinical and demographic metadata, including patient age, sex, underlying health conditions, and clinical outcomes, which are needed for comprehensive genomic analysis and epidemiological studies. Morocco reported the highest number of confirmed COVID-19 cases (1,272,490), whereas Tunisia reported the highest number of deaths (29,341), emphasizing regional variations in the pandemic's impact. The GRA clade emerged as predominant in North African countries. The lineage analysis identified 190 lineages in Egypt, 26 in Libya, 121 in Tunisia, 90 in Algeria, 146 in Morocco, and 10 in Mauritania. The temporal dynamics of SARS-CoV-2 variants revealed distinct waves driven by different variants.
Conclusions: This study contributes valuable insights into the genomic landscape of SARS-CoV-2 in North Africa, highlighting the importance of genomic surveillance in understanding viral dynamics and informing public health strategies.
Affiliation(s)
- Zaineb Hamzaoui
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
- Sana Ferjani
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
- Ines Medini
  - National Center Chalbi Belkahia of Pharmacovigilance of Tunis, Laboratory of Clinical Pharmacology, Tunis, Tunisia
- Latifa Charaa
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Ichrak Landolsi
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Roua Ben Ali
  - National Center Chalbi Belkahia of Pharmacovigilance of Tunis, Laboratory of Clinical Pharmacology, Tunis, Tunisia
- Wissal Khaled
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Sarra Chammam
  - National Center Chalbi Belkahia of Pharmacovigilance of Tunis, Laboratory of Clinical Pharmacology, Tunis, Tunisia
- Salma Abid
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Lamia Kanzari
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Asma Ferjani
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Ahmed Fakhfakh
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Dhouha Kebaier
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Zoubeir Bouslah
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia
- Mouna Ben Sassi
  - National Center Chalbi Belkahia of Pharmacovigilance of Tunis, Laboratory of Clinical Pharmacology, Tunis, Tunisia
  - University of Tunis El Manar, Faculty of Medicine of Tunis, Tunis, Tunisia
- Sameh Trabelsi
  - National Center Chalbi Belkahia of Pharmacovigilance of Tunis, Laboratory of Clinical Pharmacology, Tunis, Tunisia
  - University of Tunis El Manar, Faculty of Medicine of Tunis, Tunis, Tunisia
- Ilhem Boutiba-Ben Boubaker
  - Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of Microbiology, Charles Nicolle Hospital, Tunis, Tunisia

2
Schmidt S, Khan S, Alanko JN, Pibiri GE, Tomescu AI. Matchtigs: minimum plain text representation of k-mer sets. Genome Biol 2023; 24:136. [PMID: 37296461 DOI: 10.1186/s13059-023-02968-z] [Received: 12/16/2021] [Accepted: 05/10/2023] [Indexed: 06/12/2023]
Abstract
We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.
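To make the notion of a plain-text representation of a k-mer set concrete, the following is a minimal sketch and not the matchtigs algorithm itself. It assumes that a representation is any list of strings whose k-mers equal the target set, and that a k-mer and its reverse complement count as the same k-mer; the function names `canonical_kmers` and `cumulative_length` are hypothetical.

```python
def revcomp(s):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[c] for c in reversed(s))


def canonical_kmers(strings, k):
    """Set of canonical k-mers (lexicographic minimum of a k-mer and its
    reverse complement) occurring in any of the given strings."""
    kmers = set()
    for s in strings:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def cumulative_length(strings):
    """Total number of characters, the size measure compared here."""
    return sum(len(s) for s in strings)


k = 3
rep_a = ["ACGT", "CGTT", "GTTA"]  # several short, overlapping strings
rep_b = ["ACGTTA"]                # one longer string

# Both lists spell the same k-mer set, but rep_b needs fewer strings
# and fewer characters.
assert canonical_kmers(rep_a, k) == canonical_kmers(rep_b, k)
print(cumulative_length(rep_a), cumulative_length(rep_b))
```

The sketch only checks that two representations are equivalent and compares their sizes; finding a minimum representation is the harder problem the abstract addresses.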
Affiliation(s)
- Sebastian Schmidt
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
- Shahbaz Khan
  - Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
- Jarno N Alanko
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
- Giulio E Pibiri
  - Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Venice, Italy
  - ISTI-CNR, Pisa, Italy

3
Behboudi R, Nouri-Baygi M, Naghibzadeh M. RPTRF: A rapid perfect tandem repeat finder tool for DNA sequences. Biosystems 2023; 226:104869. [PMID: 36858110 DOI: 10.1016/j.biosystems.2023.104869] [Received: 11/01/2022] [Revised: 01/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
The sequencing of eukaryotic genomes has shown that tandem repeats are abundant in their sequences. In addition to affecting some cellular processes, tandem repeats in the genome may be associated with specific diseases and have been the key to resolving criminal cases. Any tool developed for detecting tandem repeats must be accurate, fast, and usable in thousands of laboratories worldwide, including those with limited computing capabilities. The proposed method, the Rapid Perfect Tandem Repeat Finder (RPTRF), reduces redundant character comparisons by indexing the input file, and uses an interval tree in its filtering section to speed up processing and produce output without artifacts. The experiments demonstrated that the RPTRF is very fast in discovering all perfect tandem repeats of all categories in any genomic sequence. Although the detection of imperfect TRs is not the focus of the RPTRF, comparisons show that it even outperforms some other tools (in five selected gold standards) designed explicitly for this purpose. The implemented tool and how to use it are available on GitHub.
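As an illustration of what a perfect tandem repeat is, here is a naive scan written as a sketch. It is not the RPTRF method: it uses neither indexing nor an interval tree, and the function names and the `(start, period, copies)` output format are assumptions made for this example.

```python
def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d)
               for d in range(1, n) if n % d == 0)


def perfect_tandem_repeats(seq, max_period=6, min_copies=2):
    """Naively list perfect tandem repeats as (start, period, copies).

    For each period p, a maximal stretch where seq[j] == seq[j + p]
    marks a region that repeats with period p without any mismatch.
    """
    results = []
    n = len(seq)
    for period in range(1, max_period + 1):
        i = 0
        while i + 2 * period <= n:
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            run_length = (j - i) + period   # length of the periodic region
            copies = run_length // period   # number of complete copies
            if copies >= min_copies and is_primitive(seq[i:i + period]):
                results.append((i, period, copies))
            i = j + 1                       # resume after the mismatch
    return results


# "AC" occurs three times in a row starting at index 2.
print(perfect_tandem_repeats("GGACACACTT", max_period=3, min_copies=3))
```

The scan takes time proportional to the sequence length for every period tried, which is the kind of repeated character comparison a dedicated tool would aim to avoid.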
Affiliation(s)
- Reza Behboudi
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mostafa Nouri-Baygi
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

4
Petoukhov SV. Binary oppositions, algebraic holography and stochastic rules in genetic informatics. Biosystems 2022; 221:104760. [PMID: 36031064 DOI: 10.1016/j.biosystems.2022.104760] [Received: 06/23/2022] [Revised: 07/23/2022] [Accepted: 08/07/2022] [Indexed: 11/17/2022]
Abstract
The article is devoted to the author's results of the algebraic analysis of molecular genetic systems, including a set of structured DNA alphabets and long nucleotide sequences in single-stranded DNA of eukaryotic and prokaryotic genomes. A connection of the system of DNA n-plets alphabets with principles of algebraic holography is shown, which concerns a popular theme of holography principles in genetically inherited physiology. In addition, a relation between DNA n-plets alphabets and the Poincaré disk model of Lobachevski hyperbolic geometry is revealed. This relation can explain known facts of the relationship of physiological phenomena with hyperbolic geometry. Considering long DNA sequences as a bunch of many parallel texts written in different n-plets alphabets led to the discovery of some universal rules of the stochastic organization of genomic DNAs. These rules are discussed concerning the general problem of the biological dualism "probability-vs-determinism". In general, the presented results give pieces of evidence in favor of the efficiency of a model approach to living organisms as quantum-informational algebraic-harmonic essences.
Affiliation(s)
- Sergey V Petoukhov
  - Mechanical Engineering Research Institute, Russian Academy of Sciences, Moscow, Russia
  - Moscow State Tchaikovsky Conservatory, Moscow, Russia

5
Sarkar JP, Saha I, Seal A, Maity D, Maulik U. Topological Analysis for Sequence Variability: Case Study on more than 2K SARS-CoV-2 sequences of COVID-19 infected 54 countries in comparison with SARS-CoV-1 and MERS-CoV. Infect Genet Evol 2021; 88:104708. [PMID: 33421654 PMCID: PMC7787073 DOI: 10.1016/j.meegid.2021.104708] [Received: 07/28/2020] [Revised: 10/27/2020] [Accepted: 12/31/2020] [Indexed: 12/11/2022]
Abstract
The pandemic caused by the novel coronavirus SARS-CoV-2 is a serious global concern. More than a thousand new COVID-19 infections are reported daily across the globe. Medical research communities are therefore trying to find a remedy to restrict the spread of the virus, while vaccine development proceeds in parallel. In this critical situation, not only the medical research community but also scientists in fields such as microbiology, pharmacy, bioinformatics and data science are contributing to accelerate vaccine development, virus prediction, and forecasting of the transmission probability and reproduction cases of the virus for social awareness. In this context, we have studied sequence variability of the virus, focusing primarily on three aspects: (a) sequence variability among SARS-CoV-1, MERS-CoV and SARS-CoV-2 in human hosts, which are in the same coronavirus family, (b) sequence variability of SARS-CoV-2 in human hosts for 54 different countries and (c) sequence variability between the coronavirus family and country-specific SARS-CoV-2 sequences in human hosts. For this purpose, as a case study, we performed topological analysis of 2391 global genomic sequences of SARS-CoV-2 in association with SARS-CoV-1 and MERS-CoV using an integrated semi-alignment based computational technique. The results of the semi-alignment based technique were found, experimentally and statistically, to be similar to those of the alignment based technique, while being computationally faster. Moreover, the outcome of this analysis can help to identify the nations with homogeneous SARS-CoV-2 sequences, so that the same vaccine can be applied to their heterogeneous human populations.
Affiliation(s)
- Jnanendra Prasad Sarkar
  - Larsen & Toubro Infotech Ltd., Pune, Maharashtra, India
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Indrajit Saha
  - Department of Computer Science and Engineering, National Institute of Technical Teachers' Training & Research, Kolkata, West Bengal, India
- Arijit Seal
  - Cognizant Technology Solutions, Kolkata, West Bengal, India
- Debasree Maity
  - Department of Electronics and Communication Engineering, MCKV Institute of Engineering, Howrah, West Bengal, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India

6
Saini S, Dewan L. Application of discrete wavelet transform for analysis of genomic sequences of Mycobacterium tuberculosis. Springerplus 2016; 5:64. [PMID: 26839757 PMCID: PMC4722049 DOI: 10.1186/s40064-016-1668-9] [Received: 09/05/2015] [Accepted: 01/04/2016] [Indexed: 12/04/2022]
Abstract
This paper highlights the potential of discrete wavelet transforms in the analysis and comparison of genomic sequences of Mycobacterium tuberculosis (MTB) with different resistance characteristics. Graphical representations of wavelet coefficients and statistical estimates of their parameters have been used to determine the extent of similarity between different sequences of MTB without the use of conventional methods such as Basic Local Alignment Search Tool. Based on the calculation of the energy of wavelet decomposition coefficients of complete genomic sequences, their broad classification of the type of resistance can be done. All the given genomic sequences can be grouped into two broad categories wherein the drug resistant and drug susceptible sequences form one group while the multidrug resistant and extensive drug resistant sequences form the other group. This method of segregation of the sequences is faster than conventional laboratory methods which require 3–4 weeks of culture of sputum samples. Thus the proposed method can be used as a tool to enhance clinical diagnostic investigations in near real-time.
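The idea of computing the energy of wavelet decomposition coefficients can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract names neither the wavelet family nor the numeric encoding of nucleotides, so the Haar wavelet and the mapping in `NUCLEOTIDE_VALUE` are hypothetical choices, as are the function names.

```python
import math

# Hypothetical numeric encoding; the abstract does not specify one.
NUCLEOTIDE_VALUE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail). A trailing unpaired sample is dropped.
    """
    root2 = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in pairs]
    return approx, detail


def wavelet_energies(signal, levels):
    """Energy (sum of squares) of the detail coefficients at each level,
    plus the energy of the final approximation coefficients."""
    detail_energy = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        detail_energy.append(sum(d * d for d in detail))
    return detail_energy, sum(a * a for a in approx)


signal = [NUCLEOTIDE_VALUE[c] for c in "ACGTACGT"]
details, approx_energy = wavelet_energies(signal, levels=3)
print(details, approx_energy)
```

Because the Haar transform used here is orthonormal, the detail and approximation energies add up to the energy of the input signal when its length is a power of two, which gives a quick sanity check on the decomposition.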
Affiliation(s)
- Shiwani Saini
  - Department of Electrical Engineering, National Institute of Technology, Kurukshetra, Haryana 136119, India
- Lillie Dewan
  - Department of Electrical Engineering, National Institute of Technology, Kurukshetra, Haryana 136119, India

7
Abstract
Complexity measures are used to compare the genomic characteristics of five organisms belonging to distinct classes spanning the evolutionary tree: higher eukaryotes, amoebae, unicellular eukaryotes and bacteria. The comparisons are undertaken using the full four-letter alphabet and the coarse grained two-letter alphabets AG-CT and AT-CG. We show that the conditional probability matrix for the four-letter and AT-CG alphabet is markedly asymmetric in eukaryotes while it is nearly symmetric in bacterial genomes. Spatial asymmetry is revealed in the four-letter alphabet, signifying that the probability fluxes are nonvanishing and thus the reading sense of a sequence is irreversible for all organisms. Calculations of the block entropy and excess entropy demonstrate that the human genome accommodates better all possible block configurations, especially for long blocks. With respect to point-to-point details and to spatial arrangement of blocks the exit distance distributions from a particular letter demonstrate long distance characteristics in the eukaryotic sequences for all three alphabets, while the bacterial (prokaryotic) genomes deviate indicating short range characteristics. Overall, the conditional probability, the fluxes, the block entropy content and the exit distance distributions can be used as markers, discriminating between eukaryotic and prokaryotic DNA, allowing in many cases to discern details related to finer classes. In all cases the reduction from four letters to two masks some important statistical and spatial properties, with the AT-CG alphabet having higher ability of discrimination than the AG-CT one. In particular, the AT-CG alphabet reduction accentuates the CpG related properties (conditional probabilities w32, long ranged exit distance distribution for A and T nucleotides), but masks sequence asymmetry and irreversibility in all examined organisms.
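The coarse graining and two of the measures named above can be sketched as follows. This is a minimal illustration, not the authors' code: the single-letter labels used for the two-letter alphabets, the use of overlapping blocks, and the base-2 logarithm are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter

# Two-letter reductions named in the abstract; the output labels are arbitrary.
AT_CG = {"A": "W", "T": "W", "C": "S", "G": "S"}
AG_CT = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def coarse_grain(seq, mapping):
    """Rewrite a four-letter sequence in a two-letter alphabet."""
    return "".join(mapping[c] for c in seq)


def conditional_probabilities(seq):
    """Estimate P(next letter = b | current letter = a) from adjacent pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    return {(a, b): count / first_counts[a]
            for (a, b), count in pair_counts.items()}


def block_entropy(seq, n):
    """Shannon entropy in bits of the overlapping length-n blocks of seq."""
    blocks = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())


dna = "ACGTTGCAACGT"
reduced = coarse_grain(dna, AT_CG)
print(reduced)
print(conditional_probabilities(reduced))
print(block_entropy(dna, 2), block_entropy(reduced, 2))
```

Comparing the matrix entry for `(a, b)` with the entry for `(b, a)` is one simple way to look at the asymmetry discussed in the abstract, although real genomes would be needed for the estimates to be meaningful.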
Affiliation(s)
- A Provata
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
- C Nicolis
  - Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Bruxelles, Belgium
- G Nicolis
  - Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, C.P. 231, 1050 Bruxelles, Belgium