1. Dantas CWD, da Costa Neto SR, Alves SIA, da Costa Pinheiro K, De Los Santos EFF, Ramos RTJ. SATIN: a micro and mini satellite mining tool of total genome and coding regions with analysis of perfect repeats polymorphism in coding regions. BMC Bioinformatics 2024; 25:217. [PMID: 38890569 PMCID: PMC11186120 DOI: 10.1186/s12859-024-05842-2] [Received: 02/07/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND: Tandem repeats (TRs) are sequences in genomic DNA that are repeated in tandem and are present in all organisms. Among the subcategories of TRs are satellite repeats, which are divided into macrosatellites, minisatellites, and microsatellites; the last two are of particular interest because their instability allows them to reveal polymorphisms between organisms. Currently, most mining tools focus on Simple Sequence Repeat (SSR) mining, and only a few can identify SSRs in coding regions.
RESULTS: We developed a microsatellite mining software called SATIN (Micro and Mini SATellite IdentificatioN tool), based on a new sliding window algorithm and written in C and Python. It represents a new approach to SSR mining that addresses the limitations of existing tools, particularly in coding-region SSR mining. SATIN is available at https://github.com/labgm/SATIN.git. It was shown to be the second fastest tool for perfect and compound SSR mining. It can identify SSRs in coding regions as well as SSRs with motif sizes larger than 6. Besides SSR mining, SATIN can also analyze SSR polymorphism in coding regions across pre-determined groups and identify SSRs that are differentially abundant among them on a per-gene basis. For validation, we analyzed SSRs from two groups of Escherichia coli (K12 and O157) and compared the results with 5 known SSRs from coding regions. SATIN identified all 5 SSRs among 237 genes with at least one SSR.
CONCLUSIONS: SATIN is a novel microsatellite search software that uses an innovative sliding window technique, based on a numerical list for repeat region search, to identify perfect and composite SSRs while generating comprehensible and analyzable outputs. It accepts files in FASTA or GenBank format as input for microsatellite mining and can also identify SSRs present in coding regions for GenBank files. We expect SATIN to help identify potential SSRs to be used as genetic markers.
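To make the idea of perfect SSR mining concrete, here is a minimal Python sketch. It is not SATIN's sliding window algorithm, which the abstract does not describe in detail; it uses regular-expression backreferences instead. The function name `find_perfect_ssrs` and its parameters and defaults (`min_motif`, `max_motif`, `min_repeats`) are assumptions made for illustration only.

```python
import re


def find_perfect_ssrs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Illustrative only: thresholds are hypothetical defaults, not SATIN's.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT"),
            # since the shorter motif already reports that region.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


print(find_perfect_ssrs("GGATATATATCC"))
```

Raising `max_motif` above 6 would extend the same search from microsatellite-sized motifs to longer ones, at the cost of more passes over the sequence.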
Affiliation(s)
- Sandy Ingrid Aguiar Alves
- Simulation and Computational Biology Laboratory, High Performance Computing Center, Federal University of Pará, Belém, Brazil
- Kenny da Costa Pinheiro
- Simulation and Computational Biology Laboratory, High Performance Computing Center, Federal University of Pará, Belém, Brazil
- Rommel Thiago Jucá Ramos
- Simulation and Computational Biology Laboratory, High Performance Computing Center, Federal University of Pará, Belém, Brazil.
2. Avellaneda LL, Johnson DT, Gutierrez R, Thompson L, Sage KA, Sturm SA, Houston RM, LaRue BL. Development of a novel five-dye panel for human identification insertion/deletion (INDEL) polymorphisms. J Forensic Sci 2024; 69:814-824. [PMID: 38291825 DOI: 10.1111/1556-4029.15475] [Received: 09/04/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/01/2024]
Abstract
DNA analysis of forensic case samples relies on short tandem repeats (STRs), a key component of the combined DNA index system (CODIS) used to identify individuals. However, limitations arise when dealing with challenging samples, prompting the exploration of alternative markers such as single nucleotide polymorphisms (SNPs) and insertion/deletion (INDEL) polymorphisms. Unlike SNPs, INDELs can be differentiated easily by size, making them compatible with electrophoresis methods. It is possible to design small INDEL amplicons (<200 bp) to enhance recovery from degraded samples. To this end, a set of INDEL Human Identification Markers (HID) was curated from the 1000 Genomes Project, employing criteria including a fixation index (FST) ≤ 0.06, minor allele frequency (MAF) > 0.2, and high allele frequency divergence. A panel of 33 INDEL-HIDs was optimized and validated following the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines, utilizing a five-dye multiplex electrophoresis system. A small sample set (n = 79 unrelated individuals) was genotyped to assess the assay's performance. The validation studies demonstrated reproducibility, inhibition tolerance, the ability to detect a two-person mixture at ratios from 4:1 to 1:6, robustness with challenging samples, and sensitivity down to 125 pg of DNA. In summary, the 33-locus INDEL-HID panel exhibited robust recovery with low-template and degraded samples and proved effective for individualization within a small sample set.
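The two numeric selection criteria named in the abstract (FST ≤ 0.06 and MAF > 0.2) can be sketched as a simple filter in Python. This is only an illustration under stated assumptions: the class `IndelCandidate`, the function `passes_hid_criteria`, the marker names, and every value in the toy list are hypothetical and do not come from the study. The third criterion, high allele frequency divergence, is not quantified in the abstract and is therefore left out.

```python
from dataclasses import dataclass


@dataclass
class IndelCandidate:
    marker_id: str  # hypothetical identifier, not a real marker name
    fst: float      # fixation index across populations
    maf: float      # minor allele frequency


def passes_hid_criteria(candidate, max_fst=0.06, min_maf=0.2):
    """Apply the two thresholds stated in the abstract: FST <= 0.06 and MAF > 0.2."""
    return candidate.fst <= max_fst and candidate.maf > min_maf


# Toy, made-up values used only to exercise the filter.
candidates = [
    IndelCandidate("indel_A", fst=0.03, maf=0.41),  # passes both thresholds
    IndelCandidate("indel_B", fst=0.09, maf=0.35),  # fails: FST too high
    IndelCandidate("indel_C", fst=0.05, maf=0.20),  # fails: MAF must be strictly above 0.2
]

selected = [c.marker_id for c in candidates if passes_hid_criteria(c)]
print(selected)
```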
Affiliation(s)
- Lucio L Avellaneda
- Department of Forensic Science, Sam Houston State University, Huntsville, Texas, USA
- Damani T Johnson
- Department of Forensic Science, Sam Houston State University, Huntsville, Texas, USA
- Ryan Gutierrez
- Department of Forensic Science, Sam Houston State University, Huntsville, Texas, USA
- Lindsey Thompson
- Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, Fort Worth, Texas, USA
- Kelly A Sage
- Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, Fort Worth, Texas, USA
- Sarah A Sturm
- Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, Fort Worth, Texas, USA
- Rachel M Houston
- Department of Forensic Science, Sam Houston State University, Huntsville, Texas, USA
- Bobby L LaRue
- Department of Forensic Science, Sam Houston State University, Huntsville, Texas, USA
- Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, Fort Worth, Texas, USA
3. Pires GP, Fioresi VS, Canal D, Canal DC, Fernandes M, Brustolini OJB, de Avelar Carpinetti P, Ferreira A, da Silva Ferreira MF. Effects of trimer repeats on Psidium guajava L. gene expression and prospection of functional microsatellite markers. Sci Rep 2024; 14:9811. [PMID: 38684872 PMCID: PMC11059378 DOI: 10.1038/s41598-024-60417-8] [Received: 11/19/2023] [Accepted: 04/23/2024] [Indexed: 05/02/2024]
Abstract
Most research on trinucleotide repeats (TRs) focuses on human diseases, with few studies on the impact of TR expansions on plant gene expression. This work investigates the effect of TRs on global gene expression in Psidium guajava L., a plant species with widespread distribution and significant relevance in the food, pharmacology, and economics sectors. We analyzed TR-containing coding sequences in 1,107 transcripts from 2,256 genes across root, shoot, young leaf, old leaf, and flower bud tissues of the Brazilian guava cultivars Cortibel RM and Paluma. Structural analysis revealed TR sequences with small repeat numbers (5-9) starting with cytosine or guanine or containing these bases. Functional annotation indicated the involvement of TR-containing genes in cellular structures and processes (especially cell membranes and signal recognition), stress response, and resistance. Gene expression analysis showed significant variation, with a subset of highly expressed genes in both cultivars. Differential expression highlighted numerous down-regulated genes in Cortibel RM tissues, but not in Paluma, suggesting interplay between tissues and cultivars. Among 72 differentially expressed genes with TRs, 24 form miRNAs, 13 encode transcription factors, and 11 are associated with transposable elements. In addition, a set of 20 SSR-annotated, transcribed, and differentially expressed genes with TRs was selected as phenotypic markers for Psidium guajava and, potentially, for closely related species as well.
Affiliation(s)
- Giovanna Pinto Pires
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Vinicius Sartori Fioresi
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Drielli Canal
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Dener Cezati Canal
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Miquéias Fernandes
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Otávio José Bernardes Brustolini
- Laboratório Nacional de Computação Científica (LNCC). Av. Getulio Vargas, 333, Petrópolis, Rio de Janeiro, Quitandinha, 25651-076, Brazil
- Paola de Avelar Carpinetti
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Adésio Ferreira
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil
- Marcia Flores da Silva Ferreira
- Centro de Ciências Agrárias e Engenharias, Departamento de Agronomia, Universidade Federal Do Espírito Santo, Alto Universitário, s/n, Alegre, ES, 29500-000, Brazil.
4. Baril T, Galbraith J, Hayward A. Earl Grey: A Fully Automated User-Friendly Transposable Element Annotation and Analysis Pipeline. Mol Biol Evol 2024; 41:msae068. [PMID: 38577785 PMCID: PMC11003543 DOI: 10.1093/molbev/msae068] [Received: 11/24/2023] [Revised: 02/20/2024] [Accepted: 03/22/2024] [Indexed: 04/06/2024]
Abstract
Transposable elements (TEs) are major components of eukaryotic genomes and are implicated in a range of evolutionary processes. Yet, TE annotation and characterization remain challenging, particularly for nonspecialists, since existing pipelines are typically complicated to install, run, and extract data from. Current methods of automated TE annotation are also subject to issues that reduce overall quality, particularly (i) fragmented and overlapping TE annotations, leading to erroneous estimates of TE count and coverage, and (ii) repeat models represented by short sections of total TE length, with poor capture of 5' and 3' ends. To address these issues, we present Earl Grey, a fully automated TE annotation pipeline designed for user-friendly curation and annotation of TEs in eukaryotic genome assemblies. Using nine simulated genomes and an annotation of Drosophila melanogaster, we show that Earl Grey outperforms current widely used TE annotation methodologies in ameliorating the issues mentioned above while scoring highly in benchmarking for TE annotation and classification and being robust across genomic contexts. Earl Grey provides a comprehensive and fully automated TE annotation toolkit that provides researchers with paper-ready summary figures and outputs in standard formats compatible with other bioinformatics tools. Earl Grey has a modular format, with great scope for the inclusion of additional modules focused on further quality control and tailored analyses in future releases.
Affiliation(s)
- Tobias Baril
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
- James Galbraith
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Alex Hayward
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
5. Mokhtar MM, Alsamman AM, El Allali A. MegaSSR: a web server for large scale microsatellite identification, classification, and marker development. Front Plant Sci 2023; 14:1219055. [PMID: 38162302 PMCID: PMC10757629 DOI: 10.3389/fpls.2023.1219055] [Received: 05/08/2023] [Accepted: 08/18/2023] [Indexed: 01/03/2024]
Abstract
Next-generation sequencing technologies have opened new avenues for using genomic data to study and develop molecular markers and improve genetic resources. Simple Sequence Repeats (SSRs) as genetic markers are increasingly used in molecular diversity and molecular breeding programs that require bioinformatics pipelines to analyze the large amounts of data. Therefore, there is an ongoing need for online tools that provide computational resources with minimal effort and maximum efficiency, including automated development of SSR markers. These tools should be flexible, customizable, and able to handle the ever-increasing amount of genomic data. Here we introduce MegaSSR (https://bioinformatics.um6p.ma/MegaSSR), a web server and a standalone pipeline that enables the design of SSR markers in any target genome. MegaSSR allows users to design targeted PCR-based primers for their selected SSR repeats and includes multiple tools that initiate computational pipelines for SSR mining, classification, comparisons, PCR primer design, in silico PCR validation, and statistical visualization. MegaSSR results can be accessed, searched, downloaded, and visualized with user-friendly web-based tools. These tools provide graphs and tables showing various aspects of SSR markers and corresponding PCR primers. MegaSSR will accelerate ongoing research in plant species and assist breeding programs in their efforts to improve current genomic resources.
Affiliation(s)
- Morad M. Mokhtar
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Benguerir, Morocco
- Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza, Egypt
- Alsamman M. Alsamman
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Benguerir, Morocco
- Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza, Egypt
- Biotechnology Department, International Center for Agricultural Research in the Dry Areas (ICARDA), Giza, Egypt
- Achraf El Allali
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Benguerir, Morocco
6. Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
7. Nuss AB, Lomas JS, Reyes JB, Garcia-Cruz O, Lei W, Sharma A, Pham MN, Beniwal S, Swain ML, McVicar M, Hinne IA, Zhang X, Yim WC, Gulia-Nuss M. The highly improved genome of Ixodes scapularis with X and Y pseudochromosomes. Life Sci Alliance 2023; 6:e202302109. [PMID: 37813487 PMCID: PMC10561763 DOI: 10.26508/lsa.202302109] [Received: 04/21/2023] [Revised: 09/21/2023] [Accepted: 09/22/2023] [Indexed: 10/12/2023]
Abstract
Ixodes scapularis, the black-legged tick, is the principal vector of the Lyme disease spirochete, Borrelia burgdorferi, and is responsible for most of the ∼470,000 estimated Lyme disease cases annually in the USA. Ixodes scapularis can transmit six additional pathogens of human health significance. Because of its medical importance, I. scapularis was the first tick genome to be sequenced and annotated. However, the first assembly, I. scapularis Wikel (IscaW), was highly fragmented because of the technical challenges posed by the long, repetitive genome sequences characteristic of arthropod genomes and the lack of long-read sequencing techniques. Although I. scapularis has emerged as a model for tick research because of the availability of new tools such as embryo injection and CRISPR-Cas9-mediated gene editing, the lack of chromosome-scale scaffolds has slowed progress in tick biology and the development of tools for their control. Here we combine diverse technologies to produce the I. scapularis Gulia-Nuss (IscGN) genome assembly and gene set. We used DNA from eggs and male and female adult ticks and took advantage of Hi-C, PacBio HiFi sequencing, and Illumina short-read sequencing technologies to produce a chromosome-level assembly. In this work, we present the predicted pseudochromosomes, consisting of 13 autosomes and the sex pseudochromosomes X and Y, and a markedly improved genome annotation compared with the existing assemblies and annotations.
Affiliation(s)
- Andrew B Nuss
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- https://ror.org/01keh0577 Department of Agriculture, Veterinary, and Rangeland Sciences, The University of Nevada, Reno, NV, USA
- Johnathan S Lomas
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Jeremiah B Reyes
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- https://ror.org/01keh0577 Nevada Bioinformatics Center, University of Nevada, Reno, NV, USA
- Omar Garcia-Cruz
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Wenlong Lei
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Arvind Sharma
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Michael N Pham
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Saransh Beniwal
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- https://ror.org/01keh0577 Department of Computer Science and Engineering, The University of Nevada, Reno, NV, USA
- Mia L Swain
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Molly McVicar
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Isaac Amankona Hinne
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Xingtan Zhang
- https://ror.org/01keh0577 Nevada Bioinformatics Center, University of Nevada, Reno, NV, USA
- Won C Yim
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
- Monika Gulia-Nuss
- https://ror.org/01keh0577 Department of Biochemistry and Molecular Biology, The University of Nevada, Reno, NV, USA
8. Orlov YL, Orlova NG. Bioinformatics tools for the sequence complexity estimates. Biophys Rev 2023; 15:1367-1378. [PMID: 37974990 PMCID: PMC10643780 DOI: 10.1007/s12551-023-01140-y] [Received: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 11/19/2023]
Abstract
We review current methods and bioinformatics tools for text complexity estimation (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low-complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for the segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale services for whole-genome analysis. Novel machine learning applications for the classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. Complexity measures for amino acid sequences can be calculated by the same entropy- and compression-based algorithms, but the functional and evolutionary roles of low-complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It was shown that low-complexity regions in protein sequences are conserved in evolution and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
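The two families of measures mentioned in this abstract, entropy and compression, can be sketched in a few lines of Python. This is a hedged illustration, not code from any tool covered by the review: the function names, the window and step sizes, and the choice of `zlib` (whose DEFLATE format combines LZ77, a Lempel-Ziv method, with Huffman coding) as the compressor are all assumptions.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy in bits of the k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compression_ratio(seq):
    """Compressed size divided by raw size; assumes a non-empty ASCII sequence."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def entropy_profile(seq, window=20, step=10):
    """Entropy of each sliding window, returned as (window_start, entropy) pairs."""
    return [
        (start, shannon_entropy(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# A homopolymer run followed by a region using all four bases equally.
profile = entropy_profile("A" * 20 + "ACGT" * 5, window=20, step=20)
print(profile)
```

Windows with entropy near zero, or sequences with a low compression ratio, are candidates for low-complexity regions. For short sequences the compression ratio is dominated by the fixed overhead of the compressed format, so it is mainly useful for comparing sequences of similar length.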
Affiliation(s)
- Yuriy L. Orlov
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, 119991 Russia
- Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Nina G. Orlova
- Department of Mathematics, Financial University under the Government of the Russian Federation, Moscow, 125167 Russia
9. Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarize the definition, arrangement, and structural characteristics of repeats. We also introduce the diverse biological functions of repeats and review existing methods for automatic repeat detection, classification, and masking. Finally, we analyze the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
10. Alves SIA, Ferreira VBC, Dantas CWD, da Silva ALDC, Ramos RTJ. EasySSR: a user-friendly web application with full command-line features for large-scale batch microsatellite mining and samples comparison. Front Genet 2023; 14:1228552. [PMID: 37693309 PMCID: PMC10483286 DOI: 10.3389/fgene.2023.1228552] [Received: 05/25/2023] [Accepted: 07/28/2023] [Indexed: 09/12/2023]
Abstract
Microsatellites, also known as SSRs or STRs, are polymorphic DNA regions with tandem repetitions of a nucleotide motif of 1-6 base pairs, with a broad range of applications in many fields, such as comparative genomics, molecular biology, and forensics. However, the majority of researchers do not have computational training and struggle when running command-line tools or very limited web tools for their SSR research, spending a considerable amount of time learning how to execute the software and tabulating the post-processed data in other tools or manually, time that could otherwise be spent on data analysis. We present EasySSR, a user-friendly web tool with full command-line functionality, designed for practical use in batch identification and comparison of SSRs in sequences and in draft or complete genomes, and requiring no previous bioinformatics skills to run. EasySSR requires only a FASTA file and an optional GenBank file of one or more genomes to identify and compare SSRs. The tool can automatically analyze and compare SSRs in whole genomes, convert GenBank to PTT files, identify perfect and imperfect SSRs and coding and non-coding regions, and compare their frequencies, abundance, motifs, flanking sequences, and iterations. It produces many outputs ready for download, such as PTT files, interactive charts, and Excel tables, giving the user data ready for further analysis in minutes. EasySSR was implemented as a web application, which can be executed from any browser and is available for free at https://computationalbiology.ufpa.br/easyssr/. Tutorials, usage notes, and download links to the source code can be found at https://github.com/engbiopct/EasySSR.
Affiliation(s)
- Sandy Ingrid Aguiar Alves
- Laboratory of Biological Engineering, Biological Science Institute, Park of Science and Technology, Federal University of Pará, Belém, Brazil
- Victor Benedito Costa Ferreira
- Laboratory of Biological Engineering, Biological Science Institute, Park of Science and Technology, Federal University of Pará, Belém, Brazil
- Artur Luiz da Costa da Silva
- Laboratory of Biological Engineering, Biological Science Institute, Park of Science and Technology, Federal University of Pará, Belém, Brazil
- Rommel Thiago Jucá Ramos
- Laboratory of Biological Engineering, Biological Science Institute, Park of Science and Technology, Federal University of Pará, Belém, Brazil
11. Behboudi R, Nouri-Baygi M, Naghibzadeh M. RPTRF: A rapid perfect tandem repeat finder tool for DNA sequences. Biosystems 2023; 226:104869. [PMID: 36858110 DOI: 10.1016/j.biosystems.2023.104869] [Received: 11/01/2022] [Revised: 01/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
The sequencing of eukaryotic genomes has shown that tandem repeats are abundant in their sequences. In addition to affecting some cellular processes, tandem repeats in the genome may be associated with specific diseases and have been the key to resolving criminal cases. Any tool developed for detecting tandem repeats must be accurate, fast, and useable in thousands of laboratories worldwide, including those with not very advanced computing capabilities. The proposed method, the Rapid Perfect Tandem Repeat Finder (RPTRF), minimizes the need for excess character comparison processing by indexing the input file and significantly helps to accelerate and prepare the output without artifacts by using an interval tree in the filtering section. The experiments demonstrated that the RPTRF is very fast in discovering all perfect tandem repeats of all categories of any genomic sequences. Although the detection of imperfect TRs is not the focus of the RPTRF, comparisons show that it even outperforms some other tools (in five selected gold standards) designed explicitly for this purpose. The implemented tool and how to use it are available on GitHub.
Affiliation(s)
- Reza Behboudi
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mostafa Nouri-Baygi
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
12
Martin G, Cottin A, Baurens FC, Labadie K, Hervouet C, Salmon F, Paulo-de-la-Reberdiere N, Van den Houwe I, Sardos J, Aury JM, D'Hont A, Yahiaoui N. Interspecific introgression patterns reveal the origins of worldwide cultivated bananas in New Guinea. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2023; 113:802-818. [PMID: 36575919 DOI: 10.1111/tpj.16086] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/23/2022] [Revised: 12/16/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]
Abstract
Hybridizations between Musa species and subspecies, enabled by their transport via human migration, were proposed to have played an important role in banana domestication. We exploited sequencing data of 226 Musaceae accessions, including wild and cultivated accessions, to characterize the inter(sub)specific hybridization pattern that gave rise to cultivated bananas. We identified 11 genetic pools that contributed to cultivars, including two contributors of unknown origin. Informative alleles for each of these genetic pools were pinpointed and used to obtain genome ancestry mosaics of accessions. Diploid and triploid cultivars had genome mosaics involving three up to possibly seven contributors. The simplest mosaics were found for some diploid cultivars from New Guinea, combining three contributors, i.e., banksii and zebrina representing Musa acuminata subspecies and, more unexpectedly, the New Guinean species Musa schizocarpa. Breakpoints of M. schizocarpa introgressions were found to be conserved between New Guinea cultivars and the other analyzed diploid and triploid cultivars. This suggests that plants bearing these M. schizocarpa introgressions were transported from New Guinea and gave rise to currently cultivated bananas. Many cultivars showed contrasted mosaics with predominant ancestry from their geographical origin across Southeast Asia to New Guinea. This revealed that further diversification occurred in different Southeast Asian regions through hybridization with other Musa (sub)species, including two unknown ancestors that we propose to be M. acuminata ssp. halabanensis and a yet to be characterized M. acuminata subspecies. These results highlighted a dynamic crop formation process that was initiated in New Guinea, with subsequent diversification throughout Southeast Asia.
Affiliation(s)
- Guillaume Martin
- CIRAD, UMR AGAP Institut, Montpellier, F-34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Aurélien Cottin
- CIRAD, UMR AGAP Institut, Montpellier, F-34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Franc-Christophe Baurens
- CIRAD, UMR AGAP Institut, Montpellier, F-34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Karine Labadie
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Catherine Hervouet
- CIRAD, UMR AGAP Institut, Montpellier, F-34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Frédéric Salmon
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- CIRAD, UMR AGAP Institut, F-97130 Capesterre-Belle-Eau, Guadeloupe, France
- Nilda Paulo-de-la-Reberdiere
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- CIRAD, UMR AGAP Institut, CRB-PT, F-97170 Roujol Petit-Bourg, Guadeloupe, France
- Ines Van den Houwe
- Bioversity International, Willem De Croylaan 42, B-3001, Leuven, Belgium
- Julie Sardos
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, France
- Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
- Angélique D'Hont
- CIRAD, UMR AGAP Institut, Montpellier, F-34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Nabila Yahiaoui
- CIRAD, UMR AGAP Institut, Montpellier, F-34398, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
13
Descorps-Declère S, Richard GF. Megasatellite formation and evolution in vertebrate genes. Cell Rep 2022; 40:111347. [PMID: 36103826 DOI: 10.1016/j.celrep.2022.111347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 04/28/2022] [Accepted: 08/23/2022] [Indexed: 11/03/2022] Open
Abstract
Since formation of the first proto-eukaryotes, gene repertoire and genome complexity have significantly increased. Among genetic elements responsible for this increase are tandem repeats. Here we describe a genome-wide analysis of large tandem repeats, called megasatellites, in 58 vertebrate genomes. Two bursts occurred, one after the radiation between Agnatha and Gnathostomata fishes and the second one in therian mammals. Megasatellites are enriched in subtelomeric regions and frequently encoded in genes involved in transcription regulation, intracellular trafficking, and cell membrane metabolism, reminiscent of what is observed in fungus genomes. The presence of many introns within young megasatellites suggests that an exon-intron DNA segment is first duplicated and amplified before accumulation of mutations in intronic parts partially erases the megasatellite in such a way that it becomes detectable only in exons. Our results suggest that megasatellite formation and evolution is a dynamic and still ongoing process in vertebrate genomes.
Affiliation(s)
- Stéphane Descorps-Declère
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 25 rue du Dr Roux, 75015 Paris, France
- Guy-Franck Richard
- Institut Pasteur, Université Paris Cité, CNRS UMR3525, Natural & Synthetic Genome Instabilities, 25 rue du Dr Roux, 75015 Paris, France
14
Shen CY, Xue W, Pang C, Alireza A, Mao X, Han J, Chen H, Fu C. Characterization of the complete mitochondrial genome of Quasilineus sinicus Gibson, 1990 (Nemertea: Heteronemertea) and its phylogenetic implications. Mitochondrial DNA B Resour 2022; 7:1749-1751. [PMID: 36213866 PMCID: PMC9542323 DOI: 10.1080/23802359.2022.2126287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022] Open
Abstract
In this study, we sequenced and characterized the complete mitochondrial genome (mitogenome) of Quasilineus sinicus Gibson, 1990 (Heteronemertea, Nemertea) using Illumina sequencing technology. The circular mitogenome was 16,358 bp in length and comprised 22 transfer RNA genes, 13 protein-coding genes, and two ribosomal RNA genes. Its overall base composition included 20.82% A, 41.06% T, 26.68% G, and 11.44% C; in fact, the mitogenome had a high A + T content of 61.88%. Furthermore, our phylogenetic analysis demonstrated that Paleonemertea, Pilidiophora, and Hoplonemertea were monophyletic groups, and Q. sinicus was most closely related to Iwatanemertes piperata.
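The A + T figure follows directly from the reported base composition. A quick check of the arithmetic, with the percentages copied from the abstract and variable names chosen only for illustration:

```python
# Base composition (%) as reported in the abstract
composition = {"A": 20.82, "T": 41.06, "G": 26.68, "C": 11.44}

# Round to two decimals to match the precision of the reported values
at_content = round(composition["A"] + composition["T"], 2)
gc_content = round(composition["G"] + composition["C"], 2)
total = round(sum(composition.values()), 2)

print(f"A+T = {at_content}%, G+C = {gc_content}%, total = {total}%")
```

The four percentages sum to 100, and A + T matches the 61.88% stated in the abstract.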
Affiliation(s)
- Chun-Yang Shen
- Department of Biology, Chengde Medical University, Chengde, Hebei Province, China
- Wei Xue
- Department of Chemical Engineering, Hebei Petroleum University of Technology, Chengde, Hebei Province, China
- Chong Pang
- Department of Pharmacology, Chengde Medical University, Chengde, Hebei Province, China
- Asem Alireza
- Hainan Key Laboratory for Conservation and Utilization of Tropical Marine Fishery Resources, Hainan Tropical Ocean University, Sanya, Hainan Province, China
- Xiaonan Mao
- Department of Biology, Chengde Medical University, Chengde, Hebei Province, China
- Jiahui Han
- Department of Biology, Chengde Medical University, Chengde, Hebei Province, China
- Haonan Chen
- Department of Biology, Chengde Medical University, Chengde, Hebei Province, China
- Chunzheng Fu
- Institute of Sericulture, Chengde Medical University, Chengde, Hebei Province, China
15
Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep 2022; 12:13124. [PMID: 35907931 PMCID: PMC9338934 DOI: 10.1038/s41598-022-17267-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Bioinformatic methods for detecting short tandem repeat expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. superSTR is used to process whole-genome and whole-exome sequencing data, and perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and mouse models of ataxia and dystrophy.
16
Athanasouli M, Rödelsperger C. Analysis of repeat elements in the Pristionchus pacificus genome reveals an ancient invasion by horizontally transferred transposons. BMC Genomics 2022; 23:523. [PMID: 35854227 PMCID: PMC9297572 DOI: 10.1186/s12864-022-08731-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Repetitive sequences and mobile elements make up considerable fractions of individual genomes. While transposition events can be detrimental for organismal fitness, repetitive sequences form an enormous reservoir for molecular innovation. In this study, we aim to add repetitive elements to the annotation of the Pristionchus pacificus genome and assess their impact on novel gene formation. RESULTS Different computational approaches define up to 24% of the P. pacificus genome as repetitive sequences. While retroelements are more frequently found at the chromosome arms, DNA transposons are distributed more evenly. We found multiple DNA transposons, as well as LTR and LINE elements with abundant evidence of expression as single-exon transcripts. When testing whether transposons disproportionately contribute towards new gene formation, we found that roughly 10-20% of genes across all age classes overlap transposable elements with the strongest trend being an enrichment of low complexity regions among the oldest genes. Finally, we characterized a horizontal gene transfer of Zisupton elements into diplogastrid nematodes. These DNA transposons invaded nematodes from eukaryotic donor species and experienced a recent burst of activity in the P. pacificus lineage. CONCLUSIONS The comprehensive annotation of repetitive elements in the P. pacificus genome builds a resource for future functional genomic analyses as well as for more detailed investigations of molecular innovations.
Affiliation(s)
- Marina Athanasouli
- Max Planck Institute for Biology, Department for Integrative Evolutionary Biology, Max-Planck-Ring 9, 72076, Tübingen, Germany
- Christian Rödelsperger
- Max Planck Institute for Biology, Department for Integrative Evolutionary Biology, Max-Planck-Ring 9, 72076, Tübingen, Germany
17
Walker K, Kalra D, Lowdon R, Chen G, Molik D, Soto DC, Dabbaghie F, Khleifat AA, Mahmoud M, Paulin LF, Raza MS, Pfeifer SP, Agustinho DP, Aliyev E, Avdeyev P, Barrozo ER, Behera S, Billingsley K, Chong LC, Choubey D, De Coster W, Fu Y, Gener AR, Hefferon T, Henke DM, Höps W, Illarionova A, Jochum MD, Jose M, Kesharwani RK, Kolora SRR, Kubica J, Lakra P, Lattimer D, Liew CS, Lo BW, Lo C, Lötter A, Majidian S, Mendem SK, Mondal R, Ohmiya H, Parvin N, Peralta C, Poon CL, Prabhakaran R, Saitou M, Sammi A, Sanio P, Sapoval N, Syed N, Treangen T, Wang G, Xu T, Yang J, Zhang S, Zhou W, Sedlazeck FJ, Busby B. The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms. F1000Res 2022; 11:530. [PMID: 36262335 PMCID: PMC9557141 DOI: 10.12688/f1000research.110194.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/04/2022] [Indexed: 01/25/2023] Open
Abstract
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Affiliation(s)
- Kimberly Walker
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Guangyi Chen
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany; Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- David Molik
- Tropical Crop and Commodity Protection Research Unit, Pacific Basin Agricultural Research Center, Hilo, HI, 96720, USA
- Daniela C. Soto
- Biochemistry & Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, Davis, CA, 95616, USA
- Fawaz Dabbaghie
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany; Institute for Medical Biometry and Bioinformatics, University hospital Düsseldorf, Düsseldorf, Germany
- Ahmad Al Khleifat
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Muhammad Sohail Raza
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Beijing, China
- Susanne P. Pfeifer
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Daniel Paiva Agustinho
- Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, 63110, USA
- Elbay Aliyev
- Research Department, Sidra Medicine, Doha, Qatar
- Pavel Avdeyev
- Computational Biology Institute, The George Washington University, Washington, DC, 20052, USA
- Enrico R. Barrozo
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Kimberley Billingsley
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Li Chuin Chong
- Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
- Deepak Choubey
- Department of Technology, Savitribai Phule Pune University, Pune, Maharashtra, India
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, Antwerp, Belgium; Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Alejandro R. Gener
- Association of Public Health Labs, Centers for Disease Control and Prevention, Downey, CA, USA
- Timothy Hefferon
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- David Morgan Henke
- Department Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
- Wolfram Höps
- EMBL Heidelberg, Genome Biology Unit, Heidelberg, Germany
- Michael D. Jochum
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Maria Jose
- Centre for Bioinformatics, Pondicherry University, Pondicherry, India
- Rupesh K. Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Priya Lakra
- Department of Zoology, University of Delhi, Delhi, India
- Damaris Lattimer
- University of Applied Sciences Upper Austria - FH Hagenberg, Mühlkreis, Austria
- Chia-Sin Liew
- Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588, USA
- Bai-Wei Lo
- Department of Biology, University of Konstanz, Konstanz, Germany
- Chunhsuan Lo
- Human Genetics Laboratory, National Institute of Genetics, Japan, Mishima City, Japan
- Anneri Lötter
- Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Rajarshi Mondal
- Department of Biotechnology, The University of Burdwan, West Bengal, India
- Hiroko Ohmiya
- Genetic Reagent Development Unit, Medical & Biological Laboratories Co., Ltd., Tokyo, Japan
- Nasrin Parvin
- Department of Biotechnology, The University of Burdwan, West Bengal, India
- Marie Saitou
- Center of Integrative Genetics (CIGENE), Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
- Aditi Sammi
- School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
- Philippe Sanio
- University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Najeeb Syed
- Research Department, Sidra Medicine, Doha, Qatar
- Todd Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Tiancheng Xu
- Department of Computer Science, Rice University, Houston, TX, USA
- Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Shangzhe Zhang
- School of Biology, University of St Andrews, St Andrews, UK
- Weiyu Zhou
- Department of Statistical Science, George Mason University, Fairfax, Virginia, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
18
Korotkov E, Zaytsev K, Fedorov A. Use of 6 Nucleotide Length Words to Study the Complexity of Gene Sequences from Different Organisms. ENTROPY (BASEL, SWITZERLAND) 2022; 24:632. [PMID: 35626518 PMCID: PMC9141341 DOI: 10.3390/e24050632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 12/02/2022]
Abstract
In this paper, we attempted to find a relation between the living conditions of bacteria and the algorithmic complexity of their genomes. We developed a probabilistic mathematical method for evaluating the irregularity of occurrence of k-words (6 bases long) in bacterial gene coding sequences. For this, the coding sequences from different bacterial genomes were analyzed, and as an index of k-word occurrence irregularity we used W, which has a distribution similar to normal. The results show that bacterial genomes can be divided into two uneven groups. The first, smaller group has W in the interval from 170 to 475, while the second has W from 475 to 875. Plant, metazoan, and virus genomes also have W in the same interval as the first bacterial group. We suggest that the coding sequences of the second bacterial group are much less susceptible to evolutionary changes than those of the first group. We also discuss using the W index as a measure of biological stress.
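The abstract does not give the formula for W, so the sketch below covers only the first step it describes: counting 6-base k-words in a coding sequence. The function name `count_kwords` and the use of overlapping windows are assumptions made for illustration, not the authors' method.

```python
from collections import Counter

def count_kwords(seq, k=6):
    """Count overlapping k-words (k-mers) in a DNA sequence.

    Hypothetical helper; windows containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts

counts = count_kwords("ATGATGATG")
```

An irregularity index such as W would then be computed from these counts; how that is done is not recoverable from the abstract.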
Affiliation(s)
- Eugene Korotkov
- Institute of Bioengineering, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
- Konstantin Zaytsev
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
- Alexey Fedorov
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
19
Finding and Characterizing Repeats in Plant Genomes. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats, and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field, and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some families of TEs from a set of sequences known to belong to the family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and check their validity.
20
Inouye S. Multiple Cypridina Luciferase Genes in the Genome of Individual Ostracods, Vargula hilgendorfii (Cypridina hilgendorfii). Photochem Photobiol 2021; 98:1293-1302. [PMID: 34181758 DOI: 10.1111/php.13479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 06/24/2021] [Indexed: 11/28/2022]
Abstract
The genomic structure of the Cypridina luciferase gene in Vargula hilgendorfii (formerly Cypridina hilgendorfii) was determined with three λ phage clones (λ34, λ45, and λ61). The luciferase genes in clones λ34 and λ61 consisted of 13 exons and 12 introns, and clone λ45 only contained exons 1-5. The splicing sites of the luciferase genes in λ34 and λ61 were conserved completely with the consensus sequence. The translated luciferases had 555 amino acid residues, which were over 98.6% identical to those of cDNA clones as previously reported. In contrast, each intron in clones λ34, λ45, and λ61 varied significantly in length. To explain the variation of intron length among the three V. hilgendorfii luciferase genes, genomic DNA was isolated from a single V. hilgendorfii specimen and the regions from exon 1-3 of the luciferase gene were amplified by polymerase chain reaction (PCR). PCR products with various lengths were detected and were confirmed as the luciferase gene fragments by Southern blot analysis. Furthermore, DNA sequence analysis indicated that at least seven luciferase gene groups might be present in the genome of a single specimen. Thus, multiple Cypridina luciferase genes exist in the genome of a single V. hilgendorfii specimen.
Affiliation(s)
- Satoshi Inouye
- Yokohama Research Center, JNC Co, 5-1 Okawa, Kanazawa-ku, Yokohama, 236-8605, Japan
21
Catara V, Cubero J, Pothier JF, Bosis E, Bragard C, Đermić E, Holeva MC, Jacques MA, Petter F, Pruvost O, Robène I, Studholme DJ, Tavares F, Vicente JG, Koebnik R, Costa J. Trends in Molecular Diagnosis and Diversity Studies for Phytosanitary Regulated Xanthomonas. Microorganisms 2021; 9:862. [PMID: 33923763 PMCID: PMC8073235 DOI: 10.3390/microorganisms9040862] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/31/2021] [Revised: 04/10/2021] [Accepted: 04/12/2021] [Indexed: 11/17/2022] Open
Abstract
Bacteria in the genus Xanthomonas infect a wide range of crops and wild plants, with most species responsible for plant diseases that have a global economic and environmental impact on the seed, plant, and food trade. Infections by Xanthomonas spp. cause a wide variety of non-specific symptoms, making their identification difficult. The coexistence of phylogenetically close strains, but drastically different in their phenotype, poses an added challenge to diagnosis. Data on future climate change scenarios predict an increase in the severity of epidemics and a geographical expansion of pathogens, increasing pressure on plant health services. In this context, the effectiveness of integrated disease management strategies strongly depends on the availability of rapid, sensitive, and specific diagnostic methods. The accumulation of genomic information in recent years has facilitated the identification of new DNA markers, a cornerstone for the development of more sensitive and specific methods. Nevertheless, the challenges that the taxonomic complexity of this genus represents in terms of diagnosis together with the fact that within the same bacterial species, groups of strains may interact with distinct host species demonstrate that there is still a long way to go. In this review, we describe and discuss the current molecular-based methods for the diagnosis and detection of regulated Xanthomonas, taxonomic and diversity studies in Xanthomonas and genomic approaches for molecular diagnosis.
Affiliation(s)
- Vittoria Catara
- Department of Agriculture, Food and Environment, University of Catania, 95125 Catania, Italy
- Jaime Cubero
- National Institute for Agricultural and Food Research and Technology (INIA), 28002 Madrid, Spain
- Joël F. Pothier
- Environmental Genomics and Systems Biology Research Group, Institute for Natural Resource Sciences, Zurich University of Applied Sciences (ZHAW), 8820 Wädenswil, Switzerland
- Eran Bosis
- Department of Biotechnology Engineering, ORT Braude College of Engineering, Karmiel 2161002, Israel
- Claude Bragard
- UCLouvain, Earth & Life Institute, Applied Microbiology, 1348 Louvain-la-Neuve, Belgium
- Edyta Đermić
- Department of Plant Pathology, Faculty of Agriculture, University of Zagreb, 10000 Zagreb, Croatia
- Maria C. Holeva
- Benaki Phytopathological Institute, Scientific Directorate of Phytopathology, Laboratory of Bacteriology, GR-14561 Kifissia, Greece
- Marie-Agnès Jacques
- IRHS, INRA, AGROCAMPUS-Ouest, Univ Angers, SFR 4207 QUASAV, 49071 Beaucouzé, France
- Francoise Petter
- European and Mediterranean Plant Protection Organization (EPPO/OEPP), 75011 Paris, France
- Olivier Pruvost
- CIRAD, UMR PVBMT, F-97410 Saint Pierre, La Réunion, France
- Isabelle Robène
- CIRAD, UMR PVBMT, F-97410 Saint Pierre, La Réunion, France
- Fernando Tavares
- CIBIO-Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO-Laboratório Associado, Universidade do Porto, 4485-661 Vairão, Portugal
- FCUP-Faculdade de Ciências, Departamento de Biologia, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Ralf Koebnik
- Plant Health Institute of Montpellier (PHIM), Univ Montpellier, Cirad, INRAe, Institut Agro, IRD, 34398 Montpellier, France
- Joana Costa
- Centre for Functional Ecology-Science for People & the Planet, Department of Life Sciences, University of Coimbra, 300-456 Coimbra, Portugal
- Laboratory for Phytopathology, Instituto Pedro Nunes, 3030-199 Coimbra, Portugal
22
Gerasimov ES, Gasparyan AA, Afonin DA, Zimmer SL, Kraeva N, Lukeš J, Yurchenko V, Kolesnikov A. Complete minicircle genome of Leptomonas pyrrhocoris reveals sources of its non-canonical mitochondrial RNA editing events. Nucleic Acids Res 2021; 49:3354-3370. [PMID: 33660779 PMCID: PMC8034629 DOI: 10.1093/nar/gkab114] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/16/2020] [Revised: 02/03/2021] [Accepted: 02/09/2021] [Indexed: 01/24/2023] Open
Abstract
Uridine insertion/deletion (U-indel) editing of mitochondrial mRNA, unique to the protistan class Kinetoplastea, generates canonical as well as potentially non-productive editing events. While the molecular machinery and the role of the guide (g) RNAs that provide required information for U-indel editing are well understood, little is known about the forces underlying its apparently error-prone nature. Analysis of a gRNA:mRNA pair allows the dissection of editing events in a given position of a given mitochondrial transcript. A complete gRNA dataset, paired with a fully characterized mRNA population that includes non-canonically edited transcripts, would allow such an analysis to be performed globally across the mitochondrial transcriptome. To achieve this, we have assembled 67 minicircles of the insect parasite Leptomonas pyrrhocoris, with each minicircle typically encoding one gRNA located in one of two similar-sized units of different origin. From this relatively narrow set of annotated gRNAs, we have dissected all identified mitochondrial editing events in L. pyrrhocoris, the strains of which dramatically differ in the abundance of individual minicircle classes. Our results support a model in which a multitude of editing events are driven by a limited set of gRNAs, with individual gRNAs possessing an inherent ability to guide canonical and non-canonical editing.
Affiliation(s)
- Evgeny S Gerasimov
  - Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov University, Moscow 119435, Russia
  - Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127051, Russia
- Anna A Gasparyan
  - Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
- Dmitry A Afonin
  - Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
- Sara L Zimmer
  - Department of Biomedical Sciences, University of Minnesota Medical School, Duluth Campus, Duluth, MN 55812, USA
- Natalya Kraeva
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
- Julius Lukeš
  - Institute of Parasitology, Biology Centre, Czech Academy of Sciences, 370 05 České Budějovice (Budweis), Czech Republic
  - Faculty of Science, University of South Bohemia, 370 05 České Budějovice (Budweis), Czech Republic
- Vyacheslav Yurchenko
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov University, Moscow 119435, Russia
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
- Alexander Kolesnikov
  - Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
|
23
|
Korotkov EV, Kamionskya AM, Korotkova MA. Detection of Highly Divergent Tandem Repeats in the Rice Genome. Genes (Basel) 2021; 12:genes12040473. [PMID: 33806152 PMCID: PMC8064497 DOI: 10.3390/genes12040473] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/17/2021] [Revised: 03/11/2021] [Accepted: 03/23/2021] [Indexed: 11/25/2022] Open
Abstract
Currently, there is a lack of bioinformatics approaches to identify highly divergent tandem repeats (TRs) in eukaryotic genomes. Here, we developed a new mathematical method to search for TRs, which uses a novel algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs), and applied it to detect TRs of 2 to 50 nucleotides long in the rice genome. The RPWM method could find highly divergent TRs in the presence of insertions or deletions. Comparison of the RPWM algorithm with the other methods of TR identification showed that RPWM could detect TRs in which the average number of base substitutions per nucleotide (x) was between 1.5 and 3.2, whereas T-REKS and TRF methods could not detect divergent TRs with x > 1.5. Applied to the search of TRs in the rice genome, the RPWM method revealed that TRs occupied 5% of the genome and that most of them were 2 and 3 bases long. Using RPWM, we also revealed the correlation of TRs with dispersed repeats and transposons, suggesting that some transposons originated from TRs. Thus, the novel RPWM algorithm is an effective tool to search for highly divergent TRs in the genomes.
Affiliation(s)
- Eugene V Korotkov
  - Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Bld.2, 33 Leninsky Ave., 119071 Moscow, Russia
  - MEPhI (Moscow Engineering Physics Institute), National Research Nuclear University, 31 Kashirskoye Shosse, 115409 Moscow, Russia
- Anastasiya M Kamionskya
  - Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Bld.2, 33 Leninsky Ave., 119071 Moscow, Russia
- Maria A Korotkova
  - MEPhI (Moscow Engineering Physics Institute), National Research Nuclear University, 31 Kashirskoye Shosse, 115409 Moscow, Russia
|
24
|
Touati R, Tajouri A, Mesaoudi I, Oueslati AE, Lachiri Z, Kharrat M. New methodology for repetitive sequences identification in human X and Y chromosomes. Biomed Signal Process Control 2021; 64:102207. [PMID: 33101452 PMCID: PMC7572123 DOI: 10.1016/j.bspc.2020.102207] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2019] [Revised: 07/23/2020] [Accepted: 09/01/2020] [Indexed: 11/24/2022]
Abstract
Repetitive DNA sequences occupy the major proportion of DNA in the human genome and even in the other species' genomes. The importance of each repetitive DNA type depends on many factors: structural and functional roles, positions, lengths and numbers of these repetitions are clear examples. Conserving such DNA sequences or not in different locations in the chromosome remains a challenge for researchers in biology. Detecting their location despite their great variability and finding novel repetitive sequences remains a challenging task. To side-step this problem, we developed a new method based on signal and image processing tools. In fact, using this method we could find repetitive patterns in DNA images regardless of the repetition length. This new technique seems to be more efficient in detecting new repetitive sequences than bioinformatics tools. In fact, the classical tools present limited performances especially in case of mutations (insertion or deletion). However, modifying one or a few numbers of pixels in the image doesn't affect the global form of the repetitive pattern. As a consequence, we generated a new repetitive patterns database which contains tandem and dispersed repeated sequences. The highly repetitive sequences, we have identified in X and Y chromosomes, are shown to be located in other human chromosomes or in other genomes. The data we have generated is then taken as input to a Convolutional neural network classifier in order to classify them. The system we have constructed is efficient and gives an average of 94.4% as recognition score.
Affiliation(s)
- Rabeb Touati
  - University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Asma Tajouri
  - University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
- Imen Mesaoudi
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Afef Elloumi Oueslati
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Zied Lachiri
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Maher Kharrat
  - University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
|
25
|
Suleman, Muhammad N, Khan MS, Tkach VV, Ullah H, Ehsan M, Ma J, Zhu XQ. Mitochondrial genomes of two eucotylids as the first representatives from the superfamily Microphalloidea (Trematoda) and phylogenetic implications. Parasit Vectors 2021; 14:48. [PMID: 33446249 PMCID: PMC7807500 DOI: 10.1186/s13071-020-04547-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/06/2020] [Accepted: 12/13/2020] [Indexed: 11/10/2022] Open
Abstract
Background The Eucotylidae Cohn, 1904 (Superfamily: Microphalloidea), is a family of digeneans parasitic in kidneys of birds as adults. The group is characterized by the high level of morphological similarities among genera and unclear systematic value of morphological characters traditionally used for their differentiation. In the present study, we sequenced the complete or nearly complete mitogenomes (mt genome) of two eucotylids representing the genera Tamerlania (T. zarudnyi) and Tanaisia (Tanaisia sp.). They represent the first sequenced mt genomes of any member of the superfamily Microphalloidea. Methods A comparative mitogenomic analysis of the two newly sequenced eucotylids was conducted for the investigation of mitochondrial gene arrangement, contents and genetic distance. Phylogenetic position of the family Eucotylidae within the order Plagiorchiida was examined using nucleotide sequences of mitochondrial protein-coding genes (PCGs) plus RNAs using maximum likelihood (ML) and Bayesian inference (BI) methods. BI phylogeny based on concatenated amino acids sequences of PCGs was also conducted to determine possible effects of silent mutations. Results The complete mt genome of T. zarudnyi was 16,188 bp and the nearly complete mt genome of Tanaisia sp. was 13,953 bp in length. A long string of additional amino acids (about 123 aa) at the 5′ end of the cox1 gene in both studied eucotylid mt genomes has resulted in the cox1 gene of eucotylids being longer than in all previously sequenced digeneans. The rrnL gene was also longer than previously reported in any digenean mitogenome sequenced so far. The TΨC and DHU loops of the tRNAs varied greatly between the two eucotylids while the anticodon loop was highly conserved. Phylogenetic analyses based on mtDNA nucleotide and amino acids sequences (as a separate set) positioned eucotylids as a sister group to all remaining members of the order Plagiorchiida. 
Both ML and BI phylogenies revealed the paraphyletic nature of the superfamily Gorgoderoidea and the suborder Xiphidiata. Conclusions The average sequence identity, combined nucleotide diversity and Kimura-2 parameter distances between the two eucotylid mitogenomes demonstrated that atp6, nad5, nad4L and nad6 genes are better markers than the traditionally used cox1 or nad1 for the species differentiation and population-level studies of eucotylids because of their higher variability. The position of the Dicrocoeliidae and Eucotylidae outside the clade uniting other xiphidiatan trematodes strengthened the argument for the need for re-evaluation of the taxonomic content of the Xiphidiata.
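For readers unfamiliar with the Kimura 2-parameter distance mentioned in this abstract, the following is a minimal sketch of the standard formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. It is not the authors' pipeline; the function name `k2p_distance` and the choice to skip gapped or ambiguous columns are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; columns containing anything
    other than A, C, G or T (gaps, ambiguity codes) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError for saturated (too divergent) pairs
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A gene with a larger average pairwise distance under this measure is more variable, which is the sense in which the abstract ranks atp6, nad5, nad4L and nad6 above cox1 and nad1.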
Affiliation(s)
- Suleman
  - State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, 730046, People's Republic of China
  - Department of Zoology, University of Swabi, Swabi, Khyber Pakhtunkhwa, Pakistan
- Nehaz Muhammad
  - State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, 730046, People's Republic of China
- Mian Sayed Khan
  - Department of Zoology, University of Swabi, Swabi, Khyber Pakhtunkhwa, Pakistan
- Vasyl V Tkach
  - Department of Biology, University of North Dakota, Grand Forks, ND, 58202-9019, USA
- Hanif Ullah
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Key Laboratory of Animal Parasitology, Shanghai, 20041, People's Republic of China
- Muhammad Ehsan
  - State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, 730046, People's Republic of China
- Jun Ma
  - State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, 730046, People's Republic of China
- Xing-Quan Zhu
  - State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, 730046, People's Republic of China
  - College of Veterinary Medicine, Shanxi Agricultural University, Taigu, 030801, Shanxi, People's Republic of China
|
26
|
Tanifuji G, Kamikawa R, Moore CE, Mills T, Onodera NT, Kashiyama Y, Archibald JM, Inagaki Y, Hashimoto T. Comparative Plastid Genomics of Cryptomonas Species Reveals Fine-Scale Genomic Responses to Loss of Photosynthesis. Genome Biol Evol 2020; 12:3926-3937. [PMID: 31922581 PMCID: PMC7058160 DOI: 10.1093/gbe/evaa001] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 01/04/2020] [Indexed: 01/20/2023] Open
Abstract
Loss of photosynthesis is a recurring theme in eukaryotic evolution. In organisms that have lost the ability to photosynthesize, nonphotosynthetic plastids are retained because they play essential roles in processes other than photosynthesis. The unicellular algal genus Cryptomonas contains both photosynthetic and nonphotosynthetic members, the latter having lost the ability to photosynthesize on at least three separate occasions. To elucidate the evolutionary processes underlying the loss of photosynthesis, we sequenced the plastid genomes of two nonphotosynthetic strains, Cryptomonas sp. CCAC1634B and SAG977-2f, as well as the genome of the phototroph Cryptomonas curvata CCAP979/52. These three genome sequences were compared with the previously sequenced plastid genome of the nonphotosynthetic species Cryptomonas paramecium CCAP977/2a as well as photosynthetic members of the Cryptomonadales, including C. curvata FBCC300012D. Intraspecies comparison between the two C. curvata strains showed that although their genome structures are stable, the substitution rates of their genes are relatively high. Although most photosynthesis-related genes, such as the psa and psb gene families, were found to have disappeared from the nonphotosynthetic strains, at least ten pseudogenes are retained in SAG977-2f. Although gene order is roughly shared among the plastid genomes of photosynthetic Cryptomonadales, genome rearrangements are seen more frequently in the smaller genomes of the nonphotosynthetic strains. Intriguingly, the light-independent protochlorophyllide reductase comprising chlB, L, and N is retained in nonphotosynthetic SAG977-2f and CCAC1634B. On the other hand, whereas CCAP977/2a retains ribulose-1,5-bisphosphate carboxylase/oxygenase-related genes, including rbcL, rbcS, and cbbX, the plastid genomes of the other two nonphotosynthetic strains have lost the ribulose-1,5-bisphosphate carboxylase/oxygenase protein-coding genes.
Affiliation(s)
- Goro Tanifuji
  - Department of Zoology, National Museum of Nature and Science, Ibaraki, Japan
- Ryoma Kamikawa
  - Graduate School of Human and Environmental Studies, Kyoto University, Kyoto, Japan
- Christa E Moore
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Tyler Mills
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Naoko T Onodera
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Yuichiro Kashiyama
  - Department of Applied Chemistry and Food Science, Fukui University of Technology, Fukui, Japan
- John M Archibald
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Yuji Inagaki
  - Center for Computational Sciences, University of Tsukuba, Ibaraki, Japan
  - Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Tetsuo Hashimoto
  - Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
|
27
|
Genome assembly and annotation of Meloidogyne enterolobii, an emerging parthenogenetic root-knot nematode. Sci Data 2020; 7:324. [PMID: 33020495 PMCID: PMC7536185 DOI: 10.1038/s41597-020-00666-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/07/2020] [Accepted: 08/27/2020] [Indexed: 11/21/2022] Open
Abstract
Root-knot nematodes (genus Meloidogyne) are plant parasites causing huge economic loss in the agricultural industry and affecting severely numerous developing countries. Control methods against these plant pests are sparse, the preferred one being the deployment of plant cultivars bearing resistance genes against Meloidogyne species. However, M. enterolobii is not controlled by the resistance genes deployed in the crop plants cultivated in Europe. The recent identification of this species in Europe is thus a major concern. Here, we sequenced the genome of M. enterolobii using short and long-read technologies. The genome assembly spans 240 Mbp with contig N50 size of 143 kbp, enabling high-quality annotations of 59,773 coding genes, 4,068 non-coding genes, and 10,944 transposable elements (spanning 8.7% of the genome). We validated the genome size by flow cytometry and the structure, quality and completeness by bioinformatics metrics. This ensemble of resources will fuel future projects aiming at pinpointing the genome singularities, the origin, diversity, and adaptive potential of this emerging plant pest.
Measurement(s): genome • sequence_assembly • sequence feature annotation
Technology Type(s): DNA sequencing assay • sequence assembly process • sequence annotation
Sample Characteristic - Organism: Meloidogyne enterolobii
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12410363
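The contig N50 quoted in this abstract is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that definition follows; the function name `n50` and the example lengths are illustrative assumptions, not values from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Illustrative lengths only (arbitrary units)
example_contigs = [80, 70, 50, 40, 30, 20]
print(n50(example_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is presumably why the authors also report flow cytometry and completeness metrics.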
|
28
|
Dennis AB, Ballesteros GI, Robin S, Schrader L, Bast J, Berghöfer J, Beukeboom LW, Belghazi M, Bretaudeau A, Buellesbach J, Cash E, Colinet D, Dumas Z, Errbii M, Falabella P, Gatti JL, Geuverink E, Gibson JD, Hertaeg C, Hartmann S, Jacquin-Joly E, Lammers M, Lavandero BI, Lindenbaum I, Massardier-Galata L, Meslin C, Montagné N, Pak N, Poirié M, Salvia R, Smith CR, Tagu D, Tares S, Vogel H, Schwander T, Simon JC, Figueroa CC, Vorburger C, Legeai F, Gadau J. Functional insights from the GC-poor genomes of two aphid parasitoids, Aphidius ervi and Lysiphlebus fabarum. BMC Genomics 2020; 21:376. [PMID: 32471448 PMCID: PMC7257214 DOI: 10.1186/s12864-020-6764-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/16/2020] [Accepted: 04/30/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Parasitoid wasps have fascinating life cycles and play an important role in trophic networks, yet little is known about their genome content and function. Parasitoids that infect aphids are an important group with the potential for biological control. Their success depends on adapting to develop inside aphids and overcoming both host aphid defenses and their protective endosymbionts. RESULTS We present the de novo genome assemblies, detailed annotation, and comparative analysis of two closely related parasitoid wasps that target pest aphids: Aphidius ervi and Lysiphlebus fabarum (Hymenoptera: Braconidae: Aphidiinae). The genomes are small (139 and 141 Mbp) and the most AT-rich reported thus far for any arthropod (GC content: 25.8 and 23.8%). This nucleotide bias is accompanied by skewed codon usage and is stronger in genes with adult-biased expression. AT-richness may be the consequence of reduced genome size, a near absence of DNA methylation, and energy efficiency. We identify missing desaturase genes, whose absence may underlie mimicry in the cuticular hydrocarbon profile of L. fabarum. We highlight key gene groups including those underlying venom composition, chemosensory perception, and sex determination, as well as potential losses in immune pathway genes. CONCLUSIONS These findings are of fundamental interest for insect evolution and biological control applications. They provide a strong foundation for further functional studies into coevolution between parasitoids and their hosts. Both genomes are available at https://bipaa.genouest.org.
Affiliation(s)
- Alice B Dennis
  - Department of Aquatic Ecology, Eawag, 8600, Dübendorf, Switzerland
  - Institute of Integrative Biology, ETH Zürich, 8092, Zürich, Switzerland
  - Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany
- Gabriel I Ballesteros
  - Instituto de Ciencias Biológicas, Universidad de Talca, Talca, Chile
  - Centre for Molecular and Functional Ecology in Agroecosystems, Universidad de Talca, Talca, Chile
  - Laboratorio de Control Biológico, Instituto de Ciencias Biológicas, Universidad de Talca, Talca, Chile
- Stéphanie Robin
  - IGEPP, Agrocampus Ouest, INRAE, Université de Rennes, 35650, Le Rheu, France
  - Université de Rennes 1, INRIA, CNRS, IRISA, 35000, Rennes, France
- Lukas Schrader
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
- Jens Bast
  - Department of Ecology and Evolution, Université de Lausanne, 1015, Lausanne, Switzerland
  - Institute of Zoology, Universität zu Köln, 50674, Köln, Germany
- Jan Berghöfer
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
- Leo W Beukeboom
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Maya Belghazi
  - Aix-Marseille Univ, CNRS, INP, Inst Neurophysiopathol, PINT, PFNT, Marseille, France
- Anthony Bretaudeau
  - IGEPP, Agrocampus Ouest, INRAE, Université de Rennes, 35650, Le Rheu, France
  - Université de Rennes 1, INRIA, CNRS, IRISA, 35000, Rennes, France
- Jan Buellesbach
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
- Elizabeth Cash
  - Department of Environmental Science, Policy, & Management, University of California, Berkeley, Berkeley, CA, 94720, USA
- Zoé Dumas
  - Department of Ecology and Evolution, Université de Lausanne, 1015, Lausanne, Switzerland
- Mohammed Errbii
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
- Jean-Luc Gatti
  - Université Côte d'Azur, INRAE, CNRS, ISA, Sophia Antipolis, France
- Elzemiek Geuverink
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Joshua D Gibson
  - Department of Environmental Science, Policy, & Management, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Department of Biology, Georgia Southern University, Statesboro, GA, 30460, USA
- Corinne Hertaeg
  - Department of Aquatic Ecology, Eawag, 8600, Dübendorf, Switzerland
  - Department of Environmental Systems Sciences, D-USYS, ETH Zürich, Zürich, Switzerland
- Stefanie Hartmann
  - Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany
- Emmanuelle Jacquin-Joly
  - INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université Paris Diderot, Institute of Ecology and Environmental Sciences of Paris, iEES-Paris, F-78000, Versailles, France
- Mark Lammers
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
- Blas I Lavandero
  - Laboratorio de Control Biológico, Instituto de Ciencias Biológicas, Universidad de Talca, Talca, Chile
- Ina Lindenbaum
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
- Camille Meslin
  - INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université Paris Diderot, Institute of Ecology and Environmental Sciences of Paris, iEES-Paris, F-78000, Versailles, France
- Nicolas Montagné
  - INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université Paris Diderot, Institute of Ecology and Environmental Sciences of Paris, iEES-Paris, F-78000, Versailles, France
- Nina Pak
  - Department of Environmental Science, Policy, & Management, University of California, Berkeley, Berkeley, CA, 94720, USA
- Marylène Poirié
  - Université Côte d'Azur, INRAE, CNRS, ISA, Sophia Antipolis, France
- Rosanna Salvia
  - Department of Sciences, University of Basilicata, 85100, Potenza, Italy
- Chris R Smith
  - Department of Biology, Earlham College, Richmond, IN, 47374, USA
- Denis Tagu
  - IGEPP, Agrocampus Ouest, INRAE, Université de Rennes, 35650, Le Rheu, France
- Sophie Tares
  - Université Côte d'Azur, INRAE, CNRS, ISA, Sophia Antipolis, France
- Heiko Vogel
  - Department of Entomology, Max Planck Institute for Chemical Ecology, Jena, Germany
- Tanja Schwander
  - Department of Ecology and Evolution, Université de Lausanne, 1015, Lausanne, Switzerland
- Christian C Figueroa
  - Instituto de Ciencias Biológicas, Universidad de Talca, Talca, Chile
  - Centre for Molecular and Functional Ecology in Agroecosystems, Universidad de Talca, Talca, Chile
- Christoph Vorburger
  - Department of Aquatic Ecology, Eawag, 8600, Dübendorf, Switzerland
  - Institute of Integrative Biology, ETH Zürich, 8092, Zürich, Switzerland
- Fabrice Legeai
  - IGEPP, Agrocampus Ouest, INRAE, Université de Rennes, 35650, Le Rheu, France
  - Université de Rennes 1, INRIA, CNRS, IRISA, 35000, Rennes, France
- Jürgen Gadau
  - Institute for Evolution and Biodiversity, Universität Münster, Münster, Germany
|
29
|
Developing an ultra-efficient microsatellite discoverer to find structural differences between SARS-CoV-1 and Covid-19. INFORMATICS IN MEDICINE UNLOCKED 2020; 19:100356. [PMID: 32501423 PMCID: PMC7241407 DOI: 10.1016/j.imu.2020.100356] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/20/2020] [Revised: 05/20/2020] [Accepted: 05/20/2020] [Indexed: 12/24/2022] Open
Abstract
Motivation Recently, the outbreak of Coronavirus-Covid-19 has forced the World Health Organization to declare a pandemic status. A genome sequence is the core of this virus which interferes with the normal activities of its counterparts within humans. Analysis of its genome may provide clues toward the proper treatment of patients and the design of new drugs and vaccines. Microsatellites are composed of short genome subsequences which are successively repeated many times in the same direction. They are highly variable in terms of their building blocks, number of repeats, and their locations in the genome sequences. This mutability property has been the source of many diseases. Usually the host genome is analyzed to diagnose possible diseases in the victim. In this research, the focus is concentrated on the attacker's genome for discovery of its malicious properties. Results The focus of this research is the microsatellites of both SARS and Covid-19. An accurate and highly efficient computer method for identifying all microsatellites in the genome sequences is discovered and implemented, and it is used to find all microsatellites in the Coronavirus-Covid-19 and SARS2003. The Microsatellite discovery is based on an efficient indexing technique called K-Mer Hash Indexing. The method is called Fast Microsatellite Discovery (FMSD) and it is used for both SARS and Covid-19. A table composed of all microsatellites is reported. There are many differences between SARS and Covid-19, but there is an outstanding difference which requires further investigation. Availability FMSD is freely available at https://gitlab.com/FUM_HPCLab/fmsd_project, implemented in C on Linux-Ubuntu system. Software related contact: hossein_savari@mail.um.ac.ir.
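The abstract defines microsatellites as short subsequences repeated consecutively in the same direction. The sketch below illustrates that definition with a simple regular-expression scan for perfect repeats. It is not the K-Mer Hash Indexing method used by FMSD, whose details are not given here; the function name `find_perfect_ssrs` and the thresholds (motifs of 1 to 6 bases, at least 3 copies) are assumptions chosen for illustration.

```python
import re


def find_perfect_ssrs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return (start, motif, copies) for each perfect tandem repeat.

    Hypothetical helper: a naive scan with one regex per motif length,
    not the indexing technique described in the abstract.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "ATAT"), since the shorter motif already reports them
            if any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k)):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence for illustration only, not taken from any viral genome
print(find_perfect_ssrs("GGATATATATCCAGCAGCAGCTT"))
```

A scan like this is quadratic-free but still makes one pass per motif length, so it is only practical for short genomes; tools aimed at large inputs rely on indexing schemes such as the one the abstract names.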
|
30
|
Mokhtar MM, Atia MAM. SSRome: an integrated database and pipelines for exploring microsatellites in all organisms. Nucleic Acids Res 2020; 47:D244-D252. [PMID: 30365025 PMCID: PMC6323889 DOI: 10.1093/nar/gky998] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/14/2018] [Accepted: 10/14/2018] [Indexed: 11/23/2022] Open
Abstract
Over the past decade, many databases focusing on microsatellite mining on a genomic scale were released online with at least one of the following major deficiencies: (i) lacking the classification of microsatellites as genic or non-genic, (ii) not comparing microsatellite motifs at both genic and non-genic levels in order to identify unique motifs for each class or (iii) missing SSR marker development. In this study, we have developed ‘SSRome’ as a web-based, user-friendly, comprehensive and dynamic database with pipelines for exploring microsatellites in 6533 organisms. In the SSRome database, 158 million microsatellite motifs are identified across all taxa, in addition to all the mitochondrial and chloroplast genomes and expressed sequence tags available from NCBI. Moreover, 45.1 million microsatellite markers were developed and classified as genic or non-genic. All the stored motif and marker datasets can be downloaded freely. In addition, SSRome provides three user-friendly tools to identify, classify and compare motifs on either a genome- or transcriptome-wide scale. With the implementation of PHP, HTML and JavaScript, users can upload their data for analysis via a user-friendly GUI. SSRome represents a powerful database and mega-tool that will assist researchers in developing and dissecting microsatellite markers on a high-throughput scale.
Affiliation(s)
- Morad M Mokhtar
  - Molecular Genetics and Genome Mapping Laboratory, Genome Mapping Department, Agricultural Genetic Engineering Research Institute (AGERI), ARC, Giza, 12619, Egypt
- Mohamed A M Atia
  - Molecular Genetics and Genome Mapping Laboratory, Genome Mapping Department, Agricultural Genetic Engineering Research Institute (AGERI), ARC, Giza, 12619, Egypt
|
31
|
Suleman, Khan MS, Tkach VV, Muhammad N, Zhang D, Zhu XQ, Ma J. Molecular phylogenetics and mitogenomics of three avian dicrocoeliids (Digenea: Dicrocoeliidae) and comparison with mammalian dicrocoeliids. Parasit Vectors 2020; 13:74. [PMID: 32054541 PMCID: PMC7020495 DOI: 10.1186/s13071-020-3940-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/30/2019] [Accepted: 02/03/2020] [Indexed: 02/05/2023] Open
Abstract
Background The Dicrocoeliidae are digenetic trematodes mostly parasitic in the bile ducts and gall bladder of various avian and mammalian hosts. Until recently their systematics was based on morphological data only. Due to the high morphological uniformity across multiple dicrocoeliid taxa and insufficient knowledge of relative systematic value of traditionally used morphological characters, their taxonomy has always been unstable. Therefore, DNA sequence data provide a critical independent source of characters for phylogenetic inference and improvement of the system. Methods We examined the phylogenetic affinities of three avian dicrocoeliids representing the genera Brachylecithum, Brachydistomum and Lyperosomum, using partial sequences of the nuclear large ribosomal subunit (28S) RNA gene. We also sequenced the complete or nearly complete mitogenomes of these three isolates and conducted a comparative mitogenomic analysis with the previously available mitogenomes from three mammalian dicrocoeliids (from 2 different genera) and examined the phylogenetic position of the family Dicrocoeliidae within the order Plagiorchiida based on concatenated nucleotide sequences of all mitochondrial genes (except trnG and trnE). Results Combined nucleotide diversity, Kimura-2-parameter distance, non-synonymous/synonymous substitutions ratio and average sequence identity analyses consistently demonstrated that cox1, cytb, nad1 and two rRNAs were the most conserved and atp6, nad5, nad3 and nad2 were the most variable genes across dicrocoeliid mitogenomes. Phylogenetic analyses based on mtDNA sequences did not support the close relatedness of the Paragonimidae and Dicrocoeliidae and suggested non-monophyly of the Gorgoderoidea as currently recognized. Conclusions Our results show that fast-evolving mitochondrial genes atp6, nad5 and nad3 would be better markers than slow-evolving genes cox1 and nad1 for species discrimination and population level studies in the Dicrocoeliidae. 
Furthermore, the Dicrocoeliidae being outside of the clade containing other xiphidiatan trematodes suggests a need for the re-evaluation of the taxonomic content of the Xiphidiata.
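The Kimura-2-parameter distance named in the Results can be illustrated with a short sketch. This is not the authors' pipeline; it only applies the standard K2P formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), to two pre-aligned sequences, where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; gaps and ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
```

Because the model weights transitions and transversions separately, a single transition and a single transversion over the same ten sites give slightly different distances.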
Affiliation(s)
- Suleman
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, Gansu, People's Republic of China; Department of Zoology, University of Swabi, Swabi, 23340, Khyber Pakhtunkhwa, Pakistan
- Mian Sayed Khan
- Department of Zoology, University of Swabi, Swabi, 23340, Khyber Pakhtunkhwa, Pakistan
- Vasyl V Tkach
- Department of Biology, University of North Dakota, Grand Forks, ND, 58202-9019, USA
- Nehaz Muhammad
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, Gansu, People's Republic of China
- Dong Zhang
- Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture, and State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, Hubei, People's Republic of China
- Xing-Quan Zhu
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, Gansu, People's Republic of China; Jiangsu Co-innovation Center for the Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University College of Veterinary Medicine, Yangzhou, 225009, Jiangsu, People's Republic of China
- Jun Ma
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, Gansu, People's Republic of China
|
32
|
Common Structural Patterns in the Maxicircle Divergent Region of Trypanosomatidae. Pathogens 2020; 9:pathogens9020100. [PMID: 32033466 PMCID: PMC7169413 DOI: 10.3390/pathogens9020100] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/09/2020] [Revised: 02/01/2020] [Accepted: 02/03/2020] [Indexed: 12/29/2022] Open
Abstract
Maxicircles of all kinetoplastid flagellates are functional analogs of the mitochondrial genome of other eukaryotes. They consist of two distinct parts, called the coding region and the divergent region (DR). The DR is composed of highly repetitive sequences and, as such, remains the least explored segment of a trypanosomatid genome. It is extremely difficult to sequence and assemble, which is why very few full-length maxicircle sequences were available until now. Using PacBio data, we assembled 17 complete maxicircles from different species of trypanosomatids. Here we present their large-scale comparative analysis and describe common patterns of DR organization in trypanosomatids.
|
33
|
McDew-White M, Li X, Nkhoma SC, Nair S, Cheeseman I, Anderson TJC. Mode and Tempo of Microsatellite Length Change in a Malaria Parasite Mutation Accumulation Experiment. Genome Biol Evol 2020; 11:1971-1985. [PMID: 31273388 PMCID: PMC6644851 DOI: 10.1093/gbe/evz140] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Accepted: 06/29/2019] [Indexed: 12/12/2022] Open
Abstract
Malaria parasites have small extremely AT-rich genomes: microsatellite repeats (1–9 bp) comprise 11% of the genome and genetic variation in natural populations is dominated by repeat changes in microsatellites rather than point mutations. This experiment was designed to quantify microsatellite mutation patterns in Plasmodium falciparum. We established 31 parasite cultures derived from a single parasite cell and maintained these for 114–267 days with frequent reductions to a single cell, so parasites accumulated mutations during ∼13,207 cell divisions. We Illumina sequenced the genomes of both progenitor and end-point mutation accumulation (MA) parasite lines in duplicate to validate stringent calling parameters. Microsatellite calls were 99.89% (GATK), 99.99% (freeBayes), and 99.96% (HipSTR) concordant in duplicate sequence runs from independent sequence libraries, whereas introduction of microsatellite mutations into the reference genome revealed a low false negative calling rate (0.68%). We observed 98 microsatellite mutations. We highlight several conclusions: microsatellite mutation rates (3.12 × 10⁻⁷ to 2.16 × 10⁻⁸/cell division) are associated with both repeat number and repeat motif like other organisms studied. However, 41% of changes resulted from loss or gain of more than one repeat: this was particularly true for long repeat arrays. Unlike other eukaryotes, we found no insertions or deletions that were not associated with repeats or homology regions. Overall, microsatellite mutation rates are among the lowest recorded and comparable to those in another AT-rich protozoan (Dictyostelium). However, a single infection (>10¹¹ parasites) will still contain over 2.16 × 10³ to 3.12 × 10⁴ independent mutations at any single microsatellite locus.
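The closing figure in this abstract follows from simple arithmetic: the per-division mutation rate multiplied by the number of parasites in an infection. The sketch below reproduces that calculation under the simplifying assumption (ours, not stated in the abstract) that each parasite corresponds to roughly one cell division; the function name is hypothetical.

```python
def expected_mutations_per_locus(rate_per_division: float, n_parasites: float) -> float:
    """Expected independent mutations at one microsatellite locus.

    Assumes about one cell division per parasite in the infection,
    which is a simplification made for this illustration only.
    """
    return rate_per_division * n_parasites


RATE_LOW = 2.16e-8    # lower mutation rate per cell division (from the abstract)
RATE_HIGH = 3.12e-7   # upper mutation rate per cell division (from the abstract)
N_PARASITES = 1e11    # parasites in a single infection (from the abstract)

low = expected_mutations_per_locus(RATE_LOW, N_PARASITES)
high = expected_mutations_per_locus(RATE_HIGH, N_PARASITES)
print(f"{low:.3g} to {high:.3g} mutations per locus")
```

The two products, about 2,160 and 31,200, match the range quoted in the abstract.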
Affiliation(s)
- Xue Li
- Texas Biomedical Research Institute, San Antonio, Texas
- Standwell C Nkhoma
- Texas Biomedical Research Institute, San Antonio, Texas; Malaria Research and Reference Reagent Resource Center (MR4), BEI Resources, American Type Culture Collection, 10801 University Boulevard, Manassas, VA
- Shalini Nair
- Texas Biomedical Research Institute, San Antonio, Texas
- Ian Cheeseman
- Texas Biomedical Research Institute, San Antonio, Texas
|
34
|
Genovese LM, Mosca MM, Pellegrini M, Geraci F. Dot2dot: accurate whole-genome tandem repeats discovery. Bioinformatics 2019; 35:914-922. [PMID: 30165507 PMCID: PMC6419916 DOI: 10.1093/bioinformatics/bty747] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/30/2018] [Revised: 08/03/2018] [Accepted: 08/24/2018] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role needs to be elucidated. In particular, tandem repeats (TRs) (i.e. short, almost identical sequences that lie adjacent to each other) have been associated to many cellular processes and, indeed, are also involved in several genetic disorders. The need of comprehensive lists of TRs for association studies and the absence of a computational model able to capture their variability have revived research on discovery algorithms. RESULTS Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices where a sequence is compared with itself. Leveraging on the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot: an accurate algorithm fast enough to be suitable for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods, and completes the analysis of the biggest known reference genome in about one day on a standard PC. AVAILABILITY AND IMPLEMENTATION Source code and datasets are freely available upon paper acceptance at the URL: https://github.com/Gege7177/Dot2dot. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
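The dot-plot idea mentioned in the Results can be illustrated with a toy example. When a sequence is compared with itself, a tandem repeat of period p appears as a run of matches along the diagonal offset by p. The sketch below is a naive quadratic scan over diagonals, not the Dot2dot algorithm (which the abstract describes as using a compact, linear-time representation); the function name and parameters are hypothetical.

```python
def diagonal_runs(seq: str, max_period: int, min_run: int):
    """Find runs where seq[i] == seq[i + period] in a self dot-plot.

    Returns (start, period, run_length) tuples. Naive illustration only.
    """
    hits = []
    for period in range(1, max_period + 1):
        run_start = None
        # iterate one step past the last comparable position to flush open runs
        for i in range(len(seq) - period + 1):
            match = i < len(seq) - period and seq[i] == seq[i + period]
            if match:
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                run_len = i - run_start
                if run_len >= min_run:
                    hits.append((run_start, period, run_len))
                run_start = None
    return hits


toy = "GG" + "ACT" * 4 + "GG"
runs = diagonal_runs(toy, max_period=6, min_run=6)
```

For the toy sequence, the repeat of `ACT` shows up on the diagonal at offset 3, and again at offset 6, since any tandem repeat also matches itself at multiples of its period.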
Affiliation(s)
- Marco M Mosca
- Department of Computer Science, University of Liverpool, Liverpool, UK
- Marco Pellegrini
- Institute for Informatics and Telematics, CNR, Pisa, Italy; Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, Pisa, Italy
- Filippo Geraci
- Institute for Informatics and Telematics, CNR, Pisa, Italy
|
35
|
Makhortykh SA, Kulikova LI, Pankratov AN, Tetuev RK. Generalized Spectral-Analytical Method and Its Applications in Image Analysis and Pattern Recognition Problems. PATTERN RECOGNITION AND IMAGE ANALYSIS 2019. [DOI: 10.1134/s1054661819040102] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
|
36
|
Pagès G, Grudinin S. DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures. Bioinformatics 2019; 35:5113-5120. [PMID: 31161198 DOI: 10.1093/bioinformatics/btz454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/26/2018] [Revised: 04/16/2019] [Accepted: 05/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Thanks to the recent advances in structural biology, nowadays 3D structures of various proteins are solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions. RESULTS We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB. AVAILABILITY AND IMPLEMENTATION The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guillaume Pagès
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Sergei Grudinin
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
|
37
|
Chen X, Dong Z, Liu G, He J, Zhao R, Wang W, Peng Y, Li X. Phylogenetic analysis provides insights into the evolution of Asian fireflies and adult bioluminescence. Mol Phylogenet Evol 2019; 140:106600. [PMID: 31445200 DOI: 10.1016/j.ympev.2019.106600] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/17/2018] [Revised: 08/09/2019] [Accepted: 08/20/2019] [Indexed: 02/04/2023]
Abstract
Fireflies are one of the best-known examples of luminescent organisms. The limited geographic distribution and rarity of some firefly genera have hindered molecular phylogenetic analysis, resulting in uncertainty in regard to firefly phylogeny. Here, using genome skimming next-generation sequencing, we sequenced 23 Asian firefly species from 15 genera (Lampyridae: 14; Rhagophthalmidae: one) and assembled their mitochondrial genomes (mitogenomes) and nuclear ribosomal DNA (rDNA) repeat unit. The mitogenomes (including 15 mitochondrial genes: COX1-3, ATP6&8, ND1-6&4L, CYTB, 12S, and 16S) were recovered for almost all 23 species; furthermore, three regions of the nuclear rDNA repeat unit (18S, 28S, and 5.8S) were recovered for 22 out of the 23 species. The mitogenomes of 11 genera and 22 species as well as the complete rDNA from 22 species are reported here for the first time. Combined with previously published sequences of mitochondrial and rDNA coding regions, 166 species (170 populations with four overlapping in Lampyridae) were included in the current analyses. We selected different species groups and coding regions to infer phylogenies, and then employed tree certainty (TC) and internode certainty (IC) to quantify any phylogenetic incongruence. Phylogenetic analysis of 18 coding regions (15 mitochondrial genes and three regions of the nuclear rDNA repeat unit) from different species groups showed that the 144-species selection group (excluding 22 species outside Lampyridae) had relatively high TC (101.39). 
Further phylogenetic analysis of the 144 species using different coding regions indicated that the phylogeny of the 13 coding regions (10 mitochondrial genes: COX1-2, ATP6&8, ND1, ND4-5, CYTB, 12S and 16S; three rDNA regions: 18S, 5.8S, and 28S) demonstrated higher TC (103.02) than the phylogenies based on the 18 coding regions (TC = 101.39), conserved-regions (c-regions, i.e., 12S, 16S, COX1, 18S, and 28S) (TC = 95.11), or conserved-sites (c-sites, TC = 92.31) for the mitochondrial genes. In contrast, the c-sites strengthened the deeper nodes of the 144-species phylogeny compared to the c-regions. All of the 144-species phylogenies using different coding regions (except the c-regions) consistently recovered the monophyly of each of the three luminous families and their combination (Lampyridae, Rhagophthalmidae, and Phengodidae) with high IC support. Our phylogenetic analyses clarified the position of firefly genera Lamprigera, Vesta, Stenocladius, Pyrocoelia, Diaphanes, Abscondita, Pygoluciola, Emeia, Pristolycus, and Menghuoius. We also inferred the evolutionary pattern of adult bioluminescence in Lampyridae based on the phylogenies of 166 and 144 species. Our data suggest that the common ancestor of Lampyridae possessed adult bioluminescence, with a higher loss rate than gain rate of bioluminescence during its lineage evolution. Our results provide insight into Asian firefly phylogeny, and also enrich mitogenome and rDNA data resources for further study.
Affiliation(s)
- Xing Chen
- CAS Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan 666303, China
- Zhiwei Dong
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Guichun Liu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Jinwu He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Ruoping Zhao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Wen Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Center for Excellence in Animal Evolution and Genetics, Kunming, Yunnan 650223, China; Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Yanqiong Peng
- CAS Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan 666303, China
- Xueyan Li
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
|
38
|
Bi WX, He* JW, Chen CC, Kundrata R, Li XY. Sinopyrophorinae, a new subfamily of Elateridae (Coleoptera, Elateroidea) with the first record of a luminous click beetle in Asia and evidence for multiple origins of bioluminescence in Elateridae. Zookeys 2019; 864:79-97. [PMID: 31363346 PMCID: PMC6656784 DOI: 10.3897/zookeys.864.26689] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/15/2018] [Accepted: 05/15/2019] [Indexed: 11/17/2022] Open
Abstract
The new subfamily Sinopyrophorinae within Elateridae is proposed to accommodate a bioluminescent species, Sinopyrophorus schimmeli Bi & Li, gen. et sp. nov., recently discovered in Yunnan, China. This lineage is morphologically distinguished from other click-beetle subfamilies by the strongly protruding frontoclypeal region, which is longitudinally carinate medially, the pretarsal claws without basal setae, the hind wing venation with a well-defined wedge cell, the abdomen with seven (male) or six (female) ventrites, the large luminous organ on the abdominal sternite II, and the male genitalia with median lobe much shorter than parameres, and parameres arcuate, with the inner margin near its apical third dentate. Molecular phylogeny based on the combined 14 mitochondrial and two nuclear genes supports the placement of this taxon far from other luminescent click-beetle groups, which provides additional evidence for the multiple origin of bioluminescence in Elateridae. Illustrations of habitus and main diagnostic features of S. schimmeli Bi & Li, gen. et sp. nov. are provided, as well as the brief description of its luminescent behavior.
Affiliation(s)
- Wen-Xuan Bi
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Room 401, No. 2, Lane 155, Lianhua South Road, Shanghai, 201100, China
- Jin-Wu He*
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Chang-Chin Chen
- NPS office, Tianjin New Wei San Industrial Company, Ltd., Tianjing, China
- Robin Kundrata
- Department of Zoology, Faculty of Science, Palacky University, 17. listopadu 50, 77146, Olomouc, Czech Republic
- Xue-Yan Li
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
|
39
|
IDSSR: An Efficient Pipeline for Identifying Polymorphic Microsatellites from a Single Genome Sequence. Int J Mol Sci 2019; 20:ijms20143497. [PMID: 31315288 PMCID: PMC6678329 DOI: 10.3390/ijms20143497] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/13/2019] [Revised: 06/25/2019] [Accepted: 07/15/2019] [Indexed: 12/02/2022] Open
Abstract
Simple sequence repeats (SSRs) are known as microsatellites, and consist of tandem 1–6-base motifs. They have become one of the most popular molecular markers, and are widely used in molecular ecology, conservation biology, molecular breeding, and many other fields. Previously reported methods identify monomorphic and polymorphic SSRs and determine the polymorphic SSRs via experimental validation, which is potentially time-consuming and costly. Herein, we present a new strategy named insertion/deletion (INDEL) SSR (IDSSR) to identify polymorphic SSRs by integrating SSRs with nucleotide insertions/deletions (INDEL) solely based on a single genome sequence and the sequenced pair-end reads. These INDEL indexes and polymorphic SSRs were identified, as well as the number of repeats, repeat motifs, chromosome location, annealing temperature, and primer sequences, enabling future experimental approaches to determine the correctness and polymorphism. Experimental validation with the giant panda demonstrated that our method has high reliability and stability. The efficient SSR pipeline would help researchers obtain high-quality genetic markers for plants and animals of interest, save labor, and reduce costly marker-screening experiments. IDSSR is freely available at https://github.com/Allsummerking/IDSSR.
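The definition at the start of this abstract (tandem repeats of 1–6-base motifs) can be made concrete with a small regular-expression sketch. This is not the IDSSR pipeline, which also integrates INDEL information from paired-end reads; it only locates perfect SSRs in a single string. The function name and the `min_repeats` threshold are hypothetical choices for illustration.

```python
import re


def find_perfect_ssrs(seq: str, min_repeats: int = 3, max_motif: int = 6):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    The lazy quantifier makes the regex try the shortest motif first,
    so 'ACACAC' is reported as (AC)x3 rather than as a longer motif.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, repeats))
    return results


ssrs = find_perfect_ssrs("GGACACACACTT")
```

In the example string, the only run meeting the default threshold is the dinucleotide `AC` repeated four times starting at position 2 (0-based).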
|
40
|
Shamanskiy VA, Timonina VN, Popadin KY, Gunbin KV. ImtRDB: a database and software for mitochondrial imperfect interspersed repeats annotation. BMC Genomics 2019; 20:295. [PMID: 31284879 PMCID: PMC6614062 DOI: 10.1186/s12864-019-5536-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Mitochondria are the powerhouses of all eukaryotic cells and have their own circular DNA (mtDNA) encoding various RNAs and proteins. Somatic perturbations of mtDNA accumulate with age, so it is of great importance to uncover the main sources of mtDNA instability. Recent analyses demonstrated that somatic mtDNA deletions depend on imperfect repeats of various kinds between distant mtDNA segments. However, there are so far no comprehensive databases annotating all types of imperfect repeats in the numerous species with sequenced complete mitochondrial genomes, and no algorithms capable of calling all types of imperfect repeats in circular mtDNA. RESULTS We implemented a naïve pattern recognition algorithm, by analogy to standard dot-plot construction procedures, that allows us to find both perfect and imperfect repeats of four main types: direct, inverted, mirror and complementary. Our algorithm is adapted to specific characteristics of mtDNA such as circularity and an excess of short repeats; it calls imperfect repeats starting from a length of 10 bp. We constructed an interactive web-accessible database, ImtRDB, storing the positions of perfect and imperfect repeats in the mtDNAs of more than 3500 vertebrate species. Additional tools, such as visualization of repeats within a genome, comparison of repeat densities among different genomes and the possibility to download all results, make this database useful for many biologists. Our first analyses of the database demonstrated that (i) mtDNA imperfect repeats are usually short; (ii) they are associated with unfolded DNA structures; (iii) the four types of repeats positively correlate with each other, forming two equivalent pairs (direct and mirror versus inverted and complementary) with identical nucleotide content and similar distribution between species; (iv) abundance of repeats is negatively associated with GC content; (v) dinucleotides GC versus CG are overrepresented on the light chain of mtDNA covered by repeats. 
CONCLUSIONS ImtRDB is available at http://bioinfodbs.kantiana.ru/ImtRDB/ . It is accompanied by the software calling all types of interspersed repeats with different level of degeneracy in circular DNA. This database and software can become a very useful tool in various areas of mitochondrial and chloroplast DNA research.
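The four repeat types named in this abstract differ only in how the second copy relates to the first. The sketch below spells out one common convention for each type; these definitions are our assumption of the usual usage and are not taken from the ImtRDB software itself. The function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_partner(segment: str, kind: str) -> str:
    """Return the sequence a second copy must match for the given repeat type.

    Conventions assumed here:
      direct        - same sequence
      mirror        - reversed sequence
      complementary - complemented, not reversed
      inverted      - reverse complement
    """
    segment = segment.upper()
    if kind == "direct":
        return segment
    if kind == "mirror":
        return segment[::-1]
    if kind == "complementary":
        return segment.translate(COMPLEMENT)
    if kind == "inverted":
        return segment[::-1].translate(COMPLEMENT)
    raise ValueError(f"unknown repeat type: {kind}")


partners = {
    kind: repeat_partner("AACG", kind)
    for kind in ("direct", "mirror", "complementary", "inverted")
}
```

Under these conventions, the mirror partner has the same nucleotide content as the direct one, and the complementary partner has the same content as the inverted one, which matches the pairing the abstract describes.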
Affiliation(s)
- Viktor A Shamanskiy
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Valeria N Timonina
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Konstantin Yu Popadin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Konstantin V Gunbin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia; Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
|
41
|
Characterization of the complete mitochondrial genome of Plagiorchis maculosus (Digenea, Plagiorchiidae), Representative of a taxonomically complex digenean family. Parasitol Int 2019; 71:99-105. [PMID: 30946896 DOI: 10.1016/j.parint.2019.04.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/09/2019] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 01/28/2023]
Abstract
Despite the highly divergent morphology, pathogenicity and worldwide distribution of digenean parasites belonging to one of the largest families, the Plagiorchiidae, there are no complete mitochondrial (mt) genomes published to date for plagiorchiids. In this study, we obtained nuclear ribosomal DNA (ITS region and 28S rDNA) sequences and the complete mt genome sequences of Plagiorchis maculosus (Rudolphi 1802) Braun, 1902, and assessed its phylogenetic relationship with other xiphidiates, based on the mtDNA sequences. The obtained ITS and 28S rDNA sequences were identical to the corresponding sequences of P. maculosus available in GenBank. The complete mitochondrial genome of P. maculosus (14,124 bp) contained 36 genes (atp8 is absent) and a long non-coding region (NCR) with two sets of repeated sequences of 283 nucleotides each. The phylogenetic tree resulting from Bayesian inference (BI) analyses based on concatenated nucleotide sequences of all 36 genes of P. maculosus and other xiphidiates mitochondrial genomes, indicated that P. maculosus (and the Plagiorchiidae) is phylogenetically closest to the Brachycladiidae and Paragonimidae. The present study describes the first mitochondrial genome from the type genus of the family Plagiorchiidae. The overall gene arrangement, nucleotide composition, A + T contents, AT and GC skew and codon usage with relative synonymous codon usage (RSCU) for 12 PCGs are described. Characterization of mitochondrial genomes from additional plagiorchiid taxa is necessary to make further progress in phylogenetic and epidemiological studies of these digeneans as well as accurate diagnostics of these parasites including those parasitic in humans.
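The A + T content and the AT and GC skew mentioned in this abstract are simple composition statistics. A minimal sketch follows, using the usual definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the function names are hypothetical and this is not the authors' code.

```python
def base_skew(seq: str, first: str, second: str) -> float:
    """Skew between two bases: (first - second) / (first + second)."""
    seq = seq.upper()
    n_first, n_second = seq.count(first), seq.count(second)
    total = n_first + n_second
    if total == 0:
        raise ValueError(f"sequence contains neither {first} nor {second}")
    return (n_first - n_second) / total


def at_content(seq: str) -> float:
    """Fraction of A and T among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("A") + seq.count("T")) / acgt


example = "AAATGCCC"
at_skew = base_skew(example, "A", "T")   # (3 - 1) / 4
gc_skew = base_skew(example, "G", "C")   # (1 - 3) / 4
```

A positive AT skew means more A than T on the analyzed strand; a negative GC skew means more C than G.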
|
42
|
Manthey JD, Moyle RG, Boissinot S. Multiple and Independent Phases of Transposable Element Amplification in the Genomes of Piciformes (Woodpeckers and Allies). Genome Biol Evol 2018; 10:1445-1456. [PMID: 29850797 PMCID: PMC6007501 DOI: 10.1093/gbe/evy105] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Accepted: 05/22/2018] [Indexed: 12/15/2022] Open
Abstract
The small and conserved genomes of birds are likely a result of flight-related metabolic constraints. Recombination-driven deletions and minimal transposable element (TE) expansions have led to continually shrinking genomes during evolution of many lineages of volant birds. Despite constraints of genome size in birds, we identified multiple waves of amplification of TEs in Piciformes (woodpeckers, honeyguides, toucans, and barbets). Relative to other bird species’ genomic TE abundance (< 10% of genome), we found ∼17–30% TE content in multiple clades within Piciformes. Several families of the retrotransposon superfamily chicken repeat 1 (CR1) expanded in at least three different waves of activity. The most recent CR1 expansions (∼4–7% of genome) preceded bursts of diversification in the woodpecker clade and in the American barbets + toucans clade. Additionally, we identified several thousand polymorphic CR1 insertions (hundreds per individual) in three closely related woodpecker species. Woodpecker CR1 insertion polymorphisms are maintained at lower frequencies than single nucleotide polymorphisms indicating that purifying selection is acting against additional CR1 copies and that these elements impose a fitness cost on their host. These findings provide evidence of large scale and ongoing TE activity in avian genomes despite continual constraint on genome size.
Affiliation(s)
- Joseph D Manthey
- New York University Abu Dhabi, UAE; Department of Biological Sciences, Texas Tech University
- Robert G Moyle
- Department of Ecology and Evolutionary Biology, Biodiversity Institute, University of Kansas
|
43
|
Pickett BD, Miller JB, Ridge PG. Kmer-SSR: a fast and exhaustive SSR search algorithm. Bioinformatics 2018; 33:3922-3928. [PMID: 28968741 PMCID: PMC5860095 DOI: 10.1093/bioinformatics/btx538] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/19/2017] [Accepted: 08/29/2017] [Indexed: 11/30/2022] Open
Abstract
Motivation One of the main challenges with bioinformatics software is that the size and complexity of datasets necessitate trading speed for accuracy, or completeness. To combat this problem of computational complexity, a plethora of heuristic algorithms have arisen that report a ‘good enough’ solution to biological questions. However, in instances such as Simple Sequence Repeats (SSRs), a ‘good enough’ solution may not accurately portray results in population genetics, phylogenetics and forensics, which require accurate SSRs to calculate intra- and inter-species interactions. Results We present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and accurately identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that allow users to easily view a subset of SSRs based on user input. Kmer-SSR, coupled with the filter options, accurately and intuitively identifies SSRs quickly and in a more user-friendly manner than any other SSR identification algorithm. Availability and implementation The source code is freely available on GitHub at https://github.com/ridgelab/Kmer-SSR.
|
44
|
Genovese LM, Geraci F, Corrado L, Mangano E, D'Aurizio R, Bordoni R, Severgnini M, Manzini G, De Bellis G, D'Alfonso S, Pellegrini M. A Census of Tandemly Repeated Polymorphic Loci in Genic Regions Through the Comparative Integration of Human Genome Assemblies. Front Genet 2018; 9:155. [PMID: 29770143 PMCID: PMC5941971 DOI: 10.3389/fgene.2018.00155] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 11/29/2022] Open
Abstract
Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR consists of a variation, found in an individual (or in a population), in the number of repeating units of a Tandem Repeat (TR) locus of the genome with respect to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTRs are further divided into two main classes: Short Tandem Repeats (STR), when the repeating unit has a size of up to 6 base pairs, and Variable Number Tandem Repeats (VNTR), for repeating units larger than 6 base pairs. As larger and larger populations are screened via high-throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. To facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). In the first phase, we applied four different TR discovery software tools to uncover 55,223,485 TR (after duplicate removal) in GRCh38, of which 373,173 were determined to be PTR in the second phase by comparison with five assembled human genomes. Of these, 263,266 are not included in state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Like previous catalogs of human polymorphic loci, we focus our catalog toward applications in the discovery of disease-associated loci. Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.
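The STR/VNTR distinction in this abstract rests only on the length of the repeating unit, and a locus counts as polymorphic when its copy number differs from the reference. A minimal Python sketch of those two rules follows; the function names and the example units are hypothetical illustrations, not part of the authors' catalog pipeline.

```python
def classify_repeat_unit(unit: str) -> str:
    """Classify a tandem-repeat unit by length alone.

    Follows the definition quoted in the abstract: units of up to 6 bp
    are STR, units above 6 bp are VNTR. The function name is hypothetical.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    return "STR" if len(unit) <= 6 else "VNTR"


def is_polymorphic(reference_copies: int, sample_copies: int) -> bool:
    """A locus is treated as polymorphic when the sample's copy number
    differs from the reference genome's copy number (illustrative rule)."""
    return reference_copies != sample_copies


if __name__ == "__main__":
    for unit in ("CAG", "ACGTAC", "ACGTACG"):
        print(unit, len(unit), classify_repeat_unit(unit))
```

The boundary case is the 6 bp unit, which still falls in the STR class under the abstract's wording ("up to 6 base pairs").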
Affiliation(s)
- Filippo Geraci
- Institute for Informatics and Telematics of CNR, Pisa, Italy
- Lucia Corrado
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
- Roberta Bordoni
- Institute for Biomedical Technologies of CNR, Segrate, Italy
- Giovanni Manzini
- Institute for Informatics and Telematics of CNR, Pisa, Italy; Department of Science and Technological Innovation, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
- Sandra D'Alfonso
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
45
Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics 2018; 33:2583-2585. [PMID: 28398459 PMCID: PMC5870701 DOI: 10.1093/bioinformatics/btx198] [Citation(s) in RCA: 1045] [Impact Index Per Article: 174.2] [Received: 11/02/2016] [Accepted: 04/06/2017] [Indexed: 12/27/2022] Open
Abstract
Motivation: Microsatellites are a widely used marker system in plant genetics and forensics. The development of reliable microsatellite markers from resequencing data is challenging. Results: We extended MISA, a computational tool assisting the development of microsatellite markers, and reimplemented it as a web-based application. We improved compound microsatellite detection and added the possibility to display and export MISA results in GFF3 format for downstream analysis. Availability and Implementation: MISA-web can be accessed at http://misaweb.ipk-gatersleben.de/. The website provides tutorials, usage notes, and download links to the source code. Contact: scholz@ipk-gatersleben.de.
Affiliation(s)
- Sebastian Beier
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany
- Thomas Thiel
- KWS Saat SE, Grimsehlstr. 31, 37555 Einbeck, Germany
- Thomas Münch
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Germany
46
McCormick RF, Truong SK, Sreedasyam A, Jenkins J, Shu S, Sims D, Kennedy M, Amirebrahimi M, Weers BD, McKinley B, Mattison A, Morishige DT, Grimwood J, Schmutz J, Mullet JE. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J 2018; 93:338-354. [PMID: 29161754 DOI: 10.1111/tpj.13781] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Received: 04/05/2017] [Revised: 11/05/2017] [Accepted: 11/14/2017] [Indexed: 05/20/2023]
Abstract
Sorghum bicolor is a drought tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high-quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34 211, average gene length and N50 increased, and error frequency was reduced 10-fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large-scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA-seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement.
Affiliation(s)
- Ryan F McCormick
- Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, 77843, USA
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
- Sandra K Truong
- Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, 77843, USA
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
- Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Shengqiang Shu
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, 94598, USA
- David Sims
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Megan Kennedy
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, 94598, USA
- Brock D Weers
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
- Brian McKinley
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
- Ashley Mattison
- Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, 77843, USA
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
- Daryl T Morishige
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
- Jane Grimwood
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, 94598, USA
- Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, 94598, USA
- John E Mullet
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, 77843, USA
47
Engelbrecht J, Duong TA, Berg NVD. New microsatellite markers for population studies of Phytophthora cinnamomi, an important global pathogen. Sci Rep 2017; 7:17631. [PMID: 29247246 PMCID: PMC5732169 DOI: 10.1038/s41598-017-17799-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/01/2017] [Accepted: 11/29/2017] [Indexed: 01/31/2023] Open
Abstract
Phytophthora cinnamomi is the causal agent of root rot, canker and dieback of thousands of plant species around the globe. This oomycete not only causes severe economic losses but also threatens natural ecosystems. In South Africa, P. cinnamomi affects eucalyptus, avocado, macadamia and indigenous fynbos. Despite being one of the most important plant pathogens with a global distribution, little information is available regarding origin, invasion history and population biology. This is partly due to the limited number of molecular markers available for studying P. cinnamomi. Using available genome sequences for three isolates of P. cinnamomi, sixteen polymorphic microsatellite markers were developed as a set of multiplexable markers for both PCR and Gene Scan assays. The application of these markers on P. cinnamomi populations from avocado production areas in South Africa revealed that they were all polymorphic in these populations. The markers developed in this study represent a valuable resource for studying the population biology and movement of P. cinnamomi and will aid in the understanding of the origin and invasion history of this important species.
Affiliation(s)
- J Engelbrecht
- Department of Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0002, South Africa
- T A Duong
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0002, South Africa
- N V D Berg
- Department of Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0002, South Africa
48
Shimizu T, Tanizawa Y, Mochizuki T, Nagasaki H, Yoshioka T, Toyoda A, Fujiyama A, Kaminuma E, Nakamura Y. Draft Sequencing of the Heterozygous Diploid Genome of Satsuma (Citrus unshiu Marc.) Using a Hybrid Assembly Approach. Front Genet 2017; 8:180. [PMID: 29259619 PMCID: PMC5723288 DOI: 10.3389/fgene.2017.00180] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 08/07/2017] [Accepted: 11/06/2017] [Indexed: 12/19/2022] Open
Abstract
Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma ("Miyagawa Wase") was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb at the N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The number of repeat elements and long terminal repeat retrotransposons were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome.
Affiliation(s)
- Tokurou Shimizu
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Shimizu, Japan
- Yasuhiro Tanizawa
- Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Takako Mochizuki
- Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Hideki Nagasaki
- Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Terutaka Yoshioka
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Shimizu, Japan
- Atsushi Toyoda
- Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Asao Fujiyama
- Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Eli Kaminuma
- Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Yasukazu Nakamura
- Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
49
Chen X, Wang J, Huang L, Yue W, Zou J, Yuan C, Lu G, Wang C. Evolutionary relationship of three mitten crabs (Eriocheir sp) revealed by mitogenome and 5S ribosomal DNA analysis. Aquaculture and Fisheries 2017. [DOI: 10.1016/j.aaf.2017.10.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
50
Plasticity of the MFS1 Promoter Leads to Multidrug Resistance in the Wheat Pathogen Zymoseptoria tritici. mSphere 2017; 2:mSphere00393-17. [PMID: 29085913 PMCID: PMC5656749 DOI: 10.1128/msphere.00393-17] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 09/01/2017] [Accepted: 09/21/2017] [Indexed: 11/20/2022] Open
Abstract
The ascomycete Zymoseptoria tritici is the causal agent of Septoria leaf blotch on wheat. Disease control relies mainly on resistant wheat cultivars and on fungicide applications. The fungus displays a high potential to circumvent both methods. Resistance against all unisite fungicides has been observed over decades. A different type of resistance has emerged among wild populations with multidrug-resistant (MDR) strains. Active fungicide efflux through overexpression of the major facilitator gene MFS1 explains this emerging resistance mechanism. Applying a bulk-progeny sequencing approach, we identified in this study a 519-bp long terminal repeat (LTR) insert in the MFS1 promoter, a relic of a retrotransposon cosegregating with the MDR phenotype. Through gene replacement, we show that the insert is a mutation responsible for MFS1 overexpression and the MDR phenotype. Besides this type I insert, we found two different types of promoter inserts in more recent MDR strains. Type I and type II inserts harbor potential transcription factor binding sites, but the type III insert does not. Interestingly, all three inserts correspond to repeated elements present at different genomic locations in either IPO323 or other Z. tritici strains. These results underline the plasticity of repeated elements, which leads to fungicide resistance in Z. tritici and contributes to its adaptive potential. IMPORTANCE Disease control through fungicides remains an important means to protect crops from fungal diseases and to secure the harvest. Plant-pathogenic fungi, especially Zymoseptoria tritici, have developed resistance against most currently used active ingredients, reducing or abolishing their efficacy. While target site modification is the most common resistance mechanism against single modes of action, active efflux of multiple drugs is an emerging phenomenon in fungal populations that further reduces fungicide efficacy in multidrug-resistant strains. We have investigated the mutations responsible for increased drug efflux in Z. tritici field strains. Our study reveals that three different insertions of repeated elements in the same promoter lead to multidrug resistance in Z. tritici. The target gene encodes the membrane transporter MFS1 responsible for drug efflux, with the promoter inserts inducing its overexpression. These results underline the plasticity of repeated elements leading to fungicide resistance in Z. tritici.