1
Verbiest MA, Lundström O, Xia F, Baudis M, Bilgin Sonay T, Anisimova M. Short tandem repeat mutations regulate gene expression in colorectal cancer. Sci Rep 2024; 14:3331. [PMID: 38336885 PMCID: PMC10858039 DOI: 10.1038/s41598-024-53739-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/04/2024] [Indexed: 02/12/2024] Open
Abstract
Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression data to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC highly depends on the MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which the STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we found an increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. Future extensions of these findings could uncover new STR-based targets in the treatment of cancer.
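The linear eSTR-expression relationship mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single eSTR-gene pair, an ordinary least-squares fit, and invented values; the function names and the data are hypothetical, and the models actually used in the paper (covariates, normalisation, significance testing) are not described in this listing.

```python
# Minimal sketch (assumption: simple least-squares line, one eSTR-gene pair).

def fit_line(str_lengths, expression):
    """Fit expression = intercept + slope * STR length by least squares."""
    n = len(str_lengths)
    mean_x = sum(str_lengths) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in str_lengths)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(str_lengths, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predicted_expression_change(slope, delta_units):
    """Expression change predicted for a mutation of delta_units repeat units."""
    return slope * delta_units
```

For example, fitting hypothetical lengths `[10, 11, 12, 13]` against expression values `[5.0, 4.5, 4.0, 3.5]` gives a slope of -0.5, so a two-unit contraction (`delta_units=-2`) would be predicted to raise expression by 1.0 in the same units.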
Affiliation(s)
- Max A Verbiest
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Oxana Lundström
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Feifei Xia
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael Baudis
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tugce Bilgin Sonay
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, USA
- Maria Anisimova
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
2
Lundström OS, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I, Anisimova M, Gymrek M. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans. J Mol Biol 2023; 435:168260. [PMID: 37678708 DOI: 10.1016/j.jmb.2023.168260] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/27/2023] [Revised: 08/29/2023] [Accepted: 08/29/2023] [Indexed: 09/09/2023]
Abstract
Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently, STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
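The definition of an STR given in this abstract (consecutive repetitions of a one- to six-nucleotide motif) can be made concrete with a small sketch. This is a naive regular-expression scan for perfect repeats only, written for illustration; it is not the repeat annotation method behind WebSTR, and the function names and the `min_copies` threshold are assumptions.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))


def find_strs(seq, min_copies=3, max_unit=6):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Assumption: only exact, uninterrupted repeats of an uppercase
    A/C/G/T motif are reported; real annotation tools also tolerate
    impure repeats.
    """
    hits = []
    for unit in range(1, max_unit + 1):
        # One motif of the given size followed by at least
        # (min_copies - 1) further copies of the same motif.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a dinucleotide
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)
```

Calling `find_strs("TTGACACACACAGGT")` reports a single dinucleotide repeat, `(3, "AC", 4)`: four copies of `AC` starting at zero-based position 3.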
Affiliation(s)
- Oxana Sachenkova Lundström
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Vildly AB, Kalmar, Sweden; Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland. https://twitter.com/merenlin
- Max Adriaan Verbiest
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Feifei Xia
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland. https://twitter.com/Feifeix97
- Helyaneh Ziaei Jam
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Inti Zlobec
  - Institute of Tissue Medicine and Pathology, University of Bern, Switzerland
- Maria Anisimova
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Department of Medicine, University of California San Diego, La Jolla, CA, USA
3
Reinar WB, Tørresen OK, Nederbragt AJ, Matschiner M, Jentoft S, Jakobsen KS. Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA 2023; 14:14. [PMID: 37789366 PMCID: PMC10546739 DOI: 10.1186/s13100-023-00302-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 09/20/2023] [Indexed: 10/05/2023] Open
Abstract
Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated whether TE content correlates with family-level net diversification rates and found support for a weak negative correlation. Further, we demonstrated that TE proportion correlates with genome size, but not with the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish had large differences in STR content, with the most extreme propagation detected in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity as seen in codfishes and herring.
Affiliation(s)
- Ole K Tørresen
  - Department of Biosciences, University of Oslo, Oslo, Norway
- Alexander J Nederbragt
  - Department of Biosciences, University of Oslo, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
- Michael Matschiner
  - Department of Biosciences, University of Oslo, Oslo, Norway
  - University of Oslo, Natural History Museum, Oslo, Norway
- Sissel Jentoft
  - Department of Biosciences, University of Oslo, Oslo, Norway
4
Apsley AT, Domico ER, Verbiest MA, Brogan CA, Buck ER, Burich AJ, Cardone KM, Stone WJ, Anisimova M, Vandenbergh DJ. A novel hypervariable variable number tandem repeat in the dopamine transporter gene (SLC6A3). Life Sci Alliance 2023; 6:e202201677. [PMID: 36754567 PMCID: PMC9909461 DOI: 10.26508/lsa.202201677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 01/25/2023] [Accepted: 01/26/2023] [Indexed: 02/10/2023] Open
Abstract
The dopamine transporter gene, SLC6A3, has received substantial attention in genetic association studies of various phenotypes. Although some variable number tandem repeats (VNTRs) present in SLC6A3 have been tested in genetic association studies, results have not been consistent. VNTRs in SLC6A3 that have not been examined genetically were characterized. The Tandem Repeat Annotation Library was used to characterize the VNTRs of 64 unrelated long-read haplotype-phased SLC6A3 sequences. Sequence similarity of each repeat unit of the five VNTRs is reported, along with the correlations of SNP-SNP, SNP-VNTR, and VNTR-VNTR alleles across the gene. One of these VNTRs is a novel hyper-VNTR (hyVNTR) in intron 8 of SLC6A3, which contains a range of 3.4-133.4 repeat copies and has a consensus sequence length of 38 bp, with 82% G+C content. The 38-base repeat was predicted to form G-quadruplexes in silico and was confirmed by circular dichroism spectroscopy. In addition, this hyVNTR contains multiple putative binding sites for PRDM9, which, in combination with low levels of linkage disequilibrium around the hyVNTR, suggests it might be a recombination hotspot.
Affiliation(s)
- Abner T Apsley
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
  - The Molecular, Cellular and Integrative Biosciences Program, The Pennsylvania State University, State College, PA, USA
- Emma R Domico
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Max A Verbiest
  - Institute of Computational Life Science, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Molecular Life Sciences, Faculty of Science, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Carly A Brogan
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Evan R Buck
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Andrew J Burich
  - Department of Information Science and Technologies - Applied Data Sciences, The Pennsylvania State University, State College, PA, USA
- Kathleen M Cardone
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Wesley J Stone
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Maria Anisimova
  - Institute of Computational Life Science, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- David J Vandenbergh
  - Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
  - The Molecular, Cellular and Integrative Biosciences Program, The Pennsylvania State University, State College, PA, USA
  - Institute of the Neurosciences, The Pennsylvania State University, State College, PA, USA
  - The Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA
5
Wang J, Qian J, Jiang Y, Chen X, Zheng B, Chen S, Yang F, Xu Z, Duan B. Comparative Analysis of Chloroplast Genome and New Insights Into Phylogenetic Relationships of Polygonatum and Tribe Polygonateae. Frontiers in Plant Science 2022; 13:882189. [PMID: 35812916 PMCID: PMC9263837 DOI: 10.3389/fpls.2022.882189] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 02/23/2022] [Accepted: 06/03/2022] [Indexed: 05/22/2023]
Abstract
Members of Polygonatum are perennial herbs that have been widely used in traditional Chinese medicine to invigorate Qi, moisten the lung, and benefit the kidney and spleen in patients. However, the phylogenetic relationships and intrageneric taxonomy within Polygonatum have long been controversial because of the complexity of their morphological variations and the lack of high-resolution molecular markers. The chloroplast (cp) genome is an optimal model for deciphering phylogenetic relationships in related families. In the present study, the complete cp genomes of 26 species of Trib. Polygonateae were assembled de novo and characterized; all species exhibited a conserved quadripartite structure, that is, two inverted repeats (IR) containing most of the ribosomal RNA genes, and two unique regions, the large single-copy (LSC) and small single-copy (SSC) regions. A total of 8 highly variable regions (rps16-trnQ-UUG, trnS-GCU-trnG-UCC, rpl32-trnL-UAG, matK-rps16, petA-psbJ, trnT-UGU-trnL-UAA, accD-psaI, and trnC-GCA-petN) that might be useful as molecular markers for identifying Polygonatum species were identified. Molecular clock analysis suggested that Polygonatum might have diverged at ∼14.71 Ma, and that the verticillate leaf might be the ancestral state of this genus. Moreover, phylogenetic analysis based on 88 cp genomes strongly supported the monophyly of Polygonatum. The phylogenetic analysis also suggested that Heteropolygonatum may be the sister group of Polygonatum, whereas Disporopsis, Maianthemum, and Disporum may have diverged earlier. This study provides valuable information for further species identification, evolution, and phylogenetic research of Polygonatum.
Affiliation(s)
- Jing Wang
  - College of Pharmaceutical Science, Dali University, Dali, China
  - Heilongjiang Key Laboratory of Plant Bioactive Substance Biosynthesis and Utilization, College of Life Science, Northeast Forestry University, Harbin, China
- Jun Qian
  - College of Pharmaceutical Science, Dali University, Dali, China
- Yuan Jiang
  - College of Pharmaceutical Science, Dali University, Dali, China
- Xiaochen Chen
  - Heilongjiang Key Laboratory of Plant Bioactive Substance Biosynthesis and Utilization, College of Life Science, Northeast Forestry University, Harbin, China
- Baojiang Zheng
  - Heilongjiang Key Laboratory of Plant Bioactive Substance Biosynthesis and Utilization, College of Life Science, Northeast Forestry University, Harbin, China
- Shilin Chen
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Fajian Yang
  - Baoshan College of Traditional Chinese Medicine, Baoshan, China
- Zhichao Xu
  - College of Pharmaceutical Science, Dali University, Dali, China
  - Heilongjiang Key Laboratory of Plant Bioactive Substance Biosynthesis and Utilization, College of Life Science, Northeast Forestry University, Harbin, China
  - *Correspondence: Zhichao Xu
- Baozhong Duan
  - College of Pharmaceutical Science, Dali University, Dali, China
  - *Correspondence: Baozhong Duan
6
Cheon SH, Woo MA, Jo S, Kim YK, Kim KJ. The Chloroplast Phylogenomics and Systematics of Zoysia (Poaceae). Plants 2021; 10:1517. [PMID: 34451562 PMCID: PMC8400354 DOI: 10.3390/plants10081517] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/08/2021] [Revised: 07/13/2021] [Accepted: 07/22/2021] [Indexed: 11/16/2022]
Abstract
The genus Zoysia Willd. (Chloridoideae) is widely distributed from the temperate regions of Northeast Asia (including China, Japan, and Korea) to the tropical regions of Southeast Asia. Among these, four species (Zoysia japonica Steud., Zoysia sinica Hance, Zoysia tenuifolia Thiele, and Zoysia macrostachya Franch. & Sav.) are naturally distributed in the Korean Peninsula. In this study, we report the complete plastome sequences of these Korean Zoysia species (NCBI acc. nos. MF953592, MF967579~MF967581). The length of Zoysia plastomes ranges from 135,854 to 135,904 bp, and the plastomes have a typical quadripartite structure, which consists of a pair of inverted repeat regions (20,962~20,966 bp) separated by a large (81,348~81,392 bp) and a small (12,582~12,586 bp) single-copy region. In terms of gene order and structure, Zoysia plastomes are similar to the typical plastomes of Poaceae. The plastomes encode 110 genes, of which 76 are protein-coding genes, 30 are tRNA genes, and four are rRNA genes. Fourteen genes contain single introns and one gene has two introns. Three evolutionary hotspot spacer regions (atpB~rbcL, rps16~rps3, and rpl32~trnL-UAG) were recognized among six analyzed Zoysia species. The high divergences in the atpB~rbcL spacer and rpl16~rpl3 region are primarily due to the differences in base substitutions and indels. In contrast, the high divergence between rpl32~trnL-UAG spacers is due to a small inversion with a pair of 22 bp stem and an 11 bp loop. Simple sequence repeats (SSRs) were identified in 59 different locations in Z. japonica, 63 in Z. sinica, 62 in Z. macrostachya, and 63 in Z. tenuifolia plastomes. Phylogenetic analysis showed that Zoysia (Zoysiinae) forms a monophyletic group, which is sister to Sporobolus (Sporobolinae), with 100% bootstrap support. Within the Zoysia clade, the relationship of (Z. sinica, Z. japonica), (Z. tenuifolia, Z. matrella), (Z. macrostachya, Z. macrantha) was suggested.
7
Raman G, Lee EM, Park S. Intracellular DNA transfer events restricted to the genus Convallaria within the Asparagaceae family: Possible mechanisms and potential as genetic markers for biographical studies. Genomics 2021; 113:2906-2918. [PMID: 34182083 DOI: 10.1016/j.ygeno.2021.06.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Revised: 05/18/2021] [Accepted: 06/23/2021] [Indexed: 10/21/2022]
Abstract
Intracellular gene transfer among plant genomes is a common phenomenon. Due to their high conservation and high plastid membrane integrity, chloroplast (cp) genomes incorporate foreign genetic material very rarely. Convallaria is a small monocotyledonous genus consisting of C. keiskei, C. majalis and C. montana. Here, we characterized, analyzed and identified 3.3 and 3.7 kb of mitochondrial DNA sequences in the plastome (MCP) of C. majalis and C. montana, respectively. We identified 6 bp and 23 bp direct repeats and mitochondrial pseudogenes, with rps3, rps19 and rpl10 identified in the MCP region. Additionally, we developed novel plastid molecular genetic markers to differentiate Convallaria spp. based on 21 populations. BEAST and biogeographical analyses suggested that Convallaria separated into Eurasian and North American lineages during the middle Pliocene and originated in East Asia. Vicariance in the genus was followed by dispersal into Europe and southeastern North America. These analyses indicate that the MCP event was restricted to the genus Convallaria of Asparagaceae, in contrast to similar events that occurred in its common ancestors with other families of land plants. However, further mitochondrial and population studies are necessary to understand the integration of the MCP region and gene flow in the genus Convallaria.
Affiliation(s)
- Gurusamy Raman
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk 38541, Republic of Korea
- Eun Mi Lee
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk 38541, Republic of Korea
- SeonJoo Park
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk 38541, Republic of Korea
8
Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing. Methods Mol Biol 2021; 2243:1-25. [PMID: 33606250 DOI: 10.1007/978-1-0716-1103-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Increasingly affordable sequencing technologies are revolutionizing the field of genomic medicine. It is now feasible to interrogate all major classes of variation in an individual across the entire genome for less than $1000 USD. While the generation of patient sequence information using these technologies has become routine, the analysis and interpretation of these data remain the greatest obstacle to widespread clinical implementation. This chapter summarizes the steps to identify, annotate, and prioritize variant information required for clinical report generation. We discuss methods to detect each variant class and describe strategies to increase the likelihood of detecting causal variant(s) in Mendelian disease. Lastly, we describe a sample workflow for synthesizing large amounts of genetic information into concise clinical reports.
9
Singh P, Nath R, Venkatesh V. Comparative Genome-Wide Characterization of Microsatellites in Candida albicans and Candida dubliniensis Leading to the Development of Species-Specific Marker. Public Health Genomics 2021; 24:1-13. [PMID: 33401274 DOI: 10.1159/000512087] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/05/2020] [Accepted: 09/30/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Microsatellites or simple sequence repeats (SSR) are related to genomic structure, function, and certain diseases of taxonomically different organisms. OBJECTIVE: To characterize microsatellites in two closely related Candida species by searching and comparing 1-6 bp nucleotide motifs and utilizing them to develop species-specific markers. METHODS: Whole-genome sequence was downloaded from the public domain, microsatellites were mined and analyzed, and primers were synthesized. RESULTS: A total of 15,821 and 7,868 microsatellites, with mononucleotides (8,679) and trinucleotides (3,156) as most frequent microsatellites, were mined in Candida dubliniensis and Candida albicans, respectively. Chromosome size was found to be positively correlated with microsatellite number in both species, whereas it was negatively correlated with the relative abundance and density of microsatellites. A number of unique motifs were also found in both species. Overall, microsatellite frequencies of each chromosome in C. dubliniensis were higher than in C. albicans. CONCLUSION: The features of microsatellite distribution in the two species' genomes revealed that it is probably not conserved in the genus Candida. Data generated in this article could be used for comparative genome mapping and understanding the distribution of microsatellites and genome structure between these closely related and phenotypically misidentified species and may provide a foundation for the development of a new set of species-specific microsatellite markers. Here, we also report a novel microsatellite-based marker for C. dubliniensis-specific identification.
Affiliation(s)
- Pallavi Singh
  - Department of Biotechnology, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India
  - Department of Computer Science & Engineering, UIET, CSJM University, Kanpur, India
- Ravindra Nath
  - Department of Computer Science & Engineering, UIET, CSJM University, Kanpur, India
- Vimala Venkatesh
  - Department of Microbiology, King George's Medical University, Lucknow, India
10
Raman G, Park KT, Kim JH, Park S. Characteristics of the completed chloroplast genome sequence of Xanthium spinosum: comparative analyses, identification of mutational hotspots and phylogenetic implications. BMC Genomics 2020; 21:855. [PMID: 33267775 PMCID: PMC7709266 DOI: 10.1186/s12864-020-07219-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/18/2020] [Accepted: 11/09/2020] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND: The invasive species Xanthium spinosum has been used as a traditional Chinese medicine for many years. Unfortunately, no extensive molecular studies of this plant have been conducted. RESULTS: Here, the complete chloroplast (cp) genome sequence of X. spinosum was assembled and analyzed. The cp genome of X. spinosum was 152,422 base pairs (bp) in length, with a quadripartite circular structure. The cp genome contained 115 unique genes, including 80 PCGs, 31 tRNA genes, and 4 rRNA genes. Comparative analyses revealed that X. spinosum contains a large number of repeats (999 repeats) and 701 SSRs in its cp genome. Fourteen divergences (Π > 0.03) were found in the intergenic spacer regions. Phylogenetic analyses revealed that Parthenium is a sister clade to both Xanthium and Ambrosia and an early-diverging lineage of subtribe Ambrosiinae, although this finding was supported with a very weak bootstrap value. CONCLUSION: The identified hotspot regions could be used as molecular markers for resolving phylogenetic relationships and species identification in the genus Xanthium.
Affiliation(s)
- Gurusamy Raman
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea, 38541
- Kyu Tae Park
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea, 38541
- Joo-Hwan Kim
  - Department of Life Science, Gachon University, Seongnam, Gyeonggi-do, Republic of Korea
- SeonJoo Park
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea, 38541
11
Liu Q, Li X, Li M, Xu W, Schwarzacher T, Heslop-Harrison JS. Comparative chloroplast genome analyses of Avena: insights into evolutionary dynamics and phylogeny. BMC Plant Biology 2020; 20:406. [PMID: 32878602 PMCID: PMC7466839 DOI: 10.1186/s12870-020-02621-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 11/21/2019] [Accepted: 08/25/2020] [Indexed: 05/19/2023]
Abstract
BACKGROUND: Oat (Avena sativa L.) is a recognized health-food, and the contributions of its different candidate A-genome progenitor species remain inconclusive. Here, we report chloroplast genome sequences of eleven Avena species, to examine the plastome evolutionary dynamics and analyze phylogenetic relationships between oat and its congeneric wild related species. RESULTS: The chloroplast genomes of eleven Avena species (size range of 135,889-135,998 bp) share a quadripartite structure, comprising a large single copy (LSC; 80,014-80,132 bp), a small single copy (SSC; 12,575-12,679 bp) and a pair of inverted repeats (IRs; 21,603-21,614 bp). The plastomes contain 131 genes including 84 protein-coding genes, eight ribosomal RNAs and 39 transfer RNAs. The nucleotide sequence diversities (Pi values) range from 0.0036 (rps19) to 0.0093 (rpl32) for the ten most polymorphic genes and from 0.0084 (psbH-petB) to 0.0240 (petG-trnW-CCA) for the ten most polymorphic intergenic regions. Gene selective pressure analysis shows that all protein-coding genes have been under purifying selection. The adjacent position relationships between tandem repeats, insertions/deletions and single nucleotide polymorphisms support the evolutionary importance of tandem repeats in causing plastome mutations in Avena. Phylogenomic analyses, based on the complete plastome sequences and the LSC intermolecular recombination sequences, support the monophyly of Avena with two clades in the genus. CONCLUSIONS: Diversification of Avena plastomes is explained by the presence of highly diverse genes and intergenic regions, LSC intermolecular recombination, and the co-occurrence of tandem repeats and indels or single nucleotide polymorphisms. The study demonstrates that the A-genome diploid-polyploid lineage maintains two subclades derived from different maternal ancestors, with A. longiglumis as the first diverging species in clade I. These genome resources will be helpful in elucidating the chloroplast genome structure, understanding the evolutionary dynamics at genus Avena and family Poaceae levels, and are potentially useful to exploit plastome variation in making hybrids for plant breeding.
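The nucleotide diversity (Pi) values quoted in this abstract are, in the usual definition, the average number of pairwise differences per site among aligned sequences. The sketch below illustrates that definition only; it assumes gap-free, equal-length aligned sequences, the function name is hypothetical, and the software and settings the authors used are not stated in this listing.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site (Pi) for aligned sequences.

    Assumption: all sequences have the same length and contain no gaps
    or ambiguous bases; real tools handle those cases explicitly.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned_seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_diffs / (len(pairs) * length)
```

For three toy sequences `["ACGT", "ACGA", "ACGT"]` there are two differences over three pairs of four sites each, so the function returns 2 / 12, roughly 0.167.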
Affiliation(s)
- Qing Liu
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization / Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - Center for Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, China
- Xiaoyu Li
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization / Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingzhi Li
  - Independent Researcher, Guangzhou, China
- Wenkui Xu
  - Independent Researcher, Guangzhou, China
- Trude Schwarzacher
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization / Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, LE1 7RH, UK
- John Seymour Heslop-Harrison
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization / Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, LE1 7RH, UK
12
Zhang Z, Qu C, Zhang K, He Y, Zhao X, Yang L, Zheng Z, Ma X, Wang X, Wang W, Wang K, Li D, Zhang L, Zhang X, Su D, Chang X, Zhou M, Gao D, Jiang W, Leliaert F, Bhattacharya D, De Clerck O, Zhong B, Miao J. Adaptation to Extreme Antarctic Environments Revealed by the Genome of a Sea Ice Green Alga. Curr Biol 2020; 30:3330-3341.e7. [PMID: 32619486 DOI: 10.1016/j.cub.2020.06.029] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/22/2020] [Revised: 05/13/2020] [Accepted: 06/08/2020] [Indexed: 01/21/2023]
Abstract
The unicellular green alga Chlamydomonas sp. ICE-L thrives in polar sea ice, where it tolerates extreme low temperatures, high salinity, and broad seasonal fluctuations in light conditions. Despite the high interest in biotechnological uses of this species, little is known about the adaptations that allow it to thrive in this harsh and complex environment. Here, we assembled a high-quality genome sequence of ∼542 Mb and found that retrotransposon proliferation contributed to the relatively large genome size of ICE-L when compared to other chlorophytes. Genomic features that may support the extremophilic lifestyle of this sea ice alga include massively expanded gene families involved in unsaturated fatty acid biosynthesis, DNA repair, photoprotection, ionic homeostasis, osmotic homeostasis, and reactive oxygen species detoxification. The acquisition of multiple ice binding proteins through putative horizontal gene transfer likely contributed to the origin of the psychrophilic lifestyle in ICE-L. Additional innovations include the significant upregulation under abiotic stress of several expanded ICE-L gene families, likely reflecting adaptive changes among diverse metabolic processes. Our analyses of the genome, transcriptome, and functional assays advance general understanding of the Antarctic green algae and offer potential explanations for how green plants adapt to extreme environments.
Affiliation(s)
- Zhenhua Zhang
- College of Life Sciences, Nanjing Normal University, 210023 Nanjing, China
| | - Changfeng Qu
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China; Laboratory for Marine Drugs and Bioproducts of Qingdao National Laboratory for Marine Science and Technology, 266237 Qingdao, China
| | - Kaijian Zhang
- Novogene Bioinformatics Institute, 100083 Beijing, China
| | - Yingying He
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Xing Zhao
- Novogene Bioinformatics Institute, 100083 Beijing, China
| | - Lingxiao Yang
- College of Life Sciences, Nanjing Normal University, 210023 Nanjing, China
| | - Zhou Zheng
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China; Laboratory for Marine Drugs and Bioproducts of Qingdao National Laboratory for Marine Science and Technology, 266237 Qingdao, China
| | - Xiaoya Ma
- College of Life Sciences, Nanjing Normal University, 210023 Nanjing, China
| | - Xixi Wang
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Wenyu Wang
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Kai Wang
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Dan Li
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Liping Zhang
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Xin Zhang
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China
| | - Danyan Su
- College of Life Sciences, Nanjing Normal University, 210023 Nanjing, China
| | - Xin Chang
- College of Life Sciences, Nanjing Normal University, 210023 Nanjing, China
| | - Mengyan Zhou
- Novogene Bioinformatics Institute, 100083 Beijing, China
| | - Dan Gao
- Novogene Bioinformatics Institute, 100083 Beijing, China
| | - Wenkai Jiang
- Novogene Bioinformatics Institute, 100083 Beijing, China
| | - Frederik Leliaert
- Biology Department, Ghent University, 9000 Ghent, Belgium; Meise Botanic Garden, Nieuwelaan 38, 1860 Meise, Belgium
| | - Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
| | | | - Bojian Zhong
- College of Life Sciences, Nanjing Normal University, 210023 Nanjing, China.
| | - Jinlai Miao
- First Institute of Oceanography, Ministry of Natural Resources, 266061 Qingdao, China; Laboratory for Marine Drugs and Bioproducts of Qingdao National Laboratory for Marine Science and Technology, 266237 Qingdao, China.
|
13
|
Genomewide analysis of microsatellite markers based on sequenced database in two anuran species. J Genet 2020. [DOI: 10.1007/s12041-020-01222-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
14
|
Song X, Yang T, Yan X, Zheng F, Xu X, Zhou C. Comparison of microsatellite distribution patterns in twenty-nine beetle genomes. Gene 2020; 757:144919. [PMID: 32603771 DOI: 10.1016/j.gene.2020.144919] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/01/2020] [Revised: 06/15/2020] [Accepted: 06/20/2020] [Indexed: 01/20/2023]
Abstract
Simple sequence repeats (SSRs) represent an important source of genetic variation that provides a basis for adaptation to different environments in organisms. In this study, we examined the distribution patterns of SSRs in twenty-nine beetle genomes and carried out Gene Ontology (GO) analysis of CDSs embedded with perfect SSRs (P-SSRs). The results demonstrated that imperfect SSRs (I-SSRs) represented the most abundant SSR category in beetle genomes and in different genomic regions (CDS, exon, and intron regions). The numbers of P-SSRs, I-SSRs, compound SSRs, and variable number tandem repeats were positively correlated with beetle genome size, whereas neither the frequency nor the density of the SSRs was correlated with genome size. Moreover, our results demonstrated that common genomic features of P-SSRs within the same suborder or family of Coleoptera were rare. Mono-, di-, tri-, or tetranucleotide SSRs were the most abundant P-SSR categories in beetle genomes. The preferred predominant repeat motif among the mononucleotide P-SSRs was (A)n, but the most frequent repeat motifs for other length classes varied differentially among these genomes. Furthermore, the P-SSR type with the highest GC content differed in the beetle genomes and in different genomic regions. CV (coefficient of variability) analysis demonstrated that the repeat copy numbers of P-SSRs presented relatively higher variation in introns than in CDSs and exons. The GO terms of CDSs containing P-SSRs for molecular functions were mainly enriched in "binding" and "transcription". Our findings will be useful for studying the functional roles of microsatellite heterogeneity in beetle adaptation.
Affiliation(s)
- Xuhao Song
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong 637009, Sichuan Province, China.
| | - Tingbang Yang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong 637009, Sichuan Province, China
| | - Xianghui Yan
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong 637009, Sichuan Province, China
| | - Fake Zheng
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong 637009, Sichuan Province, China
| | - Xiaoqin Xu
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong 637009, Sichuan Province, China
| | - Caiquan Zhou
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong 637009, Sichuan Province, China.
|
15
|
Thines M, Sharma R, Rodenburg SYA, Gogleva A, Judelson HS, Xia X, van den Hoogen J, Kitner M, Klein J, Neilen M, de Ridder D, Seidl MF, van den Ackerveken G, Govers F, Schornack S, Studholme DJ. The Genome of Peronospora belbahrii Reveals High Heterozygosity, a Low Number of Canonical Effectors, and TC-Rich Promoters. Mol Plant Microbe Interact 2020; 33:742-753. [PMID: 32237964 DOI: 10.1094/mpmi-07-19-0211-r] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
Along with Plasmopara destructor, Peronospora belbahrii has arguably been the most economically important newly emerging downy mildew pathogen of the past two decades. Originating from Africa, it has started devastating basil production throughout the world, most likely due to the distribution of infested seed material. Here, we present the genome of this pathogen and results from comparisons of its genomic features to other oomycetes. The assembly of the nuclear genome was around 35.4 Mbp in length, with an N50 scaffold length of around 248 kbp and an L50 scaffold count of 46. The circular mitochondrial genome consisted of around 40.1 kbp. From the repeat-masked genome, 9,049 protein-coding genes were predicted, out of which 335 were predicted to have extracellular functions, representing the smallest secretome so far found in peronosporalean oomycetes. About 16% of the genome consists of repetitive sequences, and, based on simple sequence repeat regions, we provide a set of microsatellites that could be used for population genetic studies of P. belbahrii. P. belbahrii has undergone a high degree of convergent evolution with other obligate parasitic pathogen groups, reflecting its obligate biotrophic lifestyle. Features of its secretome, signaling networks, and promoters are presented, and some patterns are hypothesized to reflect the high degree of host specificity in Peronospora species. In addition, we suggest the presence of additional virulence factors apart from classical effector classes that are promising candidates for future functional studies.
Affiliation(s)
- Marco Thines
- Institute of Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Str. 9, 60323 Frankfurt (Main), Germany
- Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt (Main), Germany
- Integrative Fungal Research (IPF) and Translational Biodiversity Genomics (TBG), Georg-Voigt-Str. 14-16, 60325 Frankfurt (Main), Germany
| | - Rahul Sharma
- Institute of Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Str. 9, 60323 Frankfurt (Main), Germany
- Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt (Main), Germany
- Integrative Fungal Research (IPF) and Translational Biodiversity Genomics (TBG), Georg-Voigt-Str. 14-16, 60325 Frankfurt (Main), Germany
| | - Sander Y A Rodenburg
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Anna Gogleva
- University of Cambridge, Sainsbury Laboratory, 47 Bateman Street, Cambridge, CB2 1LR, U.K
| | - Howard S Judelson
- Department of Microbiology and Plant Pathology, University of California, Riverside, CA 92521 U.S.A
| | - Xiaojuan Xia
- Institute of Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Str. 9, 60323 Frankfurt (Main), Germany
- Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt (Main), Germany
| | - Johan van den Hoogen
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Miloslav Kitner
- Department of Botany, Faculty of Science, Palacký University Olomouc, Šlechtitelů 27, 78371 Olomouc, Czech Republic
| | - Joël Klein
- Plant-Microbe Interactions, Department of Biology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Manon Neilen
- Plant-Microbe Interactions, Department of Biology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Michael F Seidl
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Guido van den Ackerveken
- Plant-Microbe Interactions, Department of Biology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Francine Govers
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Sebastian Schornack
- University of Cambridge, Sainsbury Laboratory, 47 Bateman Street, Cambridge, CB2 1LR, U.K
| | - David J Studholme
- Biosciences, College of Life and Environmental Sciences, University of Exeter, Stocker Road, Exeter EX4 4QD, U.K
|
16
|
Duan H, Guo J, Xuan L, Wang Z, Li M, Yin Y, Yang Y. Comparative chloroplast genomics of the genus Taxodium. BMC Genomics 2020; 21:114. [PMID: 32005143 PMCID: PMC6995153 DOI: 10.1186/s12864-020-6532-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/27/2019] [Accepted: 01/23/2020] [Indexed: 12/03/2022] Open
Abstract
Background Chloroplast (cp) genome information would facilitate the development and utilization of Taxodium resources. However, the cp genome characteristics of Taxodium were poorly understood. Results We determined the complete cp genome sequences of T. distichum, T. mucronatum, and T. ascendens. The cp genomes are 131,947 bp to 132,613 bp in length, encode 120 genes in the same order, and lack typical inverted repeat (IR) regions. The longest small IR, a 282 bp trnQ-containing IR, was involved in the formation of isomers. Comparative analysis of the 3 cp genomes showed that 91.57% of the indels resulted in the periodic variation of tandem repeat (TR) motifs and that 72.46% of single nucleotide polymorphisms (SNPs) were located close to TRs, suggesting a relationship between TRs and mutational dynamics. Eleven hypervariable regions were identified as candidates for DNA barcode development. Hypothetical cp open reading frame 1 (ycf1) was the only gene with an indel in its coding DNA sequence, and the indel is composed of a long TR. When the analysis was extended to cupressophytes, ycf1 genes were found to have undergone a universal insertion of TRs accompanied by extreme length expansion. Meanwhile, ycf1 is also located at rearrangement endpoints of cupressophyte cp genomes. All these characteristics highlight the important role of repeats in the evolution of cp genomes. Conclusions This study adds new evidence for the role of repeats in the dynamics of cp genome mutation and rearrangement. Moreover, the information on TRs and hypervariable regions provides reliable molecular resources for future research on infrageneric taxon identification, phylogenetic resolution, population structure, and biodiversity in the genus Taxodium and the cupressophytes.
Affiliation(s)
- Hao Duan
- Jiangsu Engineering Research Center for Taxodium Rich, Germplasm Innovation and Propagation, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem, Sun Yat-Sen), Nanjing, China
| | - Jinbo Guo
- Jiangsu Engineering Research Center for Taxodium Rich, Germplasm Innovation and Propagation, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem, Sun Yat-Sen), Nanjing, China
| | - Lei Xuan
- Jiangsu Engineering Research Center for Taxodium Rich, Germplasm Innovation and Propagation, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem, Sun Yat-Sen), Nanjing, China
| | - Ziyang Wang
- Jiangsu Engineering Research Center for Taxodium Rich, Germplasm Innovation and Propagation, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem, Sun Yat-Sen), Nanjing, China
| | - Mingzhi Li
- Biodata Biotechnology Co. Ltd, Hefei, China
| | - Yunlong Yin
- Jiangsu Engineering Research Center for Taxodium Rich, Germplasm Innovation and Propagation, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem, Sun Yat-Sen), Nanjing, China
| | - Ying Yang
- Jiangsu Engineering Research Center for Taxodium Rich, Germplasm Innovation and Propagation, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem, Sun Yat-Sen), Nanjing, China.
|
17
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
| | - Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
|
18
|
Harhay GP, Harhay DM, Bono JL, Capik SF, DeDonder KD, Apley MD, Lubbers BV, White BJ, Larson RL, Smith TPL. A Computational Method to Quantify the Effects of Slipped Strand Mispairing on Bacterial Tetranucleotide Repeats. Sci Rep 2019; 9:18087. [PMID: 31792233 PMCID: PMC6889271 DOI: 10.1038/s41598-019-53866-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/31/2019] [Accepted: 11/04/2019] [Indexed: 01/17/2023] Open
Abstract
The virulence and pathogenicity of bacterial pathogens are related to their adaptability to changing environments. One process enabling adaptation is based on minor changes in genome sequence, as small as a few base pairs, within segments of genome called simple sequence repeats (SSRs) that consist of multiple copies of a short sequence (from one to several nucleotides), repeated in series. SSRs are found in eukaryotes as well as prokaryotes, and length variation in them occurs at frequencies up to a million-fold higher than bacterial point mutations through the process of slipped strand mispairing (SSM) by DNA polymerase during replication. The characterization of SSR length by standard sequencing methods is complicated by the appearance of length variation introduced during the sequencing process that obscures the lower abundance repeat number variants in a population. Here we report a computational approach to correct for sequencing process-induced artifacts, validated for tetranucleotide repeats by use of synthetic constructs of fixed, known length. We apply this method to a laboratory culture of Histophilus somni, prepared from a single colony, and demonstrate that the culture consists of populations of distinct sequence phase and length variants at individual tetranucleotide SSR loci.
Affiliation(s)
- Gregory P Harhay
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States.
| | - Dayna M Harhay
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
| | - James L Bono
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
| | - Sarah F Capik
- Texas A&M AgriLife Research, Amarillo, TX and the College of Veterinary Medicine & Biomedical Sciences, Texas A&M University System, College Station, TX, United States
| | - Keith D DeDonder
- Veterinary and Biomedical Research Center, Inc, Manhattan, KS, United States
| | - Michael D Apley
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
| | - Brian V Lubbers
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
| | - Bradley J White
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
| | - Robert L Larson
- Kansas State University, College of Veterinary Medicine, Manhattan, KS, United States
| | - Timothy P L Smith
- USDA ARS US Meat Animal Research Center, Clay Center, NE, United States
|
19
|
Zhang T, Xing Y, Xu L, Bao G, Zhan Z, Yang Y, Wang J, Li S, Zhang D, Kang T. Comparative analysis of the complete chloroplast genome sequences of six species of Pulsatilla Miller, Ranunculaceae. Chin Med 2019; 14:53. [PMID: 31798674 PMCID: PMC6883693 DOI: 10.1186/s13020-019-0274-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/23/2019] [Accepted: 11/04/2019] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Baitouweng is a traditional Chinese medicine with a long history of different applications. Although referred to as a single medicine, Baitouweng actually comprises many closely related species. It is therefore critically important to identify the different species that are utilized in these medicinal applications. Knowledge about their phylogenetic relationships can be derived from their chloroplast genomes and may provide additional insights into the development of molecular markers. METHODS Genomic DNA was extracted from six species of Pulsatilla and then sequenced on an Illumina HiSeq 4000. Sequences were assembled into contigs by SOAPdenovo 2.04, aligned to the reference genome using BLAST, and then manually corrected. Genome annotation was performed by the online DOGMA tool. General characteristics of the cp genomes of the six species were analyzed and compared with closely related species. Additionally, phylogenetic trees were constructed, based on single nucleotide polymorphisms (SNPs) and 51 shared protein-coding gene sequences in the cp genome among all 31 species via maximum likelihood. RESULTS The sizes of the cp genomes of P. chinensis (Bge.) Regel, P. chinensis (Bge.) Regel var. kissii (Mandl) S. H. Li et Y. H. Huang, P. cernua (Thunb.) Bercht. et Opiz f. plumbea J. X. Ji et Y. T. Zhao, P. dahurica (Fisch.) Spreng, P. turczaninovii Kryl. et Serg, and P. cernua (Thunb.) Bercht. et Opiz. were 163,851 bp, 163,756 bp, 162,481 bp, 162,450 bp, 162,795 bp, and 162,924 bp, respectively. Each species included two inverted repeat regions, a small single-copy region, and a large single-copy region. A total of 134 genes were annotated, including 90 protein-coding genes, 36 tRNAs, and eight rRNAs across all species. In simple sequence repeat analysis, only P. dahurica was found to contain hexanucleotide repeats. A total of 26, 39, 32, 37, 32 and 43 large repeat sequences were identified in the genic regions of the six Pulsatilla species. Nucleotide diversity analysis revealed that the rpl36 gene and ccsA-ndhD region have the highest Pi value. In addition, two phylogenetic trees of the cp genomes were constructed, which placed all Pulsatilla species into one branch within Ranunculaceae. CONCLUSIONS We identified and analyzed the cp genome features of six species of Pulsatilla Miller, with implications for species identification and phylogenetic analysis.
Affiliation(s)
- Tingting Zhang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Yanping Xing
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Liang Xu
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
- Liaoning Quality Monitoring and Technology Service Center for Chinese Materia Medica Raw Materials, Dalian, China
| | - Guihua Bao
- School of Mongol Medicine, Inner Mongolia University for Nationalities, Tongliao, China
| | - Zhilai Zhan
- Traditional Chinese Medicine Resource Center, Chinese Academy of Traditional Chinese Medicine, Beijing, China
| | - Yanyun Yang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Jiahao Wang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Shengnan Li
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Dachuan Zhang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Tingguo Kang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
- Liaoning Quality Monitoring and Technology Service Center for Chinese Materia Medica Raw Materials, Dalian, China
|
20
|
Shi J, Liang C. Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection. Plant Physiol 2019; 180:1803-1815. [PMID: 31152127 PMCID: PMC6670090 DOI: 10.1104/pp.19.00386] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 03/28/2019] [Accepted: 05/17/2019] [Indexed: 05/25/2023]
Abstract
Comprehensive and accurate annotation of the repeatome, including transposons, is critical for deepening our understanding of repeat origins, biogenesis, regulatory mechanisms, and roles. Here, we developed Generic Repeat Finder (GRF), a tool for genome-wide repeat detection based on fast, exhaustive numerical calculation algorithms integrated with optimized dynamic programming strategies. GRF sensitively identifies terminal inverted repeats (TIRs), terminal direct repeats (TDRs), and interspersed repeats that bear both inverted and direct repeats. GRF also detects DNA or RNA transposable elements characterized by these repeats in plant and animal genomes. For TIRs and TDRs, GRF identifies spacers in the middle and mismatches/insertions or deletions in terminal repeats, showing their alignment or base-pairing information. GRF helps improve the annotation for various DNA transposons and retrotransposons, such as miniature inverted-repeat transposable elements (MITEs), long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons, including long interspersed nuclear elements and short interspersed nuclear elements in plants. We used GRF to perform TIR/TDR, interspersed-repeat, and MITE detection in several species, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and mouse (Mus musculus). As a generic bioinformatics tool in repeat finding implemented as a parallelized C++ program, GRF was faster and more sensitive than the existing inverted repeat/MITE detection tools based on numerical approaches (i.e. detectIR and detectMITE) in Arabidopsis and mouse. GRF is more sensitive than Inverted Repeat Finder in TIR detection, LTR_FINDER in short TDR detection (≤1,000 nt), and phRAIDER in interspersed repeat detection in Arabidopsis and rice. GRF is open source and available from GitHub.
Affiliation(s)
- Jieming Shi
- Department of Biology, Miami University, Oxford, Ohio 45056
| | - Chun Liang
- Department of Biology, Miami University, Oxford, Ohio 45056
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio 45056
|
21
|
Wei K, Ma L, Zhang T. Characterization of gene promoters in pig: conservative elements, regulatory motifs and evolutionary trend. PeerJ 2019; 7:e7204. [PMID: 31275764 PMCID: PMC6598670 DOI: 10.7717/peerj.7204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2019] [Accepted: 05/29/2019] [Indexed: 02/04/2023] Open
Abstract
It is vital to understand the conservation and evolution of gene promoter sequences in order to understand environmental adaptation. The level of promoter conservation varies greatly between housekeeping (HK) and tissue-specific (TS) genes, denoting differences in the strength of the evolutionary constraints. Here, we analyzed promoter conservation and evolution to exploit differential regulation between HK and TS genes. The analysis of conserved elements showed that CpG islands, short tandem repeats, and G-quadruplex sequences are highly enriched in HK promoters relative to TS promoters. In addition, the number of types and the density of regulatory motifs are much higher in TS promoters than in HK promoters, indicating that TS genes show more complex regulatory patterns than HK genes. Moreover, the evolutionary dynamics of promoters showed a similar evolutionary trend to that of coding sequences. HK promoters are under more stringent selective pressure in the long-term evolutionary process. HK genes tend to show increased upstream sequence conservation due to stringent selection pressures acting on the promoter regions. The specificity of TS gene expression may be due to complex regulatory motifs acting in different tissues or conditions. The results from this study can be used to deepen our understanding of adaptive evolution.
Affiliation(s)
- Kai Wei
- College of Life Science, Shihezi University, Shihezi, Xinjiang, China; Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Bayern, Germany
| | - Lei Ma
- College of Life Science, Shihezi University, Shihezi, Xinjiang, China
| | - Tingting Zhang
- College of Life Science, Shihezi University, Shihezi, Xinjiang, China
| |
|
22
|
Xu L, Xing Y, Wang B, Liu C, Wang W, Kang T. Plastid genome and composition analysis of two medical ferns: Dryopteris crassirhizoma Nakai and Osmunda japonica Thunb. Chin Med 2019; 14:9. [PMID: 30911328 PMCID: PMC6417082 DOI: 10.1186/s13020-019-0230-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/25/2018] [Accepted: 03/05/2019] [Indexed: 11/23/2022] Open
Abstract
Background Dryopteris crassirhizoma Nakai and Osmunda japonica Thunb. are ferns that are popularly used for medicine, as recorded by the Chinese pharmacopoeia, and are distributed in different regions of China. However, O. japonica is not recorded in the Standards of Chinese Herbal Medicines in Hong Kong. Research on identification methods of D. crassirhizoma and O. japonica is necessary and the phylogenetic position of the two species should be identified. The plastid genome is structurally highly conserved, providing valuable sources of genetic markers for phylogenetic analyses and development of molecular markers for identification. Methods The plastid genome DNA was extracted from both fern species and then sequenced on the Illumina HiSeq 4000. Sequences were assembled into contigs by SOAPdenovo2.04, aligned to the reference genome using BLAST, and then manually corrected. Genome annotation was performed by the online DOGMA tool. General characteristics of the plastid genomes of the two species were analyzed and compared with closely related species. Additionally, phylogenetic trees were reconstructed by maximum likelihood methods. The content of dryocrassin in the two species was determined according to the Standards of Chinese Herbal Medicines in Hong Kong. Results The genome structures of D. crassirhizoma and O. japonica have different characteristics including the genome size, the size of each area, gene location, and types. Moreover, the simple sequence repeats (SSRs) of the plastid genomes were more similar to those of other species in the same genera. Compared with D. fragrans, D. crassirhizoma shows an inversion (approximately 1.6 kb), and O. japonica shows two inversions (1.9 kb and 216 bp). The nucleotide diversity (polymorphism information, Pi) analysis showed that the psbK gene and rpl14-rpl16 region have the highest Pi value in Dryopteris, and the ycf2-CDS3 and rpl14-rpl16 regions show the highest Pi value in O. japonica.
Phylogenetic analyses showed that the two species were grouped in two separate clades from each other, with both individually located with other members of their genus. The marker content of dryocrassin is not found in O. japonica. Conclusions The study is the first to identify plastid genome features of D. crassirhizoma and O. japonica. The results may provide a theoretical basis for the identification and the application of the two medically important fern species. Electronic supplementary material The online version of this article (10.1186/s13020-019-0230-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Liang Xu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China; School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Yanping Xing
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Bing Wang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| | - Chunsheng Liu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China
| | - Wenquan Wang
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China; Institute of Medicinal Plant Development, Beijing, China
| | - Tingguo Kang
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian, China
| |
|
23
|
Mustafina FU, Yi D, Choi K, Shin CH, Tojibaev KS, Downie SR. A comparative analysis of complete plastid genomes from Prangos fedtschenkoi and Prangos lipskyi (Apiaceae). Ecol Evol 2019; 9:364-377. [PMID: 30680120 PMCID: PMC6342102 DOI: 10.1002/ece3.4753] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/09/2017] [Revised: 11/05/2018] [Accepted: 11/06/2018] [Indexed: 11/11/2022] Open
Abstract
Prangos fedtschenkoi (Regel & Schmalh.) Korovin and P. lipskyi Korovin (Apiaceae) are rare plant species endemic to mountainous regions of Middle Asia. Both are edificators of biotic communities and valuable resource plants. The results of recent phylogenetic analyses place them in Prangos subgen. Koelzella (M. Hiroe) Lyskov & Pimenov and suggest they may possibly represent sister species. To aid in development of molecular markers useful for intraspecific phylogeographic and population-level genetic studies of these ecologically and economically important plants, we determined their complete plastid genome sequences and compared the results obtained to several previously published plastomes of Apiaceae. The plastomes of P. fedtschenkoi and P. lipskyi are typical of Apiaceae and most other higher plant plastid DNAs in their sizes (153,626 and 154,143 bp, respectively), structural organization, gene arrangement, and gene content (with 113 unique genes). A total of 49 and 48 short sequence repeat (SSR) loci of 10 bp or longer were detected in P. fedtschenkoi and P. lipskyi plastomes, respectively, representing 42-43 mononucleotides and 6 AT dinucleotides. Seven tandem repeats of 30 bp or longer with a sequence identity ≥90% were identified in each plastome. Further comparisons revealed 319 polymorphic sites between the plastomes (IR, 21; LSC, 234; SSC, 64), representing 43.8% transitions (Ts), 56.1% transversions (Tv), and a Ts/Tv ratio of 0.78. Within genic regions, two indel events were observed in rpoA (6 and 51 bp) and ycf1 (3 and 12 bp), and one in ndhF (6 bp). The most variable intergenic spacer region was that of accD/psaI, with 21.1% nucleotide divergence. Each Prangos species possessed one of two separate inversions (either 5 bp in ndhB intron or 9 bp in petB intron), and these were predicted to form hairpin structures with flanking repeat sequences of 18 and 19 bp, respectively. 
Both species have also incorporated novel DNA in the LSC region adjacent to the LSC/IRa junction, and BLAST searches revealed it had a 100 bp match (86% sequence identity) to noncoding mitochondrial DNA. Prangos-specific primers were developed for the variable accD/psaI intergenic spacer and preliminary PCR-surveys suggest that this region will be useful for future phylogeographic and population-level studies.
Affiliation(s)
- Feruza U. Mustafina
- Division of Forest Biodiversity and Herbarium, Korea National Arboretum, Pocheon, Republic of Korea
- Institute of Botany, Uzbek Academy of Sciences, Tashkent, Republic of Uzbekistan
| | - Dong‐Keun Yi
- Division of Forest Biodiversity and Herbarium, Korea National Arboretum, Pocheon, Republic of Korea
| | - Kyung Choi
- Division of Forest Biodiversity and Herbarium, Korea National Arboretum, Pocheon, Republic of Korea
| | - Chang Ho Shin
- Division of Forest Biodiversity and Herbarium, Korea National Arboretum, Pocheon, Republic of Korea
| | | | - Stephen R. Downie
- Department of Plant Biology, University of Illinois at Urbana‐Champaign, Urbana, Illinois 61801, USA.
| |
|
24
|
Qi WH, Jiang XM, Yan CC, Zhang WQ, Xiao GS, Yue BS, Zhou CQ. Distribution patterns and variation analysis of simple sequence repeats in different genomic regions of bovid genomes. Sci Rep 2018; 8:14407. [PMID: 30258087 PMCID: PMC6158176 DOI: 10.1038/s41598-018-32286-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/26/2018] [Accepted: 09/04/2018] [Indexed: 01/23/2023] Open
Abstract
This study is the first examination of the distribution, guanine-cytosine (GC) pattern, and variation of microsatellites (SSRs) in different genomic regions of six bovid species; SSRs were nonrandomly distributed among regions. SSR abundances are much higher in the introns, transposable elements (TEs), and intergenic regions than in the 3′-untranslated regions (3′UTRs), 5′UTRs and coding regions. Trinucleotide perfect SSRs (P-SSRs) were the most frequent in the coding regions, whereas mononucleotide P-SSRs were the most frequent in the introns, 3′UTRs, TEs, and intergenic regions. Trifold P-SSRs had higher GC content in the 5′UTRs and coding regions than in the introns, 3′UTRs, TEs, and intergenic regions, whereas mononucleotide P-SSRs had the lowest GC content in all genomic regions. The repeat copy numbers (RCN) of the same mono- to hexanucleotide P-SSRs showed significantly different distributions in different regions (P < 0.01). Except for the coding regions, mononucleotide P-SSRs had the most RCNs, followed by the pattern: di- > tri- > tetra- > penta- > hexanucleotide P-SSRs in the same regions. Analysis of the coefficient of variability (CV) showed that the CV of RCN for the same mono- to hexanucleotide SSRs was relatively high in the intronic and intergenic regions, intermediate in the TEs, and relatively low in the 5′UTRs, 3′UTRs, and coding regions. Broad SSR analysis across different genomic regions has helped to reveal the biological significance of their distributions.
Affiliation(s)
- Wen-Hua Qi
- College of Biology and Food Engineering, Chongqing Three Gorges University, Chongqing, 404100, P. R. China
| | - Xue-Mei Jiang
- College of Environmental and Chemistry Engineering, Chongqing Three Gorges University, Chongqing, 404100, P. R. China
| | - Chao-Chao Yan
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, P. R. China
| | - Wan-Qing Zhang
- College of Life Sciences, Sichuan Agricultural University, Ya'an, Sichuan Province, 625014, P. R. China
| | - Guo-Sheng Xiao
- College of Biology and Food Engineering, Chongqing Three Gorges University, Chongqing, 404100, P. R. China
| | - Bi-Song Yue
- Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, P. R. China
| | - Cai-Quan Zhou
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, 637009, P. R. China.
| |
|
25
|
Keroack CD, Williams KM, Fessler M, DeAngelis KE, Tsekitsidou E, Tozloski JM, Williams SA. A novel quantitative real-time PCR diagnostic assay for seal heartworm (Acanthocheilonema spirocauda) provides evidence for possible infection in the grey seal (Halichoerus grypus). Int J Parasitol Parasites Wildl 2018; 7:147-154. [PMID: 29988808 PMCID: PMC6031957 DOI: 10.1016/j.ijppaw.2018.04.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/22/2018] [Revised: 03/29/2018] [Accepted: 04/05/2018] [Indexed: 11/15/2022]
Abstract
The distinct evolutionary pressures faced by Pinnipeds have likely resulted in strong coevolutionary ties to their parasites (Leidenberger et al., 2007). This study focuses on the phocid seal filarial heartworm species Acanthocheilonema spirocauda. A. spirocauda is known to infect a variety of phocid seals, but does not appear to be restricted to a single host species (Measures et al., 1997; Leidenberger et al., 2007; Lehnert et al., 2015). However, to date, seal heartworm has never been reported in grey seals (Halichoerus grypus) (Measures et al., 1997; Leidenberger et al., 2007; Lehnert et al., 2015). The proposed vector for seal heartworm is Echinophthirius horridus, the seal louse. Seal lice are known to parasitize a wide array of phocid seal species, including the grey seal. With the advent of climate change, disease burden is expected to increase across terrestrial and marine mammals (Harvell et al., 2002). Accordingly, increased prevalence of seal heartworm has recently been reported in harbor seals (Phoca vitulina) (Lehnert et al., 2015). Thus, the need for improved, rapid, and cost-effective diagnostics is urgent. Here we present the first A. spirocauda-specific rapid diagnostic test (a quantitative real-time PCR assay), based on a highly repetitive genomic DNA repeat identified using whole genome sequencing and subsequent bioinformatic analysis. The presence of an insect vector provides the opportunity to develop a multifunctional diagnostic tool that can be used not only to detect the parasite directly from blood or tissue specimens, but also as a molecular xenomonitoring (XM) tool that can be used to assess the epidemiological profile of the parasite by screening the arthropod vector. Using this assay, we provide evidence for the first reported case of seal heartworm in a grey seal.
|
26
|
Hiruta C, Kakui K, Tollefsen KE, Iguchi T. Targeted gene disruption by use of CRISPR/Cas9 ribonucleoprotein complexes in the water flea Daphnia pulex. Genes Cells 2018; 23:494-502. [DOI: 10.1111/gtc.12589] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/26/2017] [Accepted: 03/30/2018] [Indexed: 12/26/2022]
Affiliation(s)
- Chizue Hiruta
- Faculty of Science; Hokkaido University; Sapporo Japan
| | - Keiichi Kakui
- Faculty of Science; Hokkaido University; Sapporo Japan
| | - Knut E. Tollefsen
- Section of Ecotoxicology and Risk Assessment; Norwegian Institute for Water Research (NIVA); Oslo Norway
| | - Taisen Iguchi
- Graduate School of Nanobioscience; Yokohama City University; Yokohama Japan
- Department of Basic Biology; Faculty of Life Science; Okazaki Institute for Integrative Bioscience; National Institute for Basic Biology; National Institutes of Natural Sciences; SOKENDAI (Graduate University for Advanced Studies); Okazaki Japan
| |
|
27
|
Tørresen OK, Brieuc MSO, Solbakken MH, Sørhus E, Nederbragt AJ, Jakobsen KS, Meier S, Edvardsen RB, Jentoft S. Genomic architecture of haddock (Melanogrammus aeglefinus) shows expansions of innate immune genes and short tandem repeats. BMC Genomics 2018; 19:240. [PMID: 29636006 PMCID: PMC5894186 DOI: 10.1186/s12864-018-4616-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 08/16/2017] [Accepted: 03/22/2018] [Indexed: 02/06/2023] Open
Abstract
Background Increased availability of genome assemblies for non-model organisms has resulted in invaluable biological and genomic insight into numerous vertebrates, including teleosts. Sequencing of the Atlantic cod (Gadus morhua) genome and the genomes of many of its relatives (Gadiformes) demonstrated a shared loss of the major histocompatibility complex (MHC) II genes 100 million years ago. An improved version of the Atlantic cod genome assembly shows an extreme density of tandem repeats compared to other vertebrate genome assemblies. Highly contiguous assemblies are therefore needed to further investigate the unusual immune system of the Gadiformes, and whether the high density of tandem repeats found in Atlantic cod is a shared trait in this group. Results Here, we have sequenced and assembled the genome of haddock (Melanogrammus aeglefinus) – a relative of Atlantic cod – using a combination of PacBio and Illumina reads. Comparative analyses reveal that the haddock genome contains an even higher density of tandem repeats outside and within protein coding sequences than Atlantic cod. Further, both species show an elevated number of tandem repeats in genes mainly involved in signal transduction compared to other teleosts. A characterization of the immune gene repertoire demonstrates a substantial expansion of MHCI in Atlantic cod compared to haddock. In contrast, the Toll-like receptors show a similar pattern of gene losses and expansions. For the NOD-like receptors (NLRs), another gene family associated with the innate immune system, we find a large expansion common to all teleosts, with possible lineage-specific expansions in zebrafish, stickleback and the codfishes. Conclusions The generation of a highly contiguous genome assembly of haddock revealed that the high density of short tandem repeats as well as expanded immune gene families is not unique to Atlantic cod – but possibly a feature common to all, or most, codfishes.
A shared expansion of NLR genes in teleosts suggests that the NLRs have a more substantial role in the innate immunity of teleosts than other vertebrates. Moreover, we find that high copy number genes combined with variable genome assembly qualities may impede complete characterization of these genes, i.e. the number of NLRs in different teleost species might be underestimates. Electronic supplementary material The online version of this article (10.1186/s12864-018-4616-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
| | - Marine S O Brieuc
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Monica H Solbakken
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Elin Sørhus
- Institute of Marine Research, Bergen, Norway
| | - Alexander J Nederbragt
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway; Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Oslo, Norway
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | | | | | - Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
| |
|
28
|
Lower SS, McGurk MP, Clark AG, Barbash DA. Satellite DNA evolution: old ideas, new approaches. Curr Opin Genet Dev 2018; 49:70-78. [PMID: 29579574 PMCID: PMC5975084 DOI: 10.1016/j.gde.2018.03.003] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 11/15/2017] [Revised: 02/02/2018] [Accepted: 03/08/2018] [Indexed: 12/22/2022]
Abstract
A substantial portion of the genomes of most multicellular eukaryotes consists of large arrays of tandemly repeated sequence, collectively called satellite DNA. The processes generating and maintaining different satellite DNA abundances across lineages are important to understand as satellites have been linked to chromosome mis-segregation, disease phenotypes, and reproductive isolation between species. While much theory has been developed to describe satellite evolution, empirical tests of these models have fallen short because of the challenges in assessing satellite repeat regions of the genome. Advances in computational tools and sequencing technologies now enable identification and quantification of satellite sequences genome-wide. Here, we describe some of these tools and how their applications are furthering our knowledge of satellite evolution and function.
Affiliation(s)
- Sarah Sander Lower
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
| | - Michael P McGurk
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
| | - Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
| | - Daniel A Barbash
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States.
| |
|
29
|
Franco ME, Bitencourt TA, Marins M, Fachin AL. In silico characterization of tandem repeats in Trichophyton rubrum and related dermatophytes provides new insights into their role in pathogenesis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2017:3866792. [PMID: 29220431 PMCID: PMC5502367 DOI: 10.1093/database/bax035] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/25/2016] [Accepted: 03/28/2017] [Indexed: 01/01/2023]
Abstract
Trichophyton rubrum is the most common etiological agent of dermatophytoses worldwide, which is able to degrade keratinized tissues. The sequencing of the genome of different dermatophyte species has provided a large amount of data, including tandem repeats that may play a role in genetic variability and in the pathogenesis of these fungi. Tandem repeats are adjacent DNA sequences of 2–200 nucleotides in length, which exert regulatory and adaptive functions. These repetitive DNA sequences are found in different classes of fungal proteins, especially those involved in cell adhesion, a determinant factor for the establishment of fungal infection. The objective of this study was to develop a Dermatophyte Tandem Repeat Database (DTRDB) for the storage and identification of tandem repeats in T. rubrum and six other dermatophyte species. The current version of the database contains 35 577 tandem repeats detected in 16 173 coding sequences. The repeats can be searched using entry parameters such as repeat unit length (nt, nucleotide), repeat number, variability score, and repeat sequence motif. These data were used to study the relative frequency and distribution of repeats in the sequences, as well as their possible functions in dermatophytes. A search of the database revealed that these repeats occur in 22–33% of genes transcribed in dermatophytes where they could be involved in the success of adaptation to the host tissue and establishment of infection. The repeats were detected in transcripts that are mainly related to three biological processes: regulation, adhesion, and metabolism. The database developed enables users to identify and analyse tandem repeat regions in target genes related to pathogenicity and fungal–host interactions in dermatophytes and may contribute to the discovery of new targets for the development of antifungal agents. Database URL: http://comp.mch.ifsuldeminas.edu.br/dtrdb/
Affiliation(s)
- Matheus Eloy Franco
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Federal Institute of Education, Science and Technology of South of Minas Gerais - IFSULDEMINAS, 37750-000, Brazil
| | - Tamires Aparecida Bitencourt
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Departamento de Genetica, 049-900, FMRP-USP, SP, Brazil
| | - Mozart Marins
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Curso de Medicina, Universidade de Ribeirão Preto, SP, Brazil
| | - Ana Lúcia Fachin
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Curso de Medicina, Universidade de Ribeirão Preto, SP, Brazil
| |
|
30
|
Kang TH, Han SH, Lee HS. Genetic structure and demographic history of Lymantria dispar (Linnaeus, 1758) (Lepidoptera: Erebidae) in its area of origin and adjacent areas. Ecol Evol 2017; 7:9162-9178. [PMID: 29152205 PMCID: PMC5677484 DOI: 10.1002/ece3.3467] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/09/2017] [Revised: 08/04/2017] [Accepted: 09/01/2017] [Indexed: 12/02/2022] Open
Abstract
We analyzed the population genetic structure and demographic history of 20 Lymantria dispar populations from Far East Asia using microsatellite loci and mitochondrial genes. In the microsatellite analysis, the genetic distances based on pairwise FST values ranged from 0.0087 to 0.1171. A NeighborNet network based on pairwise FST genetic distances showed that the 20 regional populations were divided into five groups. Bayesian clustering analysis (K = 3) demonstrated the same groupings. The populations in the Korean Peninsula and adjacent regions, in particular, showed a mixed genetic pattern. In the mitochondrial genetic analysis based on 98 haplotypes, the median‐joining network exhibited a star shape that was focused on three high‐frequency haplotypes (Haplotype 1: central Korea and adjacent regions, Group 1; Haplotype 37: southern Korea, Group 2; and Haplotype 90: Hokkaido area, Group 3) connected by low‐frequency haplotypes. The mismatch distribution dividing the three groups was unimodal. In the neutral test, Tajima's D and Fu's FS tests were negative. We can thus infer that the Far East Asian populations of L. dispar underwent a sudden population expansion. Based on the age expansion parameter, the expansion time was inferred to be approximately 53,652 years before present (ybp) for Group 1, approximately 65,043 ybp for Group 2, and approximately 76,086 ybp for Group 3. We propose that the mixed genetic pattern of the inland populations of Far East Asia is due to these expansions and that the inland populations of the region should be treated as valid subspecies that are distinguishable from other subspecies by genetic traits.
Affiliation(s)
- Tae Hwa Kang
- Bio Control Research Center Jeonnam Bioindustry Foundation Gokseong-gun Korea
| | - Sang Hoon Han
- Department of Life Science College of Natural Science Kyonggi University Suwon Korea
| | - Heung Sik Lee
- Plant Quarantine Technology Center Animal and Plant Quarantine Agency Gimcheon-si Korea
| |
|
31
|
Distinct patterns of simple sequence repeats and GC distribution in intragenic and intergenic regions of primate genomes. Aging (Albany NY) 2017; 8:2635-2654. [PMID: 27644032 PMCID: PMC5191860 DOI: 10.18632/aging.101025] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/08/2016] [Accepted: 08/22/2016] [Indexed: 01/23/2023]
Abstract
As the first systematic examination of simple sequence repeats (SSRs) and guanine-cytosine (GC) distribution in intragenic and intergenic regions of ten primates, our study showed that SSRs and GC displayed nonrandom distribution for both intragenic and intergenic regions, suggesting that they have potential roles in transcriptional or translational regulation. Our results suggest that the majority of SSRs are distributed in non-coding regions, such as the introns, TEs, and intergenic regions. In these primates, trinucleotide perfect (P) SSRs were the most abundant repeat type in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs were the most abundant in the introns, 3'UTRs, TEs, and intergenic regions. The GC content varied greatly among different intragenic and intergenic regions: 5'UTRs > CDSs > 3'UTRs > TEs > introns > intergenic regions, and high GC content was frequently found in exon-rich regions. Our results also showed that, within the same intragenic and intergenic regions, the distribution of GC content was highly similar across the different primates. Tri- and hexanucleotide P-SSRs had the highest GC content in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs had the lowest GC content in the six genomic regions of these primates. The most frequent motifs of each length varied markedly among the different genomic regions.
|
32
|
Ding S, Wang S, He K, Jiang M, Li F. Large-scale analysis reveals that the genome features of simple sequence repeats are generally conserved at the family level in insects. BMC Genomics 2017; 18:848. [PMID: 29110701 PMCID: PMC5674736 DOI: 10.1186/s12864-017-4234-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 07/29/2017] [Accepted: 10/23/2017] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Simple sequence repeats (SSR), also called microsatellites, have been widely used as genetic markers, and have been extensively studied in some model insects. At present, the genomes of more than 100 insect species are available. However, the features of SSRs in most insect genomes remain largely unknown. RESULTS We identified 15.01 million SSRs across 136 insect genomes. The number of identified SSRs was positively associated with genome size in insects, but the frequency and density per megabase of genomes were not. Most insect SSRs (56.2-93.1%) were perfect (no mismatch). Imperfect (at least one mismatch) SSRs (average length 22-73 bp) were longer than perfect SSRs (16-30 bp). The most abundant insect SSRs were the di- and trinucleotide types, which accounted for 27.2% and 22.0% of all SSRs, respectively. On average, 59.1%, 36.8%, and 3.7% of insect SSRs were located in intergenic, intronic, and exonic regions, respectively. The percentages of various types of SSRs were similar among insects from the same family. However, they were dissimilar among insects from different families within orders. We carried out a phylogenetic analysis using the SSR frequencies. Species from the same family were generally clustered together in the evolutionary tree. However, insects from the same order but not in the same family did not cluster together. These results indicated that although SSRs undergo rapid expansions and contractions in different populations of the same species, the general genomic features of insect SSRs remain conserved at the family level. CONCLUSION Millions of insect SSRs were identified and their genome features were analyzed. Most insect SSRs were perfect and were located in intergenic regions. We presented evidence that the variance of insect SSRs accumulated after the differentiation of insect families.
Affiliation(s)
- Simin Ding
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 866 Yuhangtang Road, Hangzhou, 310058 China
| | - Shuping Wang
- Technical Centre for Animal Plant and Food Inspection and Quarantine, Shanghai Entry-exit Inspection and Quarantine Bureau, Shanghai, 200135 China
| | - Kang He
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 866 Yuhangtang Road, Hangzhou, 310058 China
| | - Mingxing Jiang
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 866 Yuhangtang Road, Hangzhou, 310058 China
| | - Fei Li
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 866 Yuhangtang Road, Hangzhou, 310058 China
| |
Collapse
|
33
|
Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat Biotechnol 2017; 35:951-959. [DOI: 10.1038/nbt.3966] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 11/14/2016] [Accepted: 08/18/2017] [Indexed: 01/03/2023]
|
34
|
Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size. G3-GENES GENOMES GENETICS 2017; 7:3047-3058. [PMID: 28717047 PMCID: PMC5592930 DOI: 10.1534/g3.117.043083] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022]
Abstract
Northern bobwhite (Colinus virginianus; hereafter bobwhite) and scaled quail (Callipepla squamata) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb), which was supported by kmer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0), and 711 in the bobwhite genome (v2.0), including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15-20 KYA.
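The contiguity figures quoted above (N50 scaffold size, and the number of scaffolds capturing at least 90% of the assembly) follow standard definitions. A minimal Python sketch of both statistics is given below; the function names and the toy scaffold lengths are assumptions for illustration and are not taken from the paper.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty collection of positive sizes")


def scaffolds_to_cover(lengths, fraction=0.9):
    """Smallest number of largest scaffolds whose summed length reaches `fraction` of the assembly."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= fraction * total:
            return count
    raise ValueError("lengths must be a non-empty collection of positive sizes")


# Toy scaffold lengths (hypothetical, in arbitrary units).
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy)
```

Both functions sort scaffolds from longest to shortest and accumulate lengths, so a larger N50 and a smaller scaffold count at 90% both indicate a more contiguous assembly.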
|
35
|
Das G, Das S, Dutta S, Ghosh I. In silico identification and characterization of stress and virulence associated repeats in Salmonella. Genomics 2017; 110:23-34. [PMID: 28827093 DOI: 10.1016/j.ygeno.2017.08.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/21/2017] [Revised: 05/09/2017] [Accepted: 08/03/2017] [Indexed: 01/05/2023]
Abstract
Salmonella serovars share extensive genomic similarity yet cause different diseases, a paradox in Salmonella biology. Repeats are one type of probe that can help explain such differences. Here, a comparative genomics approach is followed to identify and characterize repeats that might play a role in adaptation and pathogenesis. Repeats are non-randomly distributed in the genomes, except in a few typhoid-causing strains. Perfect long repeats are rare compared to polymorphic ones, and both are statistically consistent. Significant differences in repeat densities in stress-related genes suggest their probable participation in survival and virulence. In total, 573 and 1053 repeat loci were identified that are exclusively associated with stress and virulence genes, respectively. In Salmonella Typhi, an octameric VNTR locus is found between the acrD and yffB genes; it has more than 25 perfect copies across Salmonella Typhi strains but only a single copy in other serovars. This repeat can be used as a diagnostic probe for typhoid.
Affiliation(s)
- Gourab Das
- School of Computational and Integrative Sciences, Jawaharlal Nehru University (JNU), New Mehrauli Road, Munirka, New Delhi, Delhi 110067, India
- Surojit Das
- National Institute of Cholera and Enteric Diseases (NICED), P-33, C.I.T. Road, Scheme XM, Beleghata, Kolkata 700010, India
- Shanta Dutta
- National Institute of Cholera and Enteric Diseases (NICED), P-33, C.I.T. Road, Scheme XM, Beleghata, Kolkata 700010, India
- Indira Ghosh
- School of Computational and Integrative Sciences, Jawaharlal Nehru University (JNU), New Mehrauli Road, Munirka, New Delhi, Delhi 110067, India.
|
36
|
Raman G, Park V, Kwak M, Lee B, Park S. Characterization of the complete chloroplast genome of Arabis stellari and comparisons with related species. PLoS One 2017; 12:e0183197. [PMID: 28809950 PMCID: PMC5557495 DOI: 10.1371/journal.pone.0183197] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/30/2017] [Accepted: 07/31/2017] [Indexed: 01/25/2023] Open
Abstract
Arabis stellari var. japonica is an ornamental plant of the Brassicaceae family, and is widely distributed in South Korea. However, no information is available about its molecular biology and no genomic study has been performed on A. stellari. In this paper, the authors report the complete chloroplast genome sequence of A. stellari. The plastome of A. stellari was 153,683 bp in length with 36.4% GC and included a pair of inverted repeats (IRs) of 26,423 bp that separated a large single-copy (LSC) region of 82,807 bp and a small single-copy (SSC) region of 18,030 bp. It was also found to contain 113 unique genes, of which 79 were protein-coding genes, 30 were transfer RNAs, and four were ribosomal RNAs. The gene content and organization of the A. stellari chloroplast genome were similar to those of other Brassicaceae genomes except for the absence of the rps16 protein-coding gene. A total of 991 SSRs were identified in the genome. The chloroplast genome of A. stellari was compared with closely related species of the Brassicaceae family. Comparative analysis showed a minor divergence occurred in the protein-coding matK, ycf1, ccsA, accD and rpl22 genes and that the KA/KS nucleotide substitution ratio of the ndhA genes of A. stellari and A. hirsuta was 1.35135. The genes infA and rps16 were absent in the Arabis genus and phylogenetic evolutionary studies revealed that these genes evolved independently. However, phylogenetic analysis showed that the positions of Brassicaceae species are highly conserved. The present study provides A. stellari genomic information that may be found useful in conservation and molecular phylogenetic studies on Brassicaceae.
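The region sizes reported above can be checked against the total plastome length, because a quadripartite chloroplast genome consists of one LSC, one SSC, and two copies of the IR. The short Python sketch below performs that arithmetic with the figures from the abstract; the function name is an assumption for illustration.

```python
def plastome_length(lsc_bp, ssc_bp, ir_bp):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Region sizes reported for A. stellari in the abstract above.
total_bp = plastome_length(lsc_bp=82_807, ssc_bp=18_030, ir_bp=26_423)
ir_fraction = 2 * 26_423 / total_bp  # share of the plastome occupied by the two IRs
```

The computed total is 153,683 bp, which agrees with the genome length stated in the abstract, and the two IR copies account for roughly a third of the plastome.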
Affiliation(s)
- Gurusamy Raman
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk, Republic of Korea
- Veronica Park
- McNeil High School, Austin, Texas, United States of America
- Myounghai Kwak
- Plant Resources Division, National Institute of Biological Resources of Korea, Incheon, Republic of Korea
- Byoungyoon Lee
- Plant Resources Division, National Institute of Biological Resources of Korea, Incheon, Republic of Korea
- SeonJoo Park
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk, Republic of Korea
|
37
|
Tørresen OK, Star B, Jentoft S, Reinar WB, Grove H, Miller JR, Walenz BP, Knight J, Ekholm JM, Peluso P, Edvardsen RB, Tooming-Klunderud A, Skage M, Lien S, Jakobsen KS, Nederbragt AJ. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 2017; 18:95. [PMID: 28100185 PMCID: PMC5241972 DOI: 10.1186/s12864-016-3448-x] [Citation(s) in RCA: 115] [Impact Index Per Article: 16.4] [Received: 07/27/2016] [Accepted: 12/20/2016] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. RESULTS By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. CONCLUSIONS The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
Affiliation(s)
- Ole K. Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Department of Natural Sciences, University of Agder, Kristiansand, NO-4604 Norway
- William B. Reinar
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Harald Grove
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432 Norway
- Jason R. Miller
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, 20850 MD USA
- Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, 20892 MD USA
- James Knight
- Yale School of Medicine, Yale University, New Haven, 06520 CT USA
- Ave Tooming-Klunderud
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Morten Skage
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Sigbjørn Lien
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432 Norway
- Kjetill S. Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Alexander J. Nederbragt
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Oslo, NO-0316 Norway
|
38
|
Phylogenetic Relationships of the Fern Cyrtomium falcatum (Dryopteridaceae) from Dokdo Island Based on Chloroplast Genome Sequencing. Genes (Basel) 2016; 7:genes7120115. [PMID: 28009803 PMCID: PMC5192491 DOI: 10.3390/genes7120115] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 10/04/2016] [Revised: 11/18/2016] [Accepted: 11/28/2016] [Indexed: 11/17/2022] Open
Abstract
Cyrtomium falcatum is a popular ornamental fern cultivated worldwide. Native to the Korean Peninsula, Japan, and Dokdo Island in the Sea of Japan, it is the only fern present on Dokdo Island. We isolated and characterized the chloroplast (cp) genome of C. falcatum, and compared it with those of closely related species. The genes trnV-GAC and trnV-GAU were found to be present within the cp genome of C. falcatum, whereas trnP-GGG and rpl21 were lacking. Moreover, cp genomes of Cyrtomium devexiscapulae and Adiantum capillus-veneris lack trnP-GGG and rpl21, suggesting these are not conserved among angiosperm cp genomes. The deletion of trnR-UCG, trnR-CCG, and trnSeC in the cp genomes of C. falcatum and other eupolypod ferns indicates these genes are restricted to tree ferns, non-core leptosporangiates, and basal ferns. The C. falcatum cp genome also encoded ndhF and rps7, with GUG start codons that were only conserved in polypod ferns, and it shares two significant inversions with other ferns, including a minor inversion of the trnD-GUC region and an approximate 3 kb inversion of the trnG-trnT region. Phylogenetic analyses showed that Equisetum was found to be a sister clade to Psilotales-Ophioglossales with a 100% bootstrap (BS) value. The sister relationship between Pteridaceae and eupolypods was also strongly supported by a 100% BS, but Bayesian molecular clock analyses suggested that C. falcatum diversified in the mid-Paleogene period (45.15 ± 4.93 million years ago) and might have moved from Eurasia to Dokdo Island.
|
39
|
Wang C, Kubiak LJ, Du LM, Li WJ, Jian ZY, Tang C, Fan ZX, Zhang XY, Yue BS. Comparison of microsatellite distribution in genomes of Centruroides exilicauda and Mesobuthus martensii. Gene 2016; 594:41-46. [DOI: 10.1016/j.gene.2016.08.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/19/2016] [Revised: 08/13/2016] [Accepted: 08/28/2016] [Indexed: 10/21/2022]
|
40
|
Development of 12 Microsatellite Markers in Dorcus titanus castanicolor (Motschulsky, 1861) (Lucanidae, Coleoptera) from Korea Using Next-Generation Sequencing. Int J Mol Sci 2016; 17:ijms17101621. [PMID: 27669231 PMCID: PMC5085654 DOI: 10.3390/ijms17101621] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/10/2016] [Revised: 09/10/2016] [Accepted: 09/15/2016] [Indexed: 11/17/2022] Open
Abstract
In the present study, we used next-generation sequencing to develop 12 novel microsatellite markers for genetic structural analysis of Dorcus titanus castanicolor (Lucanidae; Coleoptera), a popular pet insect in China, Korea, and Japan. We identified 52,357 microsatellite loci in 339,287,381 bp of genomic sequence and selected 19 of the loci based on their PCR amplification efficiency and polymorphism. The 19 selected markers were then tested for the presence of null alleles and linkage disequilibrium. We did not detect any evidence of null alleles; however, four pairs of loci (DT03 and DT11, DT05 and DT26, DT08 and DT26, DT26 and DT35) exhibited linkage disequilibrium. Thus, we assessed the genetic diversity of a D. titanus castanicolor population from the Daejeon region of Korea (n = 22) using 13 markers. Among them, one marker (DT17) deviated from Hardy-Weinberg equilibrium. Therefore, 12 markers may be useful for further analyzing the genetic diversity of D. titanus castanicolor.
|
41
|
Muñoz J, Chaturvedi A, De Meester L, Weider LJ. Characterization of genome-wide SNPs for the water flea Daphnia pulicaria generated by genotyping-by-sequencing (GBS). Sci Rep 2016; 6:28569. [PMID: 27346179 PMCID: PMC4921830 DOI: 10.1038/srep28569] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/26/2016] [Accepted: 06/01/2016] [Indexed: 12/01/2022] Open
Abstract
The keystone aquatic herbivore Daphnia has been studied for more than 150 years in the context of evolution, ecology and ecotoxicology. Although it is rapidly becoming an emergent model for environmental and population genomics, there have been limited genome-wide level studies in natural populations. We report a unique resource of novel Single Nucleotide Polymorphic (SNP) markers for Daphnia pulicaria using the reduction in genomic complexity with the restriction enzymes approach, genotyping-by-sequencing. Using the genome of D. pulex as a reference, SNPs were scored for 53 clones from five natural populations that varied in lake trophic status. Our analyses resulted in 32,313 highly confident and bi-allelic SNP markers. 1,364 outlier SNPs were mapped on the annotated D. pulex genome, which identified 2,335 genes, including 565 within functional genes. Out of 885 EuKaryotic Orthologous Groups that we found from outlier SNPs, 294 were involved in three metabolic and four regulatory pathways. Bayesian-clustering analyses showed two distinct population clusters representing the possible combined effects of geography and lake trophic status. Our results provide an invaluable tool for future population genomics surveys in Daphnia targeting informative regions related to physiological processes that can be linked to the ecology of this emerging eco-responsive taxon.
Affiliation(s)
- Joaquín Muñoz
- Doñana Biological Station (CSIC), Isla de La Cartuja, Av. Américo Vespucio S/N, 41092-Seville, Spain; Department of Biology, Program in Ecology and Evolutionary Biology, The University of Oklahoma, 730 Van Vleet Oval, Norman, OK 73019, USA
- Anurag Chaturvedi
- Laboratory of Aquatic Ecology, Evolution and Conservation, University of Leuven, Ch. Deberiotstraat 32, Leuven 3000, Belgium
- Luc De Meester
- Laboratory of Aquatic Ecology, Evolution and Conservation, University of Leuven, Ch. Deberiotstraat 32, Leuven 3000, Belgium
- Lawrence J Weider
- Department of Biology, Program in Ecology and Evolutionary Biology, The University of Oklahoma, 730 Van Vleet Oval, Norman, OK 73019, USA
|
42
|
Bueker B, Eberlein C, Gladieux P, Schaefer A, Snirc A, Bennett DJ, Begerow D, Hood ME, Giraud T. Distribution and population structure of the anther smut Microbotryum silenes-acaulis parasitizing an arctic-alpine plant. Mol Ecol 2016; 25:811-24. [PMID: 26671732 DOI: 10.1111/mec.13512] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/08/2015] [Revised: 11/02/2015] [Accepted: 11/26/2015] [Indexed: 12/18/2022]
Abstract
Cold-adapted organisms with current arctic-alpine distributions have persisted during the last glaciation in multiple ice-free refugia, leaving footprints in their population structure that contrast with temperate plants and animals. However, pathogens that live within hosts having arctic-alpine distributions have been little studied. Here, we therefore investigated the geographical range and population structure of a fungus parasitizing an arctic-alpine plant. A total of 1437 herbarium specimens of the plant Silene acaulis were examined, and the anther smut pathogen Microbotryum silenes-acaulis was present throughout the host's geographical range. There was significantly greater incidence of anther smut disease in more northern latitudes and where the host locations were less dense, indicating a major influence of environmental factors and/or host demographic structure on the pathogen distribution. Genetic analyses with seven microsatellite markers on recent collections of 195 M. silenes-acaulis individuals revealed three main genetic clusters, in North America, northern Europe and southern Europe, likely corresponding to differentiation in distinct refugia during the last glaciation. The lower genetic diversity in northern Europe indicates postglacial recolonization northwards from southern refugia. This study combining herbarium surveys and population genetics thus uniquely reveals the effects of climate and environmental factors on a plant pathogen species with an arctic-alpine distribution.
Affiliation(s)
- Britta Bueker
- Lehrstuhl für Evolution und Biodiversität der Pflanzen, AG Geobotanik, Ruhr-Universität Bochum, Universitätsstraße 150, 44780, Bochum, Germany; Department of Biology, Amherst College, 220 South Pleasant Street, Amherst, MA, 01002, USA
- Chris Eberlein
- Lehrstuhl für Evolution und Biodiversität der Pflanzen, AG Geobotanik, Ruhr-Universität Bochum, Universitätsstraße 150, 44780, Bochum, Germany; Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Université Laval, Pavillon Charles-Eugène-Marchand, 1030 Avenue de la Médicine, Quebec City, Quebec, Canada, G1V 0A6
- Pierre Gladieux
- Ecologie Systématique Evolution, CNRS, Univ. Paris-Sud, AgroParisTech, Université Paris-Saclay, 91400, Orsay, France; INRA, UMR BGPI, Bâtiment K, Campus International de Baillarguet, F-34398, Montpellier, France; CIRAD, F-34398, Montpellier, France
- Angela Schaefer
- Lehrstuhl für Evolution und Biodiversität der Pflanzen, AG Geobotanik, Ruhr-Universität Bochum, Universitätsstraße 150, 44780, Bochum, Germany
- Alodie Snirc
- Ecologie Systématique Evolution, CNRS, Univ. Paris-Sud, AgroParisTech, Université Paris-Saclay, 91400, Orsay, France
- Dominic J Bennett
- Ecologie Systématique Evolution, CNRS, Univ. Paris-Sud, AgroParisTech, Université Paris-Saclay, 91400, Orsay, France; Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Dominik Begerow
- Lehrstuhl für Evolution und Biodiversität der Pflanzen, AG Geobotanik, Ruhr-Universität Bochum, Universitätsstraße 150, 44780, Bochum, Germany
- Michael E Hood
- Department of Biology, Amherst College, 220 South Pleasant Street, Amherst, MA, 01002, USA
- Tatiana Giraud
- Ecologie Systématique Evolution, CNRS, Univ. Paris-Sud, AgroParisTech, Université Paris-Saclay, 91400, Orsay, France
|
43
|
Raman G, Park S. The Complete Chloroplast Genome Sequence of Ampelopsis: Gene Organization, Comparative Analysis, and Phylogenetic Relationships to Other Angiosperms. FRONTIERS IN PLANT SCIENCE 2016; 7:341. [PMID: 27047519 PMCID: PMC4800181 DOI: 10.3389/fpls.2016.00341] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 12/14/2015] [Accepted: 03/06/2016] [Indexed: 05/20/2023]
Abstract
Ampelopsis brevipedunculata is an economically important plant that belongs to the Vitaceae family of angiosperms. The phylogenetic placement of Vitaceae is still unresolved. Recent phylogenetic studies suggested that it should be placed in various alternative families including Caryophyllaceae, Asteraceae, Saxifragaceae, Dilleniaceae, or with the rest of the rosid families. However, these analyses provided only weak support because they were based on only one of several genes. Accordingly, complete chloroplast genome sequences are required to resolve the phylogenetic relationships among angiosperms. Recent phylogenetic analyses based on the complete chloroplast genome sequence suggested strong support for the position of Vitaceae as the earliest diverging lineage of rosids and placed it as a sister to the remaining rosids. These studies also revealed relationships among several major lineages of angiosperms; however, they highlighted the significance of taxon sampling for obtaining accurate phylogenies. In the present study, we sequenced the complete chloroplast genome of A. brevipedunculata and used these data to assess the relationships among 32 angiosperms, including 18 taxa of rosids. The Ampelopsis chloroplast genome is 161,090 bp in length, and includes a pair of inverted repeats of 26,394 bp that are separated by small and large single copy regions of 19,036 bp and 89,266 bp, respectively. The gene content and order of Ampelopsis are identical to those of many other unrearranged angiosperm chloroplast genomes, including Vitis and tobacco. A phylogenetic tree constructed based on 70 protein-coding genes of 33 angiosperms showed that both Saxifragales and Vitaceae diverged from the rosid clade and formed two clades with 100% bootstrap value. The position of the Vitaceae is sister to Saxifragales, and both are the basal and earliest diverging lineages. Moreover, Saxifragales forms a sister clade to Vitaceae of rosids. Overall, the results of this study will contribute to better support of the evolution, molecular biology and genetic improvement of the plant Ampelopsis.
|
44
|
Qin Z, Wang Y, Wang Q, Li A, Hou F, Zhang L. Evolution Analysis of Simple Sequence Repeats in Plant Genome. PLoS One 2015; 10:e0144108. [PMID: 26630570 PMCID: PMC4668000 DOI: 10.1371/journal.pone.0144108] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/13/2015] [Accepted: 11/13/2015] [Indexed: 01/30/2023] Open
Abstract
Simple sequence repeats (SSRs) are widespread units in genome sequences and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those whose repeat units consist of 1–3 nucleotides. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and with increasing repeat number in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number, different species favoured different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provides new insights for the analysis of plant genome evolution.
Affiliation(s)
- Zhen Qin
- Crop Research Institute, Shandong Academy of Agricultural Sciences, Jinan, China
- Yanping Wang
- Shandong Key Laboratory of Animal Disease Control and Breeding/Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Qingmei Wang
- Crop Research Institute, Shandong Academy of Agricultural Sciences, Jinan, China
- Aixian Li
- Crop Research Institute, Shandong Academy of Agricultural Sciences, Jinan, China
- Fuyun Hou
- Crop Research Institute, Shandong Academy of Agricultural Sciences, Jinan, China
- Liming Zhang
- Crop Research Institute, Shandong Academy of Agricultural Sciences, Jinan, China
|
45
|
Raman G, Park S. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective. PLoS One 2015; 10:e0141329. [PMID: 26513163 PMCID: PMC4626046 DOI: 10.1371/journal.pone.0141329] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 05/21/2015] [Accepted: 10/06/2015] [Indexed: 11/18/2022] Open
Abstract
Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, the chloroplast (cp) genome of D. superbus was compared with those of closely related members of the Caryophyllaceae family such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genomes of Dianthus and Spinacia have two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, whereas Lychnis has lost the introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.
Affiliation(s)
- Gurusamy Raman
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk, Republic of Korea
- SeonJoo Park
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsan-buk, Republic of Korea
|
46
|
Fertin G, Jean G, Radulescu A, Rusu I. Hybrid de novo tandem repeat detection using short and long reads. BMC Med Genomics 2015; 8 Suppl 3:S5. [PMID: 26399998 PMCID: PMC4582210 DOI: 10.1186/1755-8794-8-s3-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022] Open
Abstract
Background As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which reaches up to 16%. Methods In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads and the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. Results MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. Conclusions Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.
|
47
|
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution. PLoS One 2015; 10:e0128126. [PMID: 26061691 PMCID: PMC4464890 DOI: 10.1371/journal.pone.0128126] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 02/14/2015] [Accepted: 04/23/2015] [Indexed: 11/19/2022] Open
Abstract
The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.
|
48
|
Abe H, Gemmell NJ. Abundance, arrangement, and function of sequence motifs in the chicken promoters. BMC Genomics 2014; 15:900. [PMID: 25318583 PMCID: PMC4203960 DOI: 10.1186/1471-2164-15-900] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/04/2014] [Accepted: 10/08/2014] [Indexed: 01/01/2023] Open
Abstract
Background: Eukaryotic promoters are regions containing various sequence motifs necessary to control gene transcription. Much evidence has emerged showing that structural and/or contextual changes in regulatory elements can critically affect cis-regulatory activity. As sequence motifs can be key factors in maintaining complex promoter architectures, one effective approach to further understand the evolution of promoter regions in vertebrates is to compare the abundance and distribution patterns of sequence motifs in these regions between divergent species. When compared with mammals, the chicken (Gallus gallus) has a very different genome composition and sufficient genomic information to make it a good model for the exploration of promoter structure and evolution.
Results: More than 10% of chicken genes contained short tandem repeats (STRs) in the region 2 kb upstream of promoters, but the total number of STRs observed in chicken is approximately half of that detected in human promoters. In terms of STR motif frequencies, chicken promoter regions were more similar to other avian and mammalian promoters than these were to the entire chicken genome. Unlike other STRs, nearly half of the trinucleotide repeats found in promoters partly or entirely overlapped with CpG islands, indicating a potential association with nucleosome positions. Moreover, the chicken promoters are rich in sequence motifs such as poly-A, poly-G and G-quadruplexes, especially in the core region, that are otherwise rare in the genome. Most sequence motifs showed strong functional enrichment for particular gene ontology (GO) categories, indicating roles in regulation of transcription and gene expression, as well as immune response and cognition.
Conclusions: Chicken promoter regions share some, but not all, of the structural features observed in mammalian promoters. The findings presented here provide empirical evidence suggesting that the frequencies and locations of STR motifs have been conserved through promoter evolution in a lineage-specific manner. Correlation analysis between GO categories and sequence motifs suggests motif-specific constraints acting on gene function.
Affiliation(s)
- Hideaki Abe
- Department of Anatomy, University of Otago, Dunedin, New Zealand.
|
49
|
Jiang Q, Li Q, Yu H, Kong L. Genome-wide analysis of simple sequence repeats in marine animals-a comparative approach. Mar Biotechnol (NY) 2014; 16:604-619. [PMID: 24939717 DOI: 10.1007/s10126-014-9580-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/01/2014] [Accepted: 05/22/2014] [Indexed: 06/03/2023]
Abstract
Tandem simple sequence repeats (SSRs) are among the most popular molecular markers in genetic analysis owing to their ubiquitous occurrence, high reproducibility, multiallelic nature, and codominant inheritance. Their high mutability gives SSRs a role in genome evolution, and their patterns differ accordingly between lineages. Comparative analysis of genomic SSRs in different taxonomic groups usually focuses on land species, while marine animals have been neglected. This study examined the abundance of genomic SSRs with repeat unit lengths of 1-6 bp in 30 marine animals from nine taxonomic groups and compared them with land species. Thousands of SSRs were discovered in every organism, providing a large resource for the development of molecular markers. The thirty marine animals showed profound differences in SSR characteristics, but some group-specific trends were also found. Both similarities and differences in repeat patterns were found between land and marine species. Two taxon-specific SSR types were discovered: the pentanucleotide motif AGAGG in Euteleostei and the hexanucleotide repeat ATGTAC in Porifera and Echinodermata. Gene ontology (GO) enrichment analysis of two representative species (Amphimedon queenslandica for Porifera and Strongylocentrotus purpuratus for Echinodermata) revealed a functional preference among genes associated with the ATGTAC motif, which might hint at evolutionary significance.
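As a rough illustration of what scanning a genome for SSRs with 1-6 bp repeat units involves, the following minimal sketch finds perfect tandem repeats with a regular expression. It is not the pipeline used in the study: the function name `find_ssrs` and the single `min_repeats` threshold are assumptions made for this example (published SSR surveys typically use different minimum copy numbers per unit length), and imperfect repeats are ignored.

```python
import re

def find_ssrs(seq, min_repeats=3, max_unit=6):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex try the shortest unit first, so
    'ATATAT' is reported with unit 'AT' rather than 'ATAT'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Toy sequence (hypothetical) with a dinucleotide and a pentanucleotide SSR.
toy = "CCATATATATCCAGAGGAGAGGAGAGGC"
for start, unit, copies in find_ssrs(toy):
    print(start, unit, copies)
```

Counting hits by unit length or by motif (after collapsing rotations and reverse complements, which this sketch does not do) would give the kind of abundance table that comparative SSR analyses are built on.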
Affiliation(s)
- Qun Jiang
- The Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, 266003, Qingdao, China
|
50
|
Smith DR. Buying in to bioinformatics: an introduction to commercial sequence analysis software. Brief Bioinform 2014; 16:700-9. [PMID: 25183247 PMCID: PMC4501248 DOI: 10.1093/bib/bbu030] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 06/25/2014] [Accepted: 08/07/2014] [Indexed: 11/25/2022] Open
Abstract
Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics.
|