51
Rico C, Cuesta JA, Drake P, Macpherson E, Bernatchez L, Marie AD. Null alleles are ubiquitous at microsatellite loci in the Wedge Clam (Donax trunculus). PeerJ 2017; 5:e3188. [PMID: 28439464 PMCID: PMC5398275 DOI: 10.7717/peerj.3188] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/10/2016] [Accepted: 03/15/2017] [Indexed: 12/17/2022]
Abstract
Recent studies have reported an unusually high frequency of non-amplifying (null) alleles at microsatellite loci in bivalves. Null alleles have been associated with heterozygote deficits in many studies. While several studies have tested for their presence using different analytical tools, few have empirically tested their consequences for estimating population structure and differentiation. We characterised 16 newly developed microsatellite loci and show that null alleles are ubiquitous in the wedge clam, Donax trunculus. We carried out several tests demonstrating that the large heterozygote deficits observed at the newly characterised loci were most likely due to null alleles. We tested the robustness of microsatellite genotyping for population assignment by showing that well-recognised biogeographic regions of the southern Atlantic and southern Mediterranean coasts of Spain harbour genetically different populations.
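As a rough check on whether a heterozygote deficit could be explained by null alleles, one widely used quantity is Brookfield's (1996) first estimator of null-allele frequency, $r = (H_e - H_o)/(1 + H_e)$. The sketch below is illustrative only: it assumes observed (Ho) and expected (He) heterozygosities have already been computed for a locus, the function name is hypothetical, and it is not necessarily among the tests the authors ran.

```python
def brookfield_null_frequency(h_obs: float, h_exp: float) -> float:
    """Estimate null-allele frequency with Brookfield's (1996) estimator 1.

    r = (He - Ho) / (1 + He). Under Hardy-Weinberg equilibrium with no null
    alleles, Ho is close to He and r is close to 0. A heterozygote excess
    (Ho > He) would give a negative value, which this sketch clips to 0.
    """
    if not (0.0 <= h_obs <= 1.0 and 0.0 <= h_exp <= 1.0):
        raise ValueError("heterozygosities must lie in [0, 1]")
    r = (h_exp - h_obs) / (1.0 + h_exp)
    return max(r, 0.0)


# Hypothetical locus with a strong heterozygote deficit.
print(round(brookfield_null_frequency(h_obs=0.40, h_exp=0.80), 3))  # prints 0.222
```

Clipping negative estimates to zero is a design choice of this sketch; some analyses report the raw value instead.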
Affiliation(s)
- Ciro Rico
- School of Marine Studies, Molecular Analytics Laboratory (MOANA), Faculty of Science, Technology and Environment, The University of the South Pacific, Suva, Fiji; Estación Biológica de Doñana (EBD, CSIC), Sevilla, Spain
- Jose Antonio Cuesta
- Instituto de Ciencias Marinas de Andalucía (ICMAN, CSIC), Puerto Real (Cádiz), Spain
- Pilar Drake
- Instituto de Ciencias Marinas de Andalucía (ICMAN, CSIC), Puerto Real (Cádiz), Spain
- Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Pavillon Charles-Eugène-Marchand, Laval University, Quebec, Canada
- Amandine D Marie
- School of Marine Studies, Molecular Analytics Laboratory (MOANA), Faculty of Science, Technology and Environment, The University of the South Pacific, Suva, Fiji
52
Database of Periodic DNA Regions in Major Genomes. BioMed Research International 2017; 2017:7949287. [PMID: 28182099 PMCID: PMC5274682 DOI: 10.1155/2017/7949287] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/09/2016] [Revised: 12/07/2016] [Accepted: 12/21/2016] [Indexed: 12/11/2022]
Abstract
Summary. We analyzed several prokaryotic and eukaryotic genomes for periodic sequences using a new mathematical method based on random position weight matrices and dynamic programming. Insertions and deletions were allowed within periodicities, a novel feature of our results. The periodicity length, one of the key features of a periodicity, varied from 2 to 50 nt. In total, over 60,000 periodic sequences were found in 15 genomes, including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes.
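The notion of a periodicity and its length can be illustrated with a much simpler score than the method summarized above: for each candidate period p, count how often a nucleotide equals the one p positions downstream. This sketch swaps in that plain self-match score, so unlike the position-weight-matrix and dynamic-programming approach it tolerates substitutions but not insertions or deletions. The function names are hypothetical, and the default 2 to 50 nt range is simply taken from the abstract.

```python
from typing import Dict


def period_scores(seq: str, min_p: int = 2, max_p: int = 50) -> Dict[int, float]:
    """For each period p, the fraction of positions i with seq[i] == seq[i + p].

    Substitutions only lower the score; an insertion or deletion shifts the
    phase and breaks it, which is the limitation indel-aware methods address.
    """
    seq = seq.upper()
    scores: Dict[int, float] = {}
    for p in range(min_p, min(max_p, len(seq) - 1) + 1):
        n_pairs = len(seq) - p
        matches = sum(1 for i in range(n_pairs) if seq[i] == seq[i + p])
        scores[p] = matches / n_pairs
    return scores


def best_period(seq: str, min_p: int = 2, max_p: int = 50) -> int:
    """Highest-scoring period; ties go to the shortest period, since a
    sequence with period 3 also matches perfectly at 6, 9, and so on."""
    scores = period_scores(seq, min_p, max_p)
    if not scores:
        raise ValueError("sequence too short for the requested periods")
    return min(scores, key=lambda p: (-scores[p], p))


print(best_period("ACG" * 20))  # prints 3
```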
53
Shimizu T, Kitajima A, Nonaka K, Yoshioka T, Ohta S, Goto S, Toyoda A, Fujiyama A, Mochizuki T, Nagasaki H, Kaminuma E, Nakamura Y. Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes. PLoS One 2016; 11:e0166969. [PMID: 27902727 PMCID: PMC5130255 DOI: 10.1371/journal.pone.0166969] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 08/13/2016] [Accepted: 11/07/2016] [Indexed: 01/07/2023]
Abstract
Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers confirm the transferability of these markers among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana), for which we suggest different origins. Structure analysis with DNA markers that are in Hardy-Weinberg equilibrium identifies three basic taxa, consistent with the current understanding of citrus ancestors. Genotyping of 101 indigenous citrus varieties with 123 selected DNA markers, combined with allele-sharing and parentage tests, infers the parentages of 22 indigenous citrus varieties, including Satsuma, Temple, and iyo, and a single parent of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon. Genotyping of the chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces which parent served as the seed parent and which as the pollen parent. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 varieties (Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo) that have played pivotal roles in the origin of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, consistent with recent studies.
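The allele-sharing logic behind such parentage tests can be sketched for codominant markers like SSRs: a true parent shares at least one allele with its offspring at every locus, and for a parent pair the offspring's two alleles must be split one from each parent. The sketch below assumes diploid genotypes recorded as pairs of allele sizes and ignores mutation, null alleles and genotyping error, which a likelihood-based analysis is designed to tolerate. The function names and genotypes are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]      # two allele sizes (bp) at one codominant locus
Profile = Dict[str, Genotype]   # locus name -> genotype


def single_parent_mismatches(child: Profile, parent: Profile) -> int:
    """Number of loci at which the candidate parent shares no allele with the child."""
    loci = set(child) & set(parent)
    return sum(1 for loc in loci if not set(child[loc]) & set(parent[loc]))


def trio_mismatches(child: Profile, p1: Profile, p2: Profile) -> int:
    """Number of loci at which the child's alleles cannot be split one per parent."""
    loci = set(child) & set(p1) & set(p2)
    bad = 0
    for loc in loci:
        a, b = child[loc]
        ok = (a in p1[loc] and b in p2[loc]) or (b in p1[loc] and a in p2[loc])
        if not ok:
            bad += 1
    return bad


# Hypothetical two-locus profiles.
child = {"SSR1": (120, 124), "SSR2": (88, 90)}
parent_a = {"SSR1": (120, 130), "SSR2": (88, 88)}
parent_b = {"SSR1": (124, 126), "SSR2": (90, 92)}
stranger = {"SSR1": (130, 132), "SSR2": (92, 94)}
print(trio_mismatches(child, parent_a, parent_b))   # prints 0
print(single_parent_mismatches(child, stranger))    # prints 2
```

A single-parent check like the first function is all that is possible when only one parent is available, which mirrors the distinction between full parentages and single parents drawn above.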
Affiliation(s)
- Tokurou Shimizu
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, NARO, Shimizu, Shizuoka, Japan
- Akira Kitajima
- Experimental Farm, Graduate School of Agriculture, Kyoto University, Kizugawa, Kyoto, Japan
- Keisuke Nonaka
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, NARO, Shimizu, Shizuoka, Japan
- Terutaka Yoshioka
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, NARO, Shimizu, Shizuoka, Japan
- Satoshi Ohta
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, NARO, Shimizu, Shizuoka, Japan
- Shingo Goto
- Division of Citrus Research, Institute of Fruit Tree and Tea Science, NARO, Shimizu, Shizuoka, Japan
- Atsushi Toyoda
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan
- Asao Fujiyama
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan
- Takako Mochizuki
- Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan
- Hideki Nagasaki
- Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan
- Eli Kaminuma
- Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan
- Yasukazu Nakamura
- Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan
54
Olsson S, Seoane-Zonjic P, Bautista R, Claros MG, González-Martínez SC, Scotti I, Scotti-Saintagne C, Hardy OJ, Heuertz M. Development of genomic tools in a widespread tropical tree, Symphonia globulifera L.f.: a new low-coverage draft genome, SNP and SSR markers. Mol Ecol Resour 2016; 17:614-630. [PMID: 27718316 DOI: 10.1111/1755-0998.12605] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/19/2016] [Revised: 09/30/2016] [Accepted: 10/04/2016] [Indexed: 01/08/2023]
Abstract
Population genetic studies in tropical plants are often challenging because of limited information on taxonomy, phylogenetic relationships and distribution ranges, scarce genomic information, and logistic challenges in sampling. We describe a strategy to develop robust and widely applicable genetic markers based on a modest development of genomic resources in the ancient tropical tree species Symphonia globulifera L.f. (Clusiaceae), a keystone species in African and Neotropical rainforests. We provide the first low-coverage (11X) fragmented draft genome, sequenced from an individual from Cameroon, covering 1.027 Gbp or 67.5% of the estimated genome size. Annotation of 565 scaffolds (7.57 Mbp) resulted in the prediction of 1046 putative genes (231 of them containing a complete open reading frame) and 1523 exact simple sequence repeats (SSRs, microsatellites). Aligning a published transcriptome of a French Guiana population against this draft genome produced 923 high-quality single nucleotide polymorphisms. We also preselected genic SSRs in silico that were conserved and polymorphic across a wide geographical range, thus reducing marker development tests on rare DNA samples. Of 23 SSRs tested, 19 amplified and 18 were successfully genotyped in four S. globulifera populations from South America (Brazil and French Guiana) and Africa (Cameroon and São Tomé island, FST = 0.34). Most loci showed only population-specific deviations from Hardy-Weinberg proportions, pointing to local population effects (e.g. null alleles). The described genomic resources are valuable for evolutionary studies in Symphonia and for comparative studies in plants. The methods are especially interesting for widespread tropical or endangered taxa with limited DNA availability.
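To make a differentiation figure such as FST = 0.34 concrete, here is one standard single-locus measure, Nei's $G_{ST} = (H_T - H_S)/H_T$, where $H_S$ is the mean expected heterozygosity within populations and $H_T$ the expected heterozygosity of the pooled allele frequencies. The abstract does not say which estimator produced its value (sample-size-corrected estimators such as Weir and Cockerham's are common), so this uncorrected sketch with equal population weights and hypothetical frequencies is only an illustration.

```python
from typing import List, Sequence


def expected_het(freqs: Sequence[float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: Sequence[Sequence[float]]) -> float:
    """Nei's G_ST at one locus, weighting all populations equally.

    pop_freqs[k][a] is the frequency of allele a in population k; every
    population must list the same alleles in the same order.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    h_s = sum(expected_het(f) for f in pop_freqs) / n_pops
    pooled: List[float] = [sum(f[a] for f in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_het(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Two hypothetical populations with strongly contrasting allele frequencies.
print(round(nei_gst([[0.9, 0.1], [0.1, 0.9]]), 2))  # prints 0.64
```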
Affiliation(s)
- Sanna Olsson
- Department of Forest Ecology and Genetics, INIA Forest Research Centre (INIA-CIFOR), Carretera de A Coruña km 7.5, E-28040, Madrid, Spain
- Pedro Seoane-Zonjic
- Departamento de Biología Molecular y Bioquímica, and Plataforma Andaluza de Bioinformática, Universidad de Málaga, calle Severo Ochoa 34, E-29590, Campanillas, Málaga, Spain
- Rocío Bautista
- Departamento de Biología Molecular y Bioquímica, and Plataforma Andaluza de Bioinformática, Universidad de Málaga, calle Severo Ochoa 34, E-29590, Campanillas, Málaga, Spain
- M Gonzalo Claros
- Departamento de Biología Molecular y Bioquímica, and Plataforma Andaluza de Bioinformática, Universidad de Málaga, calle Severo Ochoa 34, E-29590, Campanillas, Málaga, Spain
- Santiago C González-Martínez
- Department of Forest Ecology and Genetics, INIA Forest Research Centre (INIA-CIFOR), Carretera de A Coruña km 7.5, E-28040, Madrid, Spain; UMR1202 BioGeCo, INRA, Univ. Bordeaux, 69 route d'Arcachon, F-33610, Cestas, France
- Ivan Scotti
- INRA, UR629 URFM, Ecologie des Forêts Méditerranéennes, Site Agroparc, Domaine Saint Paul, F-84914, Avignon Cedex 9, France
- Caroline Scotti-Saintagne
- INRA, UR629 URFM, Ecologie des Forêts Méditerranéennes, Site Agroparc, Domaine Saint Paul, F-84914, Avignon Cedex 9, France
- Olivier J Hardy
- Faculté des Sciences, Evolutionary Biology and Ecology, Université Libre de Bruxelles, Av. F.D. Roosevelt 50, CP 160/12, B-1050, Brussels, Belgium
- Myriam Heuertz
- Department of Forest Ecology and Genetics, INIA Forest Research Centre (INIA-CIFOR), Carretera de A Coruña km 7.5, E-28040, Madrid, Spain; UMR1202 BioGeCo, INRA, Univ. Bordeaux, 69 route d'Arcachon, F-33610, Cestas, France; Faculté des Sciences, Evolutionary Biology and Ecology, Université Libre de Bruxelles, Av. F.D. Roosevelt 50, CP 160/12, B-1050, Brussels, Belgium
55
Osbak KK, Houston S, Lithgow KV, Meehan CJ, Strouhal M, Šmajs D, Cameron CE, Van Ostade X, Kenyon CR, Van Raemdonck GA. Characterizing the Syphilis-Causing Treponema pallidum ssp. pallidum Proteome Using Complementary Mass Spectrometry. PLoS Negl Trop Dis 2016; 10:e0004988. [PMID: 27606673 PMCID: PMC5015957 DOI: 10.1371/journal.pntd.0004988] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/27/2016] [Accepted: 08/19/2016] [Indexed: 12/28/2022]
Abstract
BACKGROUND The spirochete bacterium Treponema pallidum ssp. pallidum is the etiological agent of syphilis, a chronic multistage disease. Little is known about the global T. pallidum proteome; therefore, mass spectrometry studies are needed to provide insights into pathogenicity and protein expression profiles during infection. METHODOLOGY/PRINCIPAL FINDINGS To better understand the T. pallidum proteome profile during infection, we studied T. pallidum ssp. pallidum DAL-1 strain bacteria isolated from rabbits using complementary mass spectrometry techniques, including multidimensional peptide separation and protein identification via matrix-assisted laser desorption ionization-time of flight (MALDI-TOF/TOF) and electrospray ionization (ESI-LTQ-Orbitrap) tandem mass spectrometry. A total of 6033 peptides were detected, corresponding to 557 unique T. pallidum proteins identified at a high level of confidence and representing 54% of the predicted proteome. A previous gel-based T. pallidum MS proteome study detected 58 of these proteins. One hundred fourteen of the detected proteins were previously annotated as hypothetical or uncharacterized proteins; this is the first account at the protein level of 106 of them. Detected proteins were characterized according to their predicted biological function and localization; half were allocated to a wide range of functional categories. Proteins annotated as potential membrane proteins and proteins with unclear functional annotations were subjected to an additional bioinformatics pipeline analysis to facilitate further characterization. A total of 116 potential membrane proteins were identified, of which 16 have evidence supporting outer membrane localization. We found 8 of the 12 proteins of the paralogous tpr gene family: TprB, TprC/D, TprE, TprG, TprH, TprI and TprJ. Protein abundance was semi-quantified using label-free spectral counting methods. A low correlation (r = 0.26) was found between previous microarray signal data and protein abundance. CONCLUSIONS This is the most comprehensive description of the global T. pallidum proteome to date. These data provide valuable insights into in vivo T. pallidum protein expression, paving the way for an improved understanding of the pathogenicity of this enigmatic organism.
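Label-free spectral counts are usually normalized for protein length, because longer proteins yield more peptides. One common choice, assumed here since the abstract does not name the metric used, is the normalized spectral abundance factor, NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j), where SpC is the spectral count and L the protein length. The sketch also computes a Pearson coefficient, one plausible reading of the r reported above; all inputs are hypothetical.

```python
import math
from typing import List, Sequence


def nsaf(spectral_counts: Sequence[int], lengths: Sequence[int]) -> List[float]:
    """Normalized spectral abundance factor for each protein (values sum to 1)."""
    if len(spectral_counts) != len(lengths):
        raise ValueError("one length per protein is required")
    saf = [count / length for count, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    return [value / total for value in saf]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical spectral counts and protein lengths (amino acids).
abundance = nsaf([10, 20, 5], [100, 400, 50])
print([round(v, 2) for v in abundance])  # prints [0.4, 0.2, 0.4]
```

Note how the protein with the most spectra (20) ends up with the lowest abundance once its length (400 aa) is taken into account.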
Affiliation(s)
- Kara K Osbak
- HIV/STI Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Simon Houston
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Karen V Lithgow
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Conor J Meehan
- Unit of Mycobacteriology, Institute of Tropical Medicine, Antwerp, Belgium
- Michal Strouhal
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- David Šmajs
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Caroline E Cameron
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Xaveer Van Ostade
- Laboratory for Protein Science, Proteomics and Epigenetic Signaling (PPES) and Centre for Proteomics (CFP), University of Antwerp, Wilrijk, Belgium
- Chris R Kenyon
- HIV/STI Unit, Institute of Tropical Medicine, Antwerp, Belgium; Division of Infectious Diseases and HIV Medicine, University of Cape Town, Cape Town, South Africa
- Geert A Van Raemdonck
- HIV/STI Unit, Institute of Tropical Medicine, Antwerp, Belgium; Laboratory for Protein Science, Proteomics and Epigenetic Signaling (PPES) and Centre for Proteomics (CFP), University of Antwerp, Wilrijk, Belgium
56
Next-generation sequencing of FLT3 internal tandem duplications for minimal residual disease monitoring in acute myeloid leukemia. Oncotarget 2016; 6:22812-21. [PMID: 26078355 PMCID: PMC4673201 DOI: 10.18632/oncotarget.4333] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 11/08/2014] [Accepted: 05/25/2015] [Indexed: 11/25/2022]
Abstract
Minimal residual disease (MRD) detection can be used for early intervention in relapse, risk stratification, and treatment guidance. FLT3 ITD is the most common mutation found in AML patients with a normal karyotype. We evaluated the feasibility of NGS with high coverage (up to 2.4 × 10^6 paired-end [PE] fragments) for MRD monitoring of FLT3 ITD. We sequenced samples from 37 adult patients at diagnosis and at various time points during their disease (64 samples) and compared the results with FLT3 ITD ratios measured by fragment analysis. We found that NGS could detect variable insertion sites and lengths in a single test for several patients. We also showed mutational shifts between diagnosis and relapse, with the outgrowth at relapse of a clone different from the one dominant at diagnosis. Because NGS is scalable, we were able to increase sensitivity by obtaining more reads for follow-up samples than for diagnosis samples. This technique could be applied to detect biological relapse before its clinical consequences and to better tailor treatments through the use of FLT3 inhibitors. Larger cohorts should be assessed in order to validate this approach.
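The link between read count and MRD sensitivity can be made explicit with a simple sampling model. Assuming mutant reads are drawn binomially with probability equal to the variant allele fraction (VAF), and that a sample is called positive once a chosen minimum number of mutant reads is seen, the chance of detection grows with sequencing depth. This sketch ignores PCR bias, sequencing error and the alignment difficulties specific to ITDs; the threshold and VAF values are hypothetical, not the study's settings.

```python
import math


def detection_probability(depth: int, vaf: float, min_reads: int) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, vaf), computed in log space."""
    if min_reads <= 0:
        return 1.0
    if vaf <= 0.0:
        return 0.0
    if vaf >= 1.0:
        return 1.0 if depth >= min_reads else 0.0
    log_p, log_q = math.log(vaf), math.log1p(-vaf)
    log_n_fact = math.lgamma(depth + 1)
    below = 0.0  # accumulates P(X < min_reads)
    for i in range(min(min_reads, depth + 1)):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(depth - i + 1)
                   + i * log_p + (depth - i) * log_q)
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)


# Hypothetical MRD level of 1 mutant read in 10,000, requiring 5 supporting reads.
for depth in (10_000, 100_000, 1_000_000):
    print(depth, round(detection_probability(depth, 1e-4, 5), 3))
```

At a fixed VAF, the expected number of mutant reads is depth × VAF, so raising the read count for follow-up samples is what moves a low-level clone from likely missed to almost surely detected.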
57
Feng YL, Wicke S, Li JW, Han Y, Lin CS, Li DZ, Zhou TT, Huang WC, Huang LQ, Jin XH. Lineage-Specific Reductions of Plastid Genomes in an Orchid Tribe with Partially and Fully Mycoheterotrophic Species. Genome Biol Evol 2016; 8:2164-75. [PMID: 27412609 PMCID: PMC4987110 DOI: 10.1093/gbe/evw144] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Accepted: 06/08/2016] [Indexed: 11/13/2022]
Abstract
The plastid genome (plastome) of heterotrophic plants such as mycoheterotrophs and parasites shows massive gene losses as a consequence of the relaxation of functional constraints on photosynthesis. To understand the patterns of this convergent plastome reduction syndrome in heterotrophic plants, we studied 12 closely related orchids of three different life forms from the tribe Neottieae (Orchidaceae). We employ a comparative genomics approach to examine structural and selectional changes in plastomes within Neottieae. Both leafy and leafless heterotrophic species have functionally reduced plastid genomes. Our analyses show that genes for the NAD(P)H dehydrogenase complex, the photosystems, and the RNA polymerase have been functionally lost multiple times independently. The physical reduction proceeds in a highly lineage-specific manner, accompanied by structural reconfigurations such as inversions or modifications of the large inverted repeats. Despite significant but minor selectional changes, all retained genes continue to evolve under purifying selection. All leafless Neottia species, including both visibly green and nongreen members, are fully mycoheterotrophic and likely evolved from leafy, partially mycoheterotrophic species. The plastomes of Neottieae span many stages of plastome degradation, including the longest plastome of a mycoheterotroph, and provide invaluable insights into the mechanisms of plastome evolution along the transition from autotrophy to full mycoheterotrophy.
Affiliation(s)
- Yan-Lei Feng
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Susann Wicke
- Institute for Evolution and Biodiversity, University of Muenster, Germany
- Jian-Wu Li
- Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun Township, Mengla County, Yunnan, China
- Yu Han
- Nanchang University, Jiangxi, China
- Choun-Sea Lin
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- De-Zhu Li
- Key Laboratory of Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Ting-Ting Zhou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wei-Chang Huang
- Chenshan Shanghai Botanical Garden, Shanghai, Songjiang, China
- Lu-Qi Huang
- National Resource Centre for Chinese Materia Medica, China Academy of Chinese Medical Science, Beijing, China
- Xiao-Hua Jin
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
58
Touyar N, Schbath S, Cellier D, Dauchel H. Poisson Approximation for the Number of Repeats in a Stationary Markov Chain. J Appl Probab 2016. [DOI: 10.1239/jap/1214950359] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Detection of repeated sequences within complete genomes is a powerful tool for understanding genome dynamics and species evolutionary history. To distinguish significant repeats from those that can arise just by chance, statistical methods have to be developed. In this paper we show that the distribution of the number of long repeats in long sequences generated by stationary Markov chains can be approximated by a Poisson distribution with an explicit parameter. Using the Chen-Stein method, we provide a bound for the approximation error; this bound converges to 0 as the length $n$ of the sequence tends to $\infty$, provided the length $t$ of the repeats satisfies $n^2\rho^t = O(1)$ for some $0 < \rho < 1$. Using this Poisson approximation, p-values can then be easily calculated to determine whether a given genome is significantly enriched in repeats of length $t$.
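Once a Poisson parameter $\lambda$ is available, the p-value for observing at least $k$ repeats is the Poisson upper tail $P(N \ge k)$. The sketch below computes that tail and, purely for illustration, replaces the paper's explicit Markov-chain parameter with a crude i.i.d. approximation: letters are independent, $q = \sum_a p_a^2$ is the probability that two positions carry the same letter, and each repeat is counted once by requiring a mismatch just before it, giving $\lambda \approx \binom{n-t+1}{2}(1-q)q^t$. Under this simplification $\lambda$ scales like $n^2 q^t$, echoing the $n^2\rho^t = O(1)$ regime above. The function names are hypothetical.

```python
import math
from typing import Dict


def naive_lambda_iid(n: int, t: int, freqs: Dict[str, float]) -> float:
    """Crude expected number of repeats of length >= t in an i.i.d. sequence.

    Simplifying assumption (not the Markov-chain parameter of the paper):
    q = sum of squared letter frequencies is the chance that two positions
    match, and a repeat is counted once by requiring a mismatch before it.
    """
    q = sum(p * p for p in freqs.values())
    n_pairs = (n - t + 1) * (n - t) / 2  # pairs of t-word start positions
    return n_pairs * (1.0 - q) * q ** t


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(N >= k) for N ~ Poisson(lam), with terms evaluated in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0

    def pmf(i: int) -> float:
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k > lam:
        # Beyond the mode the terms shrink geometrically: sum the tail directly
        # so that tiny p-values are not lost to 1 - (something close to 1).
        total, i = 0.0, k
        while True:
            term = pmf(i)
            total += term
            if term == 0.0 or term < 1e-17 * total:
                return total
            i += 1
    return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
lam = naive_lambda_iid(n=1_000_000, t=25, freqs=uniform)
print(lam, poisson_upper_tail(3, lam))
```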
59
FullSSR: Microsatellite Finder and Primer Designer. Adv Bioinformatics 2016; 2016:6040124. [PMID: 27366148 PMCID: PMC4913048 DOI: 10.1155/2016/6040124] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/25/2015] [Revised: 05/02/2016] [Accepted: 05/16/2016] [Indexed: 11/17/2022]
Abstract
Microsatellites are genomic sequences composed of tandem repeats of short nucleotide motifs and are widely used as molecular markers in population genetics. FullSSR is a new bioinformatic tool for microsatellite (SSR) locus detection and primer design using genomic data from NGS assays. The software was tested with 2000 sequences from the Oryza sativa shotgun sequencing project in the National Center for Biotechnology Information Trace Archive and with partial genome sequences obtained with Roche 454® from Caiman latirostris, Salvator merianae, Aegla platensis, and Zilchiopsis collastinensis. FullSSR performance was compared against other similar SSR search programs. The results of this kind of approach depend on the parameters set by the user. In addition, results can be affected by the analyzed sequences because of differences among genomes. FullSSR simplifies SSR detection and primer design on large data sets. The command-line interface of FullSSR was intended for use as part of a genomic analysis pipeline; however, it can also be used as a stand-alone program because its results are easily interpreted by non-expert users.
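Detection of perfect SSRs, the first half of what a tool like FullSSR automates, can be sketched with a regular expression that captures a motif and requires it to repeat. The minimum repeat counts per motif length below are hypothetical defaults (the abstract itself notes that results depend on such user-set parameters), primer design is not attempted, and this is not FullSSR's own code.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class SSR(NamedTuple):
    start: int    # 0-based start of the repeat tract
    end: int      # exclusive end
    motif: str
    repeats: int


# Hypothetical minimum number of motif copies for motif lengths 1 to 6 nt.
DEFAULT_MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str, min_repeats: Optional[Dict[int, int]] = None) -> List[SSR]:
    """Report perfect microsatellites as (start, end, motif, repeats)."""
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits: List[SSR] = []
    for k, min_copies in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures a motif; \1{m,} demands m further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append(SSR(m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


print(find_ssrs("GATATATATATATG"))  # prints [SSR(start=1, end=13, motif='AT', repeats=6)]
```

Filtering non-primitive motifs keeps a long AT run from also being reported as an "ATAT" or "ATATAT" repeat.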
60
Strategies for complete mitochondrial genome sequencing on Ion Torrent PGM™ platform in forensic sciences. Forensic Sci Int Genet 2016; 22:11-21. [DOI: 10.1016/j.fsigen.2016.01.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 04/21/2015] [Revised: 12/30/2015] [Accepted: 01/08/2016] [Indexed: 01/08/2023]
61
Shang J, Peng J, Han J. MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance. Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining 2016; 2016:558-566. [PMID: 28174677 PMCID: PMC5292242 DOI: 10.1137/1.9781611974348.63] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
Abstract
Consecutive pattern mining, which aims to find frequent substring patterns, is a special case of frequent pattern mining and has played a crucial role in many real-world applications, especially biological sequence analysis, time series analysis, and network log mining. Approximations between strings, including insertions, deletions, and substitutions, are widely used in biological sequence comparisons. However, most existing string pattern mining methods only consider Hamming distance, without insertions/deletions (indels). Little attention has been paid to the general problem of approximate consecutive frequent pattern mining under edit distance, potentially due to its high computational complexity, particularly on DNA sequences with billions of base pairs. In this paper, we introduce an efficient solution to this problem. We first formulate the Maximal Approximate Consecutive Frequent Pattern Mining (MACFP) problem, which identifies substring patterns under edit distance in a long query sequence. Then, we propose a novel algorithm with linear time complexity to check whether the support of a substring pattern in the query sequence is above a predefined threshold, thus greatly reducing the computational complexity of MACFP. With this fast decision algorithm, we can efficiently solve the original pattern discovery problem using several indexing and searching techniques. Comprehensive experiments on sequence pattern analysis and a cancer genomics case study demonstrate the effectiveness and efficiency of our algorithm compared with several existing methods.
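The basic operation behind approximate occurrences under edit distance is locating every place where a pattern matches a text with at most k insertions, deletions or substitutions. A classic way to do this is Sellers' dynamic programming, which runs in O(mn) time for a pattern of length m and a text of length n; it is shown here only as a baseline, not as the linear-time decision algorithm the paper proposes. Turning end positions into a support count also requires a definition of distinct occurrences, which this sketch leaves open; the function name is hypothetical.

```python
from typing import List


def approx_end_positions(pattern: str, text: str, k: int) -> List[int]:
    """End positions j such that some substring text[i:j] is within edit
    distance k of pattern (Sellers' algorithm, one DP column at a time)."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any suffix of
    # the text read so far; starts as the column for an empty text prefix.
    col = list(range(m + 1))
    ends: List[int] = [0] if col[m] <= k else []
    for j, ch in enumerate(text, start=1):
        diag = col[0]   # D[i-1][j-1] for i = 1
        col[0] = 0      # an occurrence may start anywhere in the text for free
        for i in range(1, m + 1):
            best = min(col[i] + 1,                       # extra text character
                       col[i - 1] + 1,                   # pattern character skipped
                       diag + (pattern[i - 1] != ch))    # match or substitution
            diag, col[i] = col[i], best
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_end_positions("ACGT", "TTACGTTT", 1))  # prints [5, 6, 7]
```

Neighbouring end positions (5, 6, 7 here) typically describe one occurrence seen with slightly different boundaries, which is why a support definition matters.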
Affiliation(s)
- Jingbo Shang
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jiawei Han
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
62
Diaz-Lara A, Gent DH, Martin RR. Identification of Extrachromosomal Circular DNA in Hop via Rolling Circle Amplification. Cytogenet Genome Res 2016; 148:237-40. [PMID: 27160259 DOI: 10.1159/000445849] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Accepted: 02/22/2016] [Indexed: 11/19/2022]
Abstract
During a survey for new viruses affecting hop plants, a circular DNA molecule was identified via rolling circle amplification (RCA) and later characterized. A small region of the 5.7-kb long molecule aligned with a microsatellite region in the Humulus lupulus genome, and no coding sequence was identified. Sequence analysis and literature review suggest that the small DNA molecule is an extranuclear DNA element, specifically, an extrachromosomal circular DNA (eccDNA), and its presence was confirmed by electron microscopy. This work is the first report of eccDNAs in the family Cannabaceae. Additionally, this work highlights the advantages of using RCA to study extrachromosomal DNA in higher plants.
Affiliation(s)
- Alfredo Diaz-Lara
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oreg., USA
63
Ritchie H, Jamieson AJ, Piertney SB. Isolation and Characterization of Microsatellite DNA Markers in the Deep-Sea Amphipod Paralicella tenuipes by Illumina MiSeq Sequencing. J Hered 2016; 107:367-71. [PMID: 27012615 DOI: 10.1093/jhered/esw019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/16/2015] [Accepted: 03/21/2016] [Indexed: 11/14/2022]
Abstract
Here, we describe the development of 16 polymorphic microsatellite markers using an Illumina MiSeq sequencing approach in the deep-sea amphipod Paralicella tenuipes. A total of 25 577 844 DNA sequences were screened for microsatellite motifs, and 197 873 sequences containing such motifs were identified. Of these sequences, 64 had sufficient flanking regions for primer design, and 16 of these loci were polymorphic. Between 5 and 30 alleles were detected per locus, with an average of 13.63 alleles per locus, across a total of 120 individuals from 5 separate deep-sea trenches in the Pacific Ocean. For the 16 loci, observed and expected heterozygosity values ranged from 0.116 to 0.414 and from 0.422 to 0.820, respectively, with one locus displaying significant deviation from Hardy-Weinberg equilibrium. The microsatellite loci isolated and described here are the first molecular markers developed for deep-sea amphipods and will be invaluable for elucidating the genetic population structure and the extent of connectivity between deep ocean trenches.
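Observed and expected heterozygosities like those reported above are simple functions of the genotypes at a locus. This sketch computes Ho as the fraction of heterozygous individuals and He with Nei's (1978) unbiased formula, He = 2n/(2n − 1) · (1 − Σ p_i²) for n diploid individuals; whether the study used this exact estimator is an assumption, and the genotypes are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def heterozygosities(genotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Observed (Ho) and unbiased expected (He) heterozygosity at one locus.

    genotypes holds one (allele, allele) pair of fragment sizes per diploid
    individual; missing genotypes should be removed beforehand.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    two_n = 2 * n
    sum_sq = sum((c / two_n) ** 2 for c in counts.values())
    h_exp = two_n / (two_n - 1) * (1.0 - sum_sq)
    return h_obs, h_exp


# Four hypothetical individuals at one locus (allele sizes in bp).
ho, he = heterozygosities([(150, 150), (150, 154), (154, 158), (158, 158)])
print(round(ho, 3), round(he, 3))  # prints 0.5 0.75
```

A locus where Ho falls well below He, as in every range quoted above, is the usual first hint of a heterozygote deficit.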
Affiliation(s)
- Heather Ritchie
- From the Institute of Biological and Environmental Sciences, University of Aberdeen, Zoology Building, Aberdeen AB24 2TZ, UK (Ritchie and Piertney); and Oceanlab, University of Aberdeen, Newburgh, Aberdeenshire AB41 6AA, UK (Jamieson)
- Alan J Jamieson
- From the Institute of Biological and Environmental Sciences, University of Aberdeen, Zoology Building, Aberdeen AB24 2TZ, UK (Ritchie and Piertney); and Oceanlab, University of Aberdeen, Newburgh, Aberdeenshire AB41 6AA, UK (Jamieson)
- Stuart B Piertney
- From the Institute of Biological and Environmental Sciences, University of Aberdeen, Zoology Building, Aberdeen AB24 2TZ, UK (Ritchie and Piertney); and Oceanlab, University of Aberdeen, Newburgh, Aberdeenshire AB41 6AA, UK (Jamieson)
64
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of available software that can help biologists look for these repeats and test hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. These methods are representative of the range of tools available for other classes of repeats, and we provide a whole section on this topic together with a selection of the main existing software. To better understand how these tools work and how repeats may be found efficiently in genomes, it is necessary to look at the technical issues involved in the large-scale search for these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field, so the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts needed to understand the current state of the art in "playing with words" applied to genomic sequences. This can be seen as the first stage of a very general approach, called linguistic analysis, that is concerned with the analysis of natural or artificial texts. Words, at the lexical level, correspond to simple repeated entities in texts or strings. Biologists, however, need to represent more complex entities in which a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, and composition constraints, as well as ordering and distance constraints between these elementary blocks. In linguistic terms, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
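At the lexical level described above, the simplest repeat search treats the genome as text and indexes every word of length k. The sketch below counts direct repeats as words seen at least twice and reports seeds for small inverted repeats as pairs of positions whose words are reverse complements of each other. It uses a plain hash table, whereas large-scale tools typically rely on more specialized indexes such as suffix arrays; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    return word.translate(_COMPLEMENT)[::-1]


def repeated_words(seq: str, k: int, min_count: int = 2) -> Dict[str, int]:
    """Words of length k occurring at least min_count times (direct repeats)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {word: c for word, c in counts.items() if c >= min_count}


def inverted_word_pairs(seq: str, k: int) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, where the word at j is the reverse complement
    of the word at i: seeds for small inverted repeats."""
    seq = seq.upper()
    positions: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    pairs = []
    for word, starts in positions.items():
        for j in positions.get(reverse_complement(word), []):
            pairs.extend((i, j) for i in starts if i < j)
    return sorted(pairs)


print(repeated_words("ACGACGT", 3))      # prints {'ACG': 2}
print(inverted_word_pairs("GAATTC", 3))  # prints [(0, 3), (1, 2)]
```

The syntactic level discussed in the chapter would then combine such word-level hits under ordering and distance constraints, which this sketch does not attempt.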
Affiliation(s)
- Jacques Nicolas
- Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France.
- Pierre Peterlongo
- Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
- Sébastien Tempel
- LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France
65
Abstract
BACKGROUND With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few tools for genotyping microsatellite repeats that can manage huge amounts of sequence data and handle the SAM/BAM file format have been implemented so far, and most of them were developed for and tested on humans or model organisms with high-quality reference genomes. RESULTS In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites by exploiting next-generation sequencing (NGS) data. PSR was developed with non-model plant species in mind, for which a de novo transcriptome assembly is generally the first sequence resource available for SSR mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. The user can set two thresholds for calling a length polymorphism and reducing the number of false positives: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines whether the microsatellite-containing sequence is processed at all, while the second is decisive for the identification of minor alleles. PSR was tested on two case studies. The first aimed to identify polymorphic SSRs in a set of de novo assembled transcripts obtained by RNA sequencing of two plant genotypes. The second investigated sequence variation within a collection of newly sequenced chloroplast genomes. In both cases, PSR results agree with those obtained by capillary gel separation. CONCLUSION PSR was developed specifically to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limitations of existing, time-consuming approaches based on tools developed in the pre-NGS era.
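The two user-defined thresholds described above can be illustrated with a short sketch. It is not PSR's Perl code: the helper `call_ssr_alleles`, its flank-matching rule, and the defaults `min_reads=5` and `min_depth=2` are hypothetical. A read counts only if it spans the whole perfect repeat between both flanks; the locus is skipped if too few reads span it, and an allele (motif copy number) is called only if its depth reaches the second threshold.

```python
import re
from collections import Counter


def call_ssr_alleles(reads, left_flank, right_flank, motif,
                     min_reads=5, min_depth=2):
    """Hypothetical two-threshold caller for one perfect SSR locus.

    Returns None when too few reads span the locus, otherwise the sorted
    list of allele sizes (motif copy numbers) whose read depth >= min_depth.
    """
    # A read is informative only if it covers the full-length repeat:
    # left flank, an uninterrupted run of the motif, then the right flank.
    pattern = re.compile(
        re.escape(left_flank)
        + "((?:" + re.escape(motif) + ")+)"
        + re.escape(right_flank)
    )
    depth = Counter()
    for read in reads:
        match = pattern.search(read)
        if match:
            depth[len(match.group(1)) // len(motif)] += 1

    # Threshold 1: minimum number of reads overlapping the repetitive stretch.
    if sum(depth.values()) < min_reads:
        return None
    # Threshold 2: minimum read depth, which decides whether minor alleles are kept.
    return sorted(copies for copies, n in depth.items() if n >= min_depth)


reads = (["GGTCA" + "AC" * 6 + "TTGCA"] * 3
         + ["GGTCA" + "AC" * 8 + "TTGCA"] * 3
         + ["GGTCA" + "AC" * 5 + "TTGCA"])  # a single read: below min_depth
print(call_ssr_alleles(reads, "GGTCA", "TTGCA", "AC"))  # [6, 8]
```

In this toy locus, two called alleles (6 and 8 copies) would suggest a heterozygous genotype, and the one-read 5-copy allele is filtered out as a likely error; lowering `min_depth` to 1 would keep it.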
Affiliation(s)
- Concita Cantarella
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per l'orticoltura, Via Cavalleggeri 25, 84098, Pontecagnano Faiano, Italy.
- Nunzio D'Agostino
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per l'orticoltura, Via Cavalleggeri 25, 84098, Pontecagnano Faiano, Italy.
66
Cunty A, Cesbron S, Poliakoff F, Jacques MA, Manceau C. Origin of the Outbreak in France of Pseudomonas syringae pv. actinidiae Biovar 3, the Causal Agent of Bacterial Canker of Kiwifruit, Revealed by a Multilocus Variable-Number Tandem-Repeat Analysis. Appl Environ Microbiol 2015; 81:6773-89. [PMID: 26209667 PMCID: PMC4561677 DOI: 10.1128/aem.01688-15] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/20/2015] [Accepted: 07/13/2015] [Indexed: 11/20/2022]
Abstract
The first outbreaks of bacterial canker of kiwifruit caused by Pseudomonas syringae pv. actinidiae biovar 3 were detected in France in 2010. P. syringae pv. actinidiae causes leaf spots, dieback, and canker that sometimes lead to the death of the vine. P. syringae pv. actinidifoliorum, which is also pathogenic on kiwifruit, causes only leaf spots. To conduct an epidemiological study tracking the spread of the epidemics caused by these two pathogens in France, we developed a multilocus variable-number tandem-repeat (VNTR) analysis (MLVA). MLVA was conducted on 340 strains of P. syringae pv. actinidiae biovar 3 isolated in Chile, China, France, Italy, and New Zealand and on 39 strains of P. syringae pv. actinidifoliorum isolated in Australia, France, and New Zealand. Eleven polymorphic VNTR loci were identified in the genomes of P. syringae pv. actinidiae biovar 3 ICMP 18744 and of P. syringae pv. actinidifoliorum ICMP 18807. MLVA resolved the P. syringae pv. actinidiae biovar 3 and P. syringae pv. actinidifoliorum strains into 55 and 16 haplotypes, respectively. MLVA and discriminant analysis of principal components revealed that strains isolated in Chile, China, and New Zealand are genetically distinct from P. syringae pv. actinidiae strains isolated in France and Italy, which appear to be closely related at the genetic level. In contrast, no structuring was observed for P. syringae pv. actinidifoliorum. We developed an MLVA scheme to explore the diversity within P. syringae pv. actinidiae biovar 3 and to trace the dispersal routes of epidemic P. syringae pv. actinidiae biovar 3 in Europe. We suggest using this MLVA scheme to trace the dispersal routes of P. syringae pv. actinidiae at a global level.
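MLVA typing rests on converting each VNTR amplicon size into a repeat copy number and treating the vector of copy numbers across loci as a strain's haplotype. The sketch below assumes a simple size model (amplicon length = flanking length + copies × unit length); the function names and the example locus parameters are hypothetical and are not the eleven loci used in the study.

```python
from collections import defaultdict


def repeat_count(amplicon_bp, flank_bp, unit_bp):
    """Copy number under a simple size model: amplicon = flanks + copies * unit."""
    return round((amplicon_bp - flank_bp) / unit_bp)


def group_haplotypes(profiles):
    """Group strains that share the same repeat-count vector (haplotype).

    profiles: dict strain -> sequence of copy numbers, one per VNTR locus,
    always listed in the same locus order.
    """
    groups = defaultdict(list)
    for strain, profile in profiles.items():
        groups[tuple(profile)].append(strain)
    return dict(groups)


# Hypothetical locus: 100 bp of flanking sequence and a 15 bp repeat unit.
print(repeat_count(250, 100, 15))  # 10
profiles = {"s1": (10, 4, 7), "s2": (10, 4, 7), "s3": (9, 4, 7)}
print(len(group_haplotypes(profiles)))  # 2 haplotypes among 3 strains
```

Rounding absorbs small sizing errors from capillary electrophoresis; in practice each locus needs its own flank length and unit size, calibrated against a sequenced reference strain.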
Affiliation(s)
- A Cunty
- UMR1345 Institut de Recherche en Horticulture et Semences, SFR 4207 Quasav, Institut National de la Recherche Agronomique, Beaucouzé, France; Laboratoire de la Santé des Végétaux, Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du Travail, Angers, France
- S Cesbron
- UMR1345 Institut de Recherche en Horticulture et Semences, SFR 4207 Quasav, Institut National de la Recherche Agronomique, Beaucouzé, France
- F Poliakoff
- Laboratoire de la Santé des Végétaux, Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du Travail, Angers, France
- M-A Jacques
- UMR1345 Institut de Recherche en Horticulture et Semences, SFR 4207 Quasav, Institut National de la Recherche Agronomique, Beaucouzé, France
- C Manceau
- Laboratoire de la Santé des Végétaux, Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du Travail, Angers, France
67
Fertin G, Jean G, Radulescu A, Rusu I. Hybrid de novo tandem repeat detection using short and long reads. BMC Med Genomics 2015; 8 Suppl 3:S5. [PMID: 26399998 PMCID: PMC4582210 DOI: 10.1186/1755-8794-8-s3-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022]
Abstract
Background As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation has been addressed by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. A main objective of current studies on long reads is to handle error rates of up to 16%. Methods In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the length of long reads. Our hybrid algorithm uses the set of short reads to detect tandem repeat patterns based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. Results MixTaR was tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we used short and long reads with different error rates. The results were analysed in terms of the number of tandem repeats detected and the length of their patterns. Conclusions Our method shows high precision and sensitivity. With low false-positive rates even for highly erroneous reads, MixTaR detects accurate tandem repeats with pattern lengths varying over a wide range.
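The idea of finding tandem-repeat patterns in a de Bruijn graph built from short reads can be sketched as follows: when k − 1 is at least the repeat period p and the repeat contains every phase of its unit, the repeat's (k−1)-mers close into a cycle of p nodes, so short cycles spell candidate repeat units. This is a simplified illustration, not MixTaR itself, which additionally validates patterns with long reads and builds repeat sequences by local greedy assembly. The functions `de_bruijn` and `cycle_patterns` and the `max_period` bound are hypothetical, and the graph ignores k-mer multiplicities and sequencing errors.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Node = (k-1)-mer; one edge per distinct k-mer seen in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def cycle_patterns(graph, max_period):
    """Spell every simple cycle of at most max_period nodes as a repeat unit.

    Each unit is reported as its lexicographically smallest rotation, so the
    same repeat found from different starting nodes is counted once.
    """
    patterns = set()

    def walk(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Going once around the cycle appends the last base of each node.
                unit = "".join(n[-1] for n in path)
                patterns.add(min(unit[i:] + unit[:i] for i in range(len(unit))))
            elif nxt not in path and len(path) < max_period:
                walk(start, nxt, path + [nxt])

    for start in list(graph):
        walk(start, start, [start])
    return patterns


read = "TGATCT" + "CAG" * 6 + "GTTAAC"
print(cycle_patterns(de_bruijn([read], 4), max_period=6))  # {'AGC'}
```

The CAG repeat is reported as `AGC`, its smallest rotation. On real data, cycles can also arise from interspersed repeats or sequencing errors, which is why a validation step such as MixTaR's long-read check is needed.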
68
Kakumani PK, Shukla R, Todur VN, Malhotra P, Mukherjee SK, Bhatnagar RK. De novo transcriptome assembly and analysis of Sf21 cells using illumina paired end sequencing. Biol Direct 2015; 10:44. [PMID: 26290335 PMCID: PMC4545970 DOI: 10.1186/s13062-015-0072-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/15/2015] [Accepted: 08/06/2015] [Indexed: 11/10/2022]
Abstract
Spodoptera frugiperda is an important polyphagous agricultural insect pest in the tropics. Genomic resources for understanding the biology of this pest at the molecular level are limited. In the present study, we sequenced the transcriptome of Sf21 cells and assembled it into a non-redundant set of 24,038 contigs totalling ~47.38 Mb. A total of 26,390 unigenes were identified from the assembled transcripts, and their annotation revealed the prevalent protein domains in Sf21 cells. This study should provide a resource for gene discovery and for the development of functional molecular markers to understand the biology of S. frugiperda.
Affiliation(s)
- Pavan Kumar Kakumani
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Rohit Shukla
- Bionivid Technology Pvt. Ltd., 401, 4 AB Cross, 1st Main, Kasturi Nagar, NGEF East, Bangalore, 560043, India
- Vivek N Todur
- Bionivid Technology Pvt. Ltd., 401, 4 AB Cross, 1st Main, Kasturi Nagar, NGEF East, Bangalore, 560043, India
- Pawan Malhotra
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi, 110067, India.
- Sunil K Mukherjee
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi, 110067, India; present address: Department of Genetics, University of Delhi South Campus, Benito Juarez Road, New Delhi, 110021, India.
- Raj K Bhatnagar
- International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi, 110067, India.
69
Girgis HZ. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics 2015. [PMID: 26206263 PMCID: PMC4513396 DOI: 10.1186/s12859-015-0654-5] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Indexed: 01/03/2023]
Abstract
Background With rapid advancements in technology, the genome sequences of thousands of species are becoming available. These sequences contain repeats that make up significant portions of genomes, so successful annotation requires accurate discovery of repeats. Because repeats are species-specific elements, those in newly sequenced genomes are likely to be unknown. Therefore, annotating newly sequenced genomes requires tools that discover repeats de novo. However, the currently available de novo tools have limitations concerning the size of the input sequence, ease of use, sensitivity to the major types of repeats, consistency of performance, speed, and false-positive rate. Results To address these limitations, I designed and developed Red by applying machine learning. Red is the first repeat-detection tool capable of labeling its training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes; the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false-positive rate than WindowMasker. On human genes with five or more copies, Red was more specific than RepeatScout by a wide margin. When tested on genomes of unusual nucleotide composition, Red located repeats with high sensitivity and maintained moderate false-positive rates. Red outperformed the related tools on a bacterial genome. Red identified 46,405 novel repetitive segments in the human genome. Finally, Red is capable of processing both assembled and unassembled genomes. Conclusions Red’s innovative methodology and its excellent performance on seven different genomes represent a valuable advancement in the field of repeat discovery.
Affiliation(s)
- Hani Z Girgis
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894, MD, USA; Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104, OK, USA.
70
Woo YH, Ansari H, Otto TD, Klinger CM, Kolisko M, Michálek J, Saxena A, Shanmugam D, Tayyrov A, Veluchamy A, Ali S, Bernal A, del Campo J, Cihlář J, Flegontov P, Gornik SG, Hajdušková E, Horák A, Janouškovec J, Katris NJ, Mast FD, Miranda-Saavedra D, Mourier T, Naeem R, Nair M, Panigrahi AK, Rawlings ND, Padron-Regalado E, Ramaprasad A, Samad N, Tomčala A, Wilkes J, Neafsey DE, Doerig C, Bowler C, Keeling PJ, Roos DS, Dacks JB, Templeton TJ, Waller RF, Lukeš J, Oborník M, Pain A. Chromerid genomes reveal the evolutionary path from photosynthetic algae to obligate intracellular parasites. eLife 2015; 4:e06974. [PMID: 26175406 PMCID: PMC4501334 DOI: 10.7554/elife.06974] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Received: 02/16/2015] [Accepted: 06/16/2015] [Indexed: 12/18/2022]
Abstract
The eukaryotic phylum Apicomplexa encompasses thousands of obligate intracellular parasites of humans and animals with immense socio-economic and health impacts. We sequenced the nuclear genomes of Chromera velia and Vitrella brassicaformis, free-living, non-parasitic photosynthetic algae closely related to apicomplexans. Proteins from key metabolic pathways and from the endomembrane trafficking systems associated with a free-living lifestyle have been progressively and non-randomly lost during adaptation to parasitism. The free-living ancestor contained a broad repertoire of genes, many of which were repurposed for parasitic processes, such as extracellular proteins, components of a motility apparatus, and DNA- and RNA-binding protein families. Transcriptome analyses across 36 environmental conditions showed that Chromera orthologs of apicomplexan invasion-related motility genes were co-regulated with genes encoding the flagellar apparatus, supporting the functional contribution of flagella to the evolution of invasion machinery. This study provides insights into how obligate parasites with diverse life strategies arose from a once free-living phototrophic marine alga.
DOI: http://dx.doi.org/10.7554/eLife.06974.001
Single-celled parasites cause many severe diseases in humans and animals. The apicomplexans are probably the most successful group of these parasites and include the parasites that cause malaria. Apicomplexans infect a broad range of hosts, including humans, reptiles, birds, and insects, and often have complicated life cycles. For example, the malaria-causing parasites spread by moving from humans to female mosquitoes and then back to humans. Despite significant differences amongst apicomplexans, these single-celled parasites also share a number of features that are not seen in other living species. How and when these features arose remains unclear. It is known from previous work that apicomplexans are closely related to single-celled algae. But unlike apicomplexans, which depend on a host animal to survive, these algae live freely in their environment, often in close association with corals. Woo et al. have now sequenced the genomes of two photosynthetic algae that are thought to be close living relatives of the apicomplexans. These genomes were then compared to each other and to the genomes of other algae and apicomplexans. These comparisons reconfirmed that the two algae studied are close relatives of the apicomplexans. Further analyses suggested that thousands of genes were lost as an ancient free-living alga evolved into the apicomplexan ancestor, and that further losses occurred as these early parasites evolved into modern species. The lost genes were typically those that are important for free-living organisms but are either a hindrance to, or not needed in, a parasitic lifestyle. Some of the ancestor's genes, especially those that coded for the building blocks of flagella (structures that free-living algae use to move around), were repurposed in ways that helped the apicomplexans to invade their hosts. Understanding this repurposing process in greater detail will help to identify key molecules in these deadly parasites that could be targeted by drug treatments. It will also offer answers to one of the most fascinating questions in evolutionary biology: how parasites have evolved from free-living organisms.
DOI: http://dx.doi.org/10.7554/eLife.06974.002
Affiliation(s)
- Yong H Woo
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Hifzur Ansari
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Thomas D Otto
- Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Martin Kolisko
- Canadian Institute for Advanced Research, Department of Botany, University of British Columbia, Vancouver, Canada
- Jan Michálek
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Alka Saxena
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Annageldi Tayyrov
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Alaguraj Veluchamy
- Ecology and Evolutionary Biology Section, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197 INSERM U1024, Paris, France
- Shahjahan Ali
- Bioscience Core Laboratory, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Axel Bernal
- Department of Biology, University of Pennsylvania, Philadelphia, United States
- Javier del Campo
- Canadian Institute for Advanced Research, Department of Botany, University of British Columbia, Vancouver, Canada
- Jaromír Cihlář
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Pavel Flegontov
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Eva Hajdušková
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Aleš Horák
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Jan Janouškovec
- Canadian Institute for Advanced Research, Department of Botany, University of British Columbia, Vancouver, Canada
- Fred D Mast
- Seattle Biomedical Research Institute, Seattle, United States
- Diego Miranda-Saavedra
- Centro de Biología Molecular Severo Ochoa, CSIC/Universidad Autónoma de Madrid, Madrid, Spain
- Tobias Mourier
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Raeece Naeem
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Mridul Nair
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Aswini K Panigrahi
- Bioscience Core Laboratory, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Neil D Rawlings
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Eriko Padron-Regalado
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Abhinay Ramaprasad
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Nadira Samad
- School of Botany, University of Melbourne, Parkville, Australia
- Aleš Tomčala
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Jon Wilkes
- Wellcome Trust Centre For Molecular Parasitology, Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- Daniel E Neafsey
- Broad Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, Cambridge, United States
- Christian Doerig
- Department of Microbiology, Monash University, Clayton, Australia
- Chris Bowler
- Ecology and Evolutionary Biology Section, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197 INSERM U1024, Paris, France
- Patrick J Keeling
- Canadian Institute for Advanced Research, Department of Botany, University of British Columbia, Vancouver, Canada
- David S Roos
- Department of Biology, University of Pennsylvania, Philadelphia, United States
- Joel B Dacks
- Department of Cell Biology, University of Alberta, Edmonton, Canada
- Thomas J Templeton
- Department of Microbiology and Immunology, Weill Cornell Medical College, New York, United States
- Ross F Waller
- School of Botany, University of Melbourne, Parkville, Australia
- Julius Lukeš
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Miroslav Oborník
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Arnab Pain
- Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
71
Conservation genetics of Magnolia acuminata, an endangered species in Canada: Can genetic diversity be maintained in fragmented, peripheral populations? Conserv Genet 2015. [DOI: 10.1007/s10592-015-0746-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
72
Admixture in Humans of Two Divergent Plasmodium knowlesi Populations Associated with Different Macaque Host Species. PLoS Pathog 2015; 11:e1004888. [PMID: 26020959 PMCID: PMC4447398 DOI: 10.1371/journal.ppat.1004888] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 11/21/2014] [Accepted: 04/17/2015] [Indexed: 12/20/2022]
Abstract
Human malaria parasite species were originally acquired from other primate hosts and subsequently became endemic, then spread throughout large parts of the world. A major zoonosis is now occurring with Plasmodium knowlesi from macaques in Southeast Asia, with a recent acceleration in the number of reported cases, particularly in Malaysia. To investigate the parasite population genetics, we developed sensitive and species-specific microsatellite genotyping protocols and applied them to samples from 10 sites covering a range of >1,600 km, within which most cases have occurred. Genotypic analyses of 599 P. knowlesi infections (552 in humans and 47 in wild macaques) at 10 highly polymorphic loci provide radical new insights into the emergence. Parasites from sympatric long-tailed macaques (Macaca fascicularis) and pig-tailed macaques (M. nemestrina) were very highly differentiated (FST = 0.22, and K-means clustering confirmed two host-associated subpopulations). Approximately two thirds of human P. knowlesi infections were of the long-tailed macaque type (Cluster 1), and one third were of the pig-tailed macaque type (Cluster 2), with relative proportions varying across sites. Among the samples from humans, there was significant evidence of genetic isolation by geographical distance overall and within Cluster 1 alone. Across sites, the level of multi-locus linkage disequilibrium correlated with the degree of local admixture of the two clusters. The widespread occurrence of both types of P. knowlesi in humans enhances the potential for parasite adaptation in this zoonotic system. Extraordinary phases of pathogen evolution may occur during an emerging zoonosis, potentially involving adaptation to human hosts, with changes in patterns of virulence and transmission.
In a large population genetic survey, we show that the malaria parasite Plasmodium knowlesi in humans is an admixture of two highly divergent parasite populations, each associated with different forest-dwelling macaque reservoir host species. Most of the transmission and sexual reproduction occurs separately in each of the two parasite populations. In addition to the reservoir host-associated parasite population structure, there was also significant genetic differentiation that correlated with geographical distance. Although both P. knowlesi types co-exist in the same areas, the divergence between them is similar to or greater than that seen between sub-species in other sexually reproducing eukaryotes. This may offer particular opportunities for evolution of virulence and host-specificity, not seen with other malaria parasites, so studies of ongoing adaptation and interventions to reduce transmission are urgent priorities.
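The abstract reports FST = 0.22 between the two macaque-associated parasite subpopulations but does not name the estimator. As an illustration only, the sketch below computes Nei's G_ST, one common F_ST analogue, for a single multi-allelic locus from allele frequencies in two populations, as (H_T − H_S)/H_T. The function names and example frequencies are hypothetical, populations are weighted equally, and no sample-size correction is applied.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for a dict allele -> frequency."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(pop_a, pop_b):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus and two equally weighted populations."""
    alleles = set(pop_a) | set(pop_b)
    # Pooled (total) allele frequencies: simple average of the two populations.
    pooled = {a: (pop_a.get(a, 0.0) + pop_b.get(a, 0.0)) / 2 for a in alleles}
    # H_S: mean within-population heterozygosity; H_T: heterozygosity of the pool.
    h_s = (expected_heterozygosity(pop_a) + expected_heterozygosity(pop_b)) / 2
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical microsatellite alleles keyed by fragment length (bp).
print(round(nei_gst({150: 0.5, 152: 0.5}, {150: 0.5, 154: 0.5}), 3))  # 0.2
```

Multi-locus values are usually obtained by summing H_T and H_S over loci before taking the ratio; with highly polymorphic microsatellites, G_ST-type statistics are bounded well below 1, so a value such as 0.22 already indicates strong differentiation.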
73
Oueslati AE, Messaoudi I, Lachiri Z, Ellouze N. A new way to visualize DNA's base succession: the Caenorhabditis elegans chromosome landscapes. Med Biol Eng Comput 2015; 53:1165-76. [PMID: 26003183 DOI: 10.1007/s11517-015-1304-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/05/2014] [Accepted: 05/03/2015] [Indexed: 12/21/2022]
Abstract
In eukaryotic genomes, many genetic diseases are associated with tandem repeats, which occur frequently. In this paper, we describe a wavelet transform technique that provides a new way to represent the succession of DNA bases as DNA progression images. These images offer DNA landscapes for visualizing and following periodicities across genomes. We applied the Pnuc structural coding technique. Then, using a time-frequency representation, we illustrated the existence and superposition of periodicities in several biological features, their locations, and the different ways in which they appear. The representations showed that a periodicity sometimes occurs alone but is generally combined with others. These combinations of periodicities create, in the Caenorhabditis elegans chromosome, a precise structural image of biological features such as CeRep, Helitrons, repeats, and satellites.
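The idea of turning a base sequence into a numeric signal and scanning it with a wavelet can be sketched in a few lines. Two substitutions are assumed: the purine = +1 / pyrimidine = −1 coding below is a hypothetical stand-in for the Pnuc structural coding used in the paper (its table is not reproduced here), and the kernel is a generic complex Morlet-style wavelet whose width scales with the analysed period. High power at a given period over a stretch of positions marks a periodicity such as a tandem repeat.

```python
import cmath
import math

# Hypothetical numeric coding (purine = +1, pyrimidine = -1); NOT the Pnuc table.
CODE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def morlet_power(seq, period, cycles=2.0):
    """|CWT coefficient|^2 at every position for one analysing period.

    Complex Morlet-style wavelet: Gaussian envelope (sigma = cycles * period)
    times a complex exponential that oscillates once per `period` bases.
    """
    x = [CODE.get(b, 0.0) for b in seq.upper()]  # unknown bases (N) -> 0
    sigma = cycles * period
    half = int(3 * sigma)  # truncate the Gaussian at 3 sigma
    offsets = range(-half, half + 1)
    kernel = [math.exp(-k * k / (2 * sigma * sigma))
              * cmath.exp(-2j * math.pi * k / period) for k in offsets]
    norm = 1.0 / math.sqrt(sigma)  # scale normalisation across periods
    power = []
    for n in range(len(x)):
        acc = 0j
        for k, w in zip(offsets, kernel):
            if 0 <= n + k < len(x):  # zero padding at sequence ends
                acc += x[n + k] * w
        power.append(abs(acc * norm) ** 2)
    return power


seq = "AAC" * 40  # a period-3 tandem repeat
p3, p5 = morlet_power(seq, 3), morlet_power(seq, 5)
print(p3[60] > 10 * p5[60])  # True: energy concentrates at period 3
```

Computing `morlet_power` for a range of periods and stacking the rows gives a position-by-period image, which is the kind of time-frequency landscape the paper uses to show how periodicities coexist and overlap.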
Affiliation(s)
- Afef Elloumi Oueslati
- Laboratoire Signal, Image et Technologies de l'information, Département de Génie Electrique, Ecole Nationale d'Ingénieurs de Tunis, BP 37, Campus Universitaire, Le Belvédère, 1002, Tunis Cedex, Tunisia.
- Imen Messaoudi
- Laboratoire Signal, Image et Technologies de l'information, Département de Génie Electrique, Ecole Nationale d'Ingénieurs de Tunis, BP 37, Campus Universitaire, Le Belvédère, 1002, Tunis Cedex, Tunisia
- Zied Lachiri
- Département de Génie Physique et Instrumentation, Institut National des Sciences Appliquées et de Technologie, BP 676, Centre Urbain, 1080, Tunis Cedex, Tunisia
- Noureddine Ellouze
- Laboratoire Signal, Image et Technologies de l'information, Département de Génie Electrique, Ecole Nationale d'Ingénieurs de Tunis, BP 37, Campus Universitaire, Le Belvédère, 1002, Tunis Cedex, Tunisia
74
ProGeRF: proteome and genome repeat finder utilizing a fast parallel hash function. Biomed Res Int 2015; 2015:394157. [PMID: 25811026 PMCID: PMC4355816 DOI: 10.1155/2015/394157] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/03/2014] [Revised: 01/19/2015] [Accepted: 01/31/2015] [Indexed: 12/20/2022]
Abstract
Repetitive element sequences are adjacent, repeating patterns, also called motifs, which can be of different lengths; the repetitions can be exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for extracting repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in genomes or proteomes, a user-friendly web interface, and exhaustive searches. ProGeRF is a website for extracting repetitive regions from genome and proteome sequences. It was designed as an efficient, fast, accurate, and, above all, user-friendly web tool offering many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available both as a stand-alone program, whose source code users can download, and as a web tool. It uses a hash table approach to extract perfect and imperfect repetitive regions from a (multi)FASTA file in linear time.
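A hash table lets each repeated motif be recorded and looked up in constant time. The sketch below is an assumption-laden illustration, not ProGeRF's source: it finds only perfect tandem repeats, keys a Python dict by motif, and uses hypothetical defaults (motif lengths 1 to 6, at least 3 copies). Because it works on any string, the same function applies to nucleotide or protein sequences. For fixed parameters the scan stays linear in sequence length, since each non-repeating position costs a bounded number of comparisons and each reported run is skipped as a whole.

```python
from collections import defaultdict


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit ('CA', not 'CACA')."""
    return (motif + motif).find(motif, 1) == len(motif)


def perfect_tandem_repeats(seq, min_motif=1, max_motif=6, min_copies=3):
    """Hash-table index of perfect tandem repeats: motif -> [(start, copies), ...]."""
    table = defaultdict(list)
    n = len(seq)
    for m in range(min_motif, max_motif + 1):
        i = 0
        while i + m <= n:
            motif = seq[i:i + m]
            if not is_primitive(motif):  # e.g. 'CACA' is reported as 'CA'
                i += 1
                continue
            copies = 1
            # Extend while the next m characters repeat the motif exactly.
            while seq[i + copies * m:i + (copies + 1) * m] == motif:
                copies += 1
            if copies >= min_copies:
                table[motif].append((i, copies))
                i += copies * m  # skip the whole run just recorded
            else:
                i += 1
    return dict(table)


seq = "GG" + "CA" * 5 + "TTT" + "AGC" * 4 + "G"
print(perfect_tandem_repeats(seq))
```

For this toy sequence the index holds `T` at position 12 (3 copies), `CA` at position 2 (5 copies), and `AGC` at position 15 (4 copies). Imperfect repeats, which ProGeRF also reports, would need a tolerance for mismatches that this sketch deliberately omits.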
75
Maumus F, Fiston-Lavier AS, Quesneville H. Impact of transposable elements on insect genomes and biology. Curr Opin Insect Sci 2015; 7:30-36. [PMID: 32846669 DOI: 10.1016/j.cois.2015.01.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 12/03/2014] [Revised: 12/30/2014] [Accepted: 01/06/2015] [Indexed: 06/11/2023]
Affiliation(s)
- Florian Maumus
- Unité de recherche en Génomique-Info (URGI), UR1164, INRA, RD10 route de Saint Cyr, 78026 Versailles, France.
- Anna-Sophie Fiston-Lavier
- Institut des Sciences de l'Evolution de Montpellier (ISEM), UMR5554 CNRS-Université Montpellier II, 2 place Eugene Bataillon, bat. 22, CC065 34095 Montpellier Cedex 05, France
- Hadi Quesneville
- Unité de recherche en Génomique-Info (URGI), UR1164, INRA, RD10 route de Saint Cyr, 78026 Versailles, France
76
Aguileta G, de Vienne DM, Ross ON, Hood ME, Giraud T, Petit E, Gabaldón T. High variability of mitochondrial gene order among fungi. Genome Biol Evol 2015; 6:451-65. [PMID: 24504088 PMCID: PMC3942027 DOI: 10.1093/gbe/evu028] [Citation(s) in RCA: 148] [Impact Index Per Article: 16.4] [Indexed: 11/14/2022]
Abstract
From their origin as an early alphaproteobacterial endosymbiont to their current state as cellular organelles, large-scale genomic reorganization has taken place in the mitochondria of all main eukaryotic lineages. So far, most studies have focused on plant and animal mitochondrial (mt) genomes (mtDNA), but fungi provide new opportunities to study highly differentiated mtDNAs. Here, we analyzed 38 complete fungal mt genomes to investigate the evolution of mtDNA gene order among fungi. In particular, we looked for evidence of nonhomologous intrachromosomal recombination and investigated the dynamics of gene rearrangements. We investigated the effect that introns, intronic open reading frames (ORFs), and repeats may have on gene order. Additionally, we asked whether the distribution of transfer RNAs (tRNAs) evolves independently of that of mt protein-coding genes. We found that fungal mt genomes display remarkable variation between and within the major fungal phyla in terms of gene order, genome size, composition of intergenic regions, and presence of repeats, introns, and associated ORFs. Our results support previous evidence for the presence of mt recombination in all fungal phyla, a process conspicuously lacking in most Metazoa. Overall, the patterns of rearrangements may be explained by the combined influences of recombination (i.e., most likely nonhomologous and intrachromosomal), accumulated repeats, especially at intergenic regions, and to a lesser extent, mobile element dynamics.
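The abstract does not say how gene-order rearrangements were quantified. One simple, commonly used measure is the breakpoint distance, sketched here in Python under stated assumptions: gene orders are unsigned lists of unique gene names (strand and duplicated tRNA genes are ignored), genomes are treated as circular by default, and the function names are hypothetical.

```python
def adjacencies(order: list[str], circular: bool = True) -> set[frozenset[str]]:
    """Unordered neighbouring gene pairs; a circular genome also links last to first."""
    pairs = {frozenset((a, b)) for a, b in zip(order, order[1:])}
    if circular and len(order) > 2:
        pairs.add(frozenset((order[-1], order[0])))
    return pairs


def breakpoint_distance(order_a: list[str], order_b: list[str],
                        circular: bool = True) -> int:
    """Hypothetical helper: adjacencies of order_a that are absent from order_b.

    Only genes present in both orders are compared. Because adjacencies are
    unordered pairs, rotations and reversals of a gene order cost nothing.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    return len(adjacencies(a, circular) - adjacencies(b, circular))
```

For example, swapping `cox2` and `atp8` in the circular order `cox1, cox2, atp8, atp6, cox3` breaks two adjacencies, so the distance is 2.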
Affiliation(s)
- Gabriela Aguileta
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
77
Ruperao P, Edwards D. Bioinformatics: identification of markers from next-generation sequence data. Methods Mol Biol 2015; 1245:29-47. [PMID: 25373747 DOI: 10.1007/978-1-4939-1966-6_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/03/2023]
Abstract
Next-generation sequencing (NGS) technology has revolutionized plant genomics. NGS technology combined with new software tools enables the discovery, validation, and assessment of genetic markers on a large scale. Among different marker systems, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are the markers of choice for genetics and plant breeding. SSR markers have been a preferred choice for large-scale characterization of germplasm collections, construction of genetic maps, and QTL identification. Similarly, SNPs are the most abundant genetic variations, occurring at high frequency throughout the genomes of plant species. This chapter discusses various tools available for genome assembly and focuses mainly on SSR and SNP marker discovery.
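As a minimal illustration of SSR discovery, the Python sketch below scans a sequence for perfect mono- to hexanucleotide repeats. The minimum copy numbers in `MIN_COPIES` and the function names are illustrative assumptions, not values or tools taken from the chapter, and imperfect or compound repeats are not handled.

```python
# Illustrative minimum copy numbers per motif length (mono- to hexanucleotide);
# these thresholds are an assumption, not values from the chapter.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if `motif` is a shorter unit repeated (e.g. 'ATAT' is 'AT' twice)."""
    m = len(motif)
    return not any(m % d == 0 and motif == motif[:d] * (m // d) for d in range(1, m))


def find_perfect_ssrs(seq: str, min_copies: dict[int, int] = MIN_COPIES):
    """Hypothetical helper: return (start, motif, copies) for perfect SSRs."""
    seq = seq.upper()
    ssrs = []
    i = 0
    while i < len(seq):
        best = None  # (covered_length, motif, copies) of the longest repeat at i
        for m, needed in sorted(min_copies.items()):
            motif = seq[i:i + m]
            if len(motif) < m or not is_primitive(motif):
                continue
            j = i
            while seq[j:j + m] == motif:  # count consecutive exact copies
                j += m
            copies = (j - i) // m
            if copies >= needed and (best is None or j - i > best[0]):
                best = (j - i, motif, copies)
        if best is None:
            i += 1
        else:
            ssrs.append((i, best[1], best[2]))
            i += best[0]  # resume scanning after the repeat
    return ssrs
```

Skipping non-primitive motifs ensures that a run such as `ATATAT...` is reported once as an `AT` repeat rather than also as `ATAT`.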
Affiliation(s)
- Pradeep Ruperao
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
78
Carmona R, Zafra A, Seoane P, Castro AJ, Guerrero-Fernández D, Castillo-Castillo T, Medina-García A, Cánovas FM, Aldana-Montes JF, Navas-Delgado I, Alché JDD, Claros MG. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome. Front Plant Sci 2015; 6:625. [PMID: 26322066 PMCID: PMC4531244 DOI: 10.3389/fpls.2015.00625] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/21/2015] [Accepted: 07/28/2015] [Indexed: 05/18/2023]
Abstract
Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. The result is a reproductive transcriptome comprising 72,846 contigs with an average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified, and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive includes a semantic conceptualisation by means of a Resource Description Framework (RDF), allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.
Affiliation(s)
- Rosario Carmona
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga, Málaga, Spain
- Adoración Zafra
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Pedro Seoane
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- Antonio J. Castro
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- Darío Guerrero-Fernández
- Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga, Málaga, Spain
- Ana Medina-García
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Francisco M. Cánovas
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- José F. Aldana-Montes
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
- Juan de Dios Alché
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain
- M. Gonzalo Claros
- Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga, Málaga, Spain
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- *Correspondence: M. Gonzalo Claros, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain
79
Liu M, Zhang Z, Peng Z. The mitochondrial genome of the water spider Argyroneta aquatica (Araneae: Cybaeidae). Zool Scr 2014. [DOI: 10.1111/zsc.12090] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022]
Affiliation(s)
- Mingxin Liu
- Key Laboratory of Eco-environments in Three Gorges Reservoir Region (Ministry of Education); Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education); School of Life Science; Southwest University; Chongqing 400715 China
- Zhisheng Zhang
- Key Laboratory of Eco-environments in Three Gorges Reservoir Region (Ministry of Education); Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education); School of Life Science; Southwest University; Chongqing 400715 China
- Zuogang Peng
- Key Laboratory of Eco-environments in Three Gorges Reservoir Region (Ministry of Education); Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education); School of Life Science; Southwest University; Chongqing 400715 China
80
Development of a gene-centered SSR atlas as a resource for papaya (Carica papaya) marker-assisted selection and population genetic studies. PLoS One 2014; 9:e112654. [PMID: 25393538 PMCID: PMC4231050 DOI: 10.1371/journal.pone.0112654] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/10/2014] [Accepted: 10/08/2014] [Indexed: 01/08/2023]
Abstract
Carica papaya (papaya) is an economically important tropical fruit. Molecular marker-assisted selection is an inexpensive and reliable tool that has been widely used to improve fruit quality traits and resistance against diseases. In the present study, we report the development and validation of an atlas of papaya simple sequence repeat (SSR) markers. We integrated gene predictions and functional annotations to provide a gene-centered perspective for marker-assisted selection studies. Our atlas comprises 160,318 SSRs, of which 21,231 were located in genic regions (i.e., inside exons, exon-intron junctions or introns). A total of 116,453 (72.6%) of all identified repeats were successfully mapped to one of the nine papaya linkage groups. Primer pairs were designed for markers from 9,594 genes (34.5% of the papaya gene complement). Using papaya-tomato orthology assessments, we assembled a list of 300 genes (comprising 785 SSRs) potentially involved in fruit ripening. We validated our atlas by screening 73 SSR markers (including 25 fruit ripening genes), achieving a 100% amplification rate and uncovering a 26% polymorphism rate between the parental genotypes (Sekati and JS12). The SSR atlas presented here is the first comprehensive gene-centered collection of annotated and genome-positioned papaya SSRs. These features, combined with thousands of high-quality primer pairs, make the atlas an important resource for the papaya research community.
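The genic categories named in the abstract (exon, exon-intron junction, intron) can be assigned by simple interval comparison. The Python sketch below assumes half-open coordinates on one chromosome and a single gene model given as a non-empty list of exon intervals; the function name and the extra "gene boundary" label for repeats that straddle a gene's ends are hypothetical additions for illustration.

```python
def classify_ssr(ssr: tuple[int, int], exons: list[tuple[int, int]]) -> str:
    """Hypothetical helper: place a half-open SSR interval relative to one gene model.

    Returns 'exon', 'exon-intron junction', 'intron', 'intergenic', or the
    illustrative extra label 'gene boundary' for SSRs crossing the gene's ends.
    """
    if not exons:
        raise ValueError("gene model needs at least one exon")
    s, e = ssr
    gene_start = min(a for a, _ in exons)
    gene_end = max(b for _, b in exons)
    if e <= gene_start or s >= gene_end:
        return "intergenic"
    if s < gene_start or e > gene_end:
        return "gene boundary"
    if any(a <= s and e <= b for a, b in exons):
        return "exon"  # fully inside one exon
    if any(s < b and a < e for a, b in exons):
        return "exon-intron junction"  # overlaps an exon without fitting inside it
    return "intron"
```

With exons at `[(100, 200), (300, 400)]`, an SSR at `(190, 210)` is labelled an exon-intron junction and one at `(220, 260)` an intron.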
81
Benzekri H, Armesto P, Cousin X, Rovira M, Crespo D, Merlo MA, Mazurais D, Bautista R, Guerrero-Fernández D, Fernandez-Pozo N, Ponce M, Infante C, Zambonino JL, Nidelet S, Gut M, Rebordinos L, Planas JV, Bégout ML, Claros MG, Manchado M. De novo assembly, characterization and functional annotation of Senegalese sole (Solea senegalensis) and common sole (Solea solea) transcriptomes: integration in a database and design of a microarray. BMC Genomics 2014; 15:952. [PMID: 25366320 PMCID: PMC4232633 DOI: 10.1186/1471-2164-15-952] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 04/30/2014] [Accepted: 10/15/2014] [Indexed: 12/26/2022]
Abstract
Background Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and evolutionarily important flatfish species in both fisheries and aquaculture. Although some genomic resources and tools were recently described in these species, further sequencing efforts are required to establish a complete transcriptome and to identify new molecular markers. Moreover, the comparative analysis of transcriptomes will be useful to understand flatfish evolution. Results A comprehensive characterization of the transcriptome for each species was carried out using a large set of Illumina data (more than 1,800 million reads for each sole species) and 454 reads (more than 5 million reads, only in S. senegalensis), providing coverages ranging from 1,384x to 2,543x. After de novo assembly, 45,063 and 38,402 different transcripts were obtained, comprising 18,738 and 22,683 full-length cDNAs in S. senegalensis and S. solea, respectively. A reference transcriptome with the longest unique transcripts and putative non-redundant new transcripts was established for each species. A subset of 11,953 reference transcripts was qualified as highly reliable orthologs (>97% identity) between both species. A small subset of putative species-specific, lineage-specific and flatfish-specific transcripts was also identified. Furthermore, transcriptome data permitted the identification of single nucleotide polymorphisms and simple-sequence repeats confirmed by FISH to be used in further genetic and expression studies. Moreover, evidence for the retention of the crystallins crybb1, crybb1-like and crybb3 in the two sole species is also presented. Transcriptome information was applied to the design of a microarray tool in S. senegalensis that was successfully tested and validated by qPCR. Finally, transcriptomic data were hosted and structured at SoleaDB.
Conclusions Transcriptomes and molecular markers identified in this study represent a valuable source for future genomic studies in these economically important species. Orthology analysis provided new clues regarding sole genome evolution, indicating a divergent evolution of crystallins in flatfish. The design of a microarray and establishment of a reference transcriptome will be useful for large-scale gene expression studies. Moreover, the integration of transcriptomic data in SoleaDB will facilitate the management of genomic information in these important species.
Affiliation(s)
- Manuel Manchado
- IFAPA Centro El Toruño, IFAPA, Consejeria de Agricultura y Pesca, 11500 El Puerto de Santa María, Cádiz, Spain.
82
Barton C, Iliopoulos CS, Pissis SP. Optimal computation of all tandem repeats in a weighted sequence. Algorithms Mol Biol 2014; 9:21. [PMID: 25221616 PMCID: PMC4152798 DOI: 10.1186/s13015-014-0021-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/31/2014] [Accepted: 07/11/2014] [Indexed: 12/02/2022]
Abstract
Background Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. Results Crochemore’s repetitions algorithm, also referred to as Crochemore’s partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n²)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
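To make the weighted-sequence model concrete, the Python sketch below stores one letter-to-probability map per position, treats positions as independent, and counts an occurrence only if its probability reaches a cutoff (the `threshold` parameter is an assumption standing in for the probability cutoff commonly used with weighted sequences). It enumerates squares uu by brute force with pruning; it is not Crochemore's partitioning algorithm or the authors' O(n log n) variant, and it can be exponential in the worst case. The function names are hypothetical.

```python
from math import prod

# Hypothetical representation: one {letter: probability} map per position.
WeightedSeq = list[dict[str, float]]


def occurrence_probability(w: WeightedSeq, word: str, start: int) -> float:
    """Probability that `word` occurs at `start`, assuming independent positions."""
    if start < 0 or start + len(word) > len(w):
        return 0.0
    return prod(w[start + k].get(c, 0.0) for k, c in enumerate(word))


def weighted_squares(w: WeightedSeq, threshold: float) -> list[tuple[int, str]]:
    """Brute-force list of (start, u) such that uu occurs at `start` with
    probability >= threshold. Illustration only: worst-case exponential."""
    results = []
    for start in range(len(w)):
        # Depth-first extension of words beginning at `start`. Probabilities can
        # only shrink as a word grows, so branches below the threshold are pruned.
        stack = [("", 1.0)]
        while stack:
            word, p = stack.pop()
            half, odd = divmod(len(word), 2)
            if word and not odd and word[:half] == word[half:]:
                results.append((start, word[:half]))
            pos = start + len(word)
            if pos == len(w):
                continue
            for letter, q in w[pos].items():
                if p * q >= threshold:
                    stack.append((word + letter, p * q))
    return sorted(results)
```

For instance, in the two-position weighted sequence `[{"a": 1.0}, {"a": 0.5, "b": 0.5}]`, the square `aa` occurs at position 0 with probability 0.5, so it is reported for a threshold of 0.5 but not for 0.6.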
83
Sun WY, Sun SC. A description of the complete mitochondrial genomes of Amphiporus formidabilis, Prosadenoporus spectaculum and Nipponnemertes punctatula (Nemertea: Hoplonemertea: Monostilifera). Mol Biol Rep 2014; 41:5681-92. [PMID: 24939507 DOI: 10.1007/s11033-014-3438-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/29/2013] [Accepted: 05/27/2014] [Indexed: 11/30/2022]
Abstract
We sequenced the complete mitochondrial genomes (mitogenomes) of three Hoplonemertea species, Amphiporus formidabilis, Prosadenoporus spectaculum and Nipponnemertes punctatula, which are 14,616, 14,655 and 15,354 bp in length, respectively. Each of the three circular mitogenomes consists of 37 typical genes and some non-coding regions. The nucleotide composition of the coding strand is biased toward T, which accounts for almost half of the total nucleotides in these mitogenomes. There are many poly-T tracts across these mitogenomes, which exhibit T-number variation among different clones of protein-coding genes, mainly resulting from PCR amplification errors. The major non-coding regions have tandem repeat motifs and hairpin-like structures that may be associated with the initiation of replication or transcription. Data published to date for nemerteans show that Palaeonemertea species usually bear the largest mitogenomes, while representatives of the more recently derived Distromatonemertea clade bear the smallest ones, and that the gene arrangement of mitogenomes seems to be variable within the phylum Nemertea but stable within both Heteronemertea and Hoplonemertea.
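Strand composition bias and poly-T tracts of the kind described above can be measured with a few lines of Python. This is an illustrative sketch: the function names and the minimum tract length `min_len` are hypothetical, and symbols other than A, C, G and T are ignored.

```python
import re
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Fraction of A, C, G and T in `seq`; other symbols are ignored."""
    counts = Counter(c for c in seq.upper() if c in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total if total else 0.0 for b in "ACGT"}


def poly_t_tracts(seq: str, min_len: int = 5) -> list[tuple[int, int]]:
    """Half-open (start, end) spans of runs of at least `min_len` T's.

    `min_len` is a hypothetical cutoff chosen for illustration.
    """
    pattern = re.compile(r"T{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]
```

For `ATTTTTGCTTTA`, T makes up 8 of 12 bases, and with the default cutoff only the run at `(1, 6)` counts as a poly-T tract.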
Affiliation(s)
- Wen-Yan Sun
- Institute of Evolution & Marine Biodiversity, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China
84
Pauchet Y, Saski CA, Feltus FA, Luyten I, Quesneville H, Heckel DG. Studying the organization of genes encoding plant cell wall degrading enzymes in Chrysomela tremula provides insights into a leaf beetle genome. Insect Mol Biol 2014; 23:286-300. [PMID: 24456018 DOI: 10.1111/imb.12081] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/03/2023]
Abstract
The ability of herbivorous beetles from the superfamilies Chrysomeloidea and Curculionoidea to degrade plant cell wall polysaccharides has only recently begun to be appreciated. The presence of plant cell wall degrading enzymes (PCWDEs) in the beetle's digestive tract makes this degradation possible. Sequences encoding these beetle-derived PCWDEs were originally identified from transcriptomes and strikingly resemble those of saprophytic and phytopathogenic microorganisms, raising questions about their origin; e.g. are they insect- or microorganism-derived? To demonstrate unambiguously that the genes encoding PCWDEs found in beetle transcriptomes are indeed of insect origin, we generated a bacterial artificial chromosome library from the genome of the leaf beetle Chrysomela tremula, containing 18 432 clones with an average size of 143 kb. After hybridizing this library with probes derived from 12 C. tremula PCWDE-encoding genes and sequencing the positive clones, we demonstrated that the latter genes are encoded by the insect's genome and are surrounded by genes possessing orthologues in the genome of Tribolium castaneum as well as in three other beetle genomes. Our analyses showed that although the level of overall synteny between C. tremula and T. castaneum seems high, the degree of microsynteny between both species is relatively low, in contrast to the more closely related Colorado potato beetle.
Affiliation(s)
- Y Pauchet
- Entomology, Max Planck Institute for Chemical Ecology, Jena, Germany
85
Chaley M, Kutyrkin V, Tulbasheva G, Teplukhina E, Nazipova N. HeteroGenome: database of genome periodicity. Database (Oxford) 2014; 2014:bau040. [PMID: 24857969 PMCID: PMC4038257 DOI: 10.1093/database/bau040] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
We present the first release of the HeteroGenome database collecting latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats, along with regions of a new type of periodicity known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained the data with the aid of a spectral-statistical approach to search for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the region of latent periodicity and a second level indicating conservative fragments of its structure) was further developed, which enabled us to estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of the located periodicity regions on all chromosomes of the analyzed organisms revealed dominant characteristic types of periodicity in the genomes. The density distribution pattern of latent periodicity regions along a chromosome unambiguously characterizes each chromosome in the genome. Database URL: http://www.jcbi.ru/lp_baze/
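Estimating genome coverage "without redundancy" means that overlapping or nested regions must not be counted twice. A minimal Python sketch of that bookkeeping, assuming half-open [start, end) regions on a single sequence of positive length (the function name is hypothetical, and this is not the database's own code):

```python
def covered_fraction(regions: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence covered by the union of half-open regions.

    Regions are sorted and merged first, so overlapping or nested regions
    contribute each base only once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(regions):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlap or adjacency: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length
```

For example, regions `(0, 100)`, `(50, 150)`, `(300, 400)` and `(320, 350)` on a 1,000 bp sequence merge into 250 covered bases, giving 0.25 rather than the 350/1,000 a naive sum would report.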
Affiliation(s)
- Maria Chaley
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Vladimir Kutyrkin
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Gayane Tulbasheva
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Elena Teplukhina
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Nafisa Nazipova
- Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
86
Pugacheva V, Frenkel F, Korotkov E. Investigation of phase shifts for different period lengths in the genomes of C. elegans, D. melanogaster and S. cerevisiae. Comput Biol Chem 2014; 51:12-21. [PMID: 24840641 DOI: 10.1016/j.compbiolchem.2014.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2013] [Revised: 03/31/2014] [Accepted: 03/31/2014] [Indexed: 11/26/2022]
Abstract
We describe a new mathematical method for finding highly diverged short tandem repeats containing a single indel. The method involves comparing two frequency matrices: one for the subsequence before the shift and one for the subsequence after it. The comparison measure is based on matrix similarity. The approach was applied to the genomes of Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae, which were searched for tandem repeats with repeat lengths of 2-11 nucleotides, excluding 3, 6 and 9 nucleotides. The numbers of phase shift regions found in these genomes were approximately 2.2 × 10^4, 1.5 × 10^4 and 1.7 × 10^2, respectively. The type I error was less than 5%. The mean length of fuzzy periodicity and phase shift regions was about 220 nucleotides. The regions of fuzzy periodicity having a single insertion or deletion occupy substantial parts of the genomes: 5%, 3% and 0.3%, respectively. Less than 10% of these regions have been detected previously; that is, the number of such regions in the genomes of C. elegans, D. melanogaster and S. cerevisiae is dramatically higher than revealed by any previously known method. We suppose that some of the regions of fuzzy periodicity found could be protein-binding regions.
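The idea of comparing frequency matrices before and after a shift point can be illustrated as follows. In this Python sketch, row r of a period-p matrix counts the bases at global positions congruent to r mod p, so an indel between the two subsequences shows up as a cyclic offset between their rows. The dot-product score and the function names are assumptions for illustration only, since the abstract does not specify the matrix-similarity measure the authors used.

```python
BASES = "ACGT"


def frequency_matrix(seq: str, period: int, offset: int = 0) -> list[list[int]]:
    """period x 4 counts; row r holds bases at global positions = r (mod period).

    `offset` is the global coordinate of seq[0], so matrices built for the
    parts before and after a candidate shift point share one phase frame.
    """
    matrix = [[0] * 4 for _ in range(period)]
    for j, base in enumerate(seq.upper()):
        if base in BASES:
            matrix[(offset + j) % period][BASES.index(base)] += 1
    return matrix


def best_phase_shift(left: list[list[int]], right: list[list[int]]) -> int:
    """Hypothetical score: cyclic shift r maximising sum_i <left[i], right[(i + r) % p]>."""
    p = len(left)

    def score(r: int) -> int:
        return sum(
            sum(a * b for a, b in zip(left[i], right[(i + r) % p])) for i in range(p)
        )

    return max(range(p), key=score)
```

For example, with `seq = "ACGT" * 5 + "A" + "ACGT" * 5`, comparing `frequency_matrix(seq[:21], 4)` with `frequency_matrix(seq[21:], 4, offset=21)` gives a best shift of 1, reflecting the single inserted base; for `"ACGT" * 10` split at position 20, the best shift is 0.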
Affiliation(s)
- Felix Frenkel
- Bioengineering Centre of Russian Academy of Science, Moscow 117312, Russia
- Eugene Korotkov
- Bioengineering Centre of Russian Academy of Science, Moscow 117312, Russia; National Research Nuclear University "MEPhI", Moscow 115409, Russia
87
Canales J, Bautista R, Label P, Gómez-Maldonado J, Lesur I, Fernández-Pozo N, Rueda-López M, Guerrero-Fernández D, Castro-Rodríguez V, Benzekri H, Cañas RA, Guevara MA, Rodrigues A, Seoane P, Teyssier C, Morel A, Ehrenmann F, Le Provost G, Lalanne C, Noirot C, Klopp C, Reymond I, García-Gutiérrez A, Trontin JF, Lelu-Walter MA, Miguel C, Cervera MT, Cantón FR, Plomion C, Harvengt L, Avila C, Gonzalo Claros M, Cánovas FM. De novo assembly of maritime pine transcriptome: implications for forest breeding and biotechnology. Plant Biotechnol J 2014; 12:286-99. [PMID: 24256179 DOI: 10.1111/pbi.12136] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 07/20/2013] [Revised: 09/24/2013] [Accepted: 09/26/2013] [Indexed: 05/21/2023]
Abstract
Maritime pine (Pinus pinaster Ait.) is a widely distributed conifer species in southwestern Europe and one of the most advanced models for conifer research. In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next-generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome assembly was validated by RT-PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and the establishment of robust single nucleotide polymorphism (SNP) and simple-sequence repeat (SSR) databases for genotyping applications and integration of translational genomics in maritime pine breeding programmes. All our data are freely available at SustainpineDB, the P. pinaster expressional database. Results reported here on the maritime pine transcriptome represent a valuable resource for future basic and applied studies on this ecologically and economically important pine species.
Affiliation(s)
- Javier Canales
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
88
Nánási M, Vinař T, Brejová B. Probabilistic approaches to alignment with tandem repeats. Algorithms Mol Biol 2014; 9:3. [PMID: 24580741 PMCID: PMC3975930 DOI: 10.1186/1748-7188-9-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/09/2013] [Accepted: 02/24/2014] [Indexed: 11/16/2022]
Abstract
Background Short tandem repeats are ubiquitous in genomic sequences and, due to their complex evolutionary history, pose a challenge for sequence alignment tools. Results To better account for the presence of tandem repeats in pairwise sequence alignments, we propose a simple tractable pair hidden Markov model that explicitly models their presence. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe the resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms tailored to our model. We compare the accuracy of individual decoding algorithms on simulated and real data and find that our approach is superior to the classical three-state pair HMM. Conclusions Our study illustrates the versatility of pair hidden Markov models coupled with appropriate decoding criteria as a modeling tool for capturing complex sequence features.
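For reference, the classical three-state pair HMM that serves as the baseline here can be decoded with the Viterbi algorithm as below. This is a minimal Python sketch of that baseline only, not the authors' tandem-repeat model; the parameter values, the uniform gap emissions, the even split of mismatch probability and the omission of an end state are simplifying assumptions, and the function name is hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi_pair_hmm(x: str, y: str, delta: float = 0.1, epsilon: float = 0.3,
                     p_match: float = 0.9) -> tuple[str, str]:
    """Viterbi alignment of DNA strings x and y under a classical 3-state pair HMM.

    States: M (aligned pair), X (letter of x against a gap), Y (letter of y
    against a gap). Assumptions: no explicit end state, gap states emit each
    letter with probability 1/4, M spreads 1 - p_match over 12 mismatches.
    """
    n, m = len(x), len(y)
    t_mm = math.log(1 - 2 * delta)   # M -> M
    t_open = math.log(delta)         # M -> X or M -> Y
    t_ext = math.log(epsilon)        # X -> X or Y -> Y
    t_close = math.log(1 - epsilon)  # X -> M or Y -> M
    e_gap = math.log(0.25)

    def e_pair(a: str, b: str) -> float:
        return math.log(p_match / 4) if a == b else math.log((1 - p_match) / 12)

    def pick(cands: dict[str, float]) -> tuple[str, float]:
        best = max(cands, key=cands.get)
        return best, cands[best]

    # score[s][i][j]: best log-probability of aligning x[:i] with y[:j], ending in s.
    score = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[""] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    score["M"][0][0] = 0.0  # the begin state behaves like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev, s = pick({"M": score["M"][i - 1][j - 1] + t_mm,
                                "X": score["X"][i - 1][j - 1] + t_close,
                                "Y": score["Y"][i - 1][j - 1] + t_close})
                score["M"][i][j] = s + e_pair(x[i - 1], y[j - 1])
                back["M"][i][j] = prev
            if i > 0:
                prev, s = pick({"M": score["M"][i - 1][j] + t_open,
                                "X": score["X"][i - 1][j] + t_ext})
                score["X"][i][j] = s + e_gap
                back["X"][i][j] = prev
            if j > 0:
                prev, s = pick({"M": score["M"][i][j - 1] + t_open,
                                "Y": score["Y"][i][j - 1] + t_ext})
                score["Y"][i][j] = s + e_gap
                back["Y"][i][j] = prev

    # Trace back from the best-scoring state in the final cell.
    state = max("MXY", key=lambda s: score[s][n][m])
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == "M":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
        state = prev
    return "".join(reversed(top)), "".join(reversed(bottom))
```

With the default parameters, `viterbi_pair_hmm("ACGTT", "ACGT")` returns a five-column alignment with a single gap in the second sequence. Working in log space avoids underflow on long sequences.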
89
Elsik CG, Worley KC, Bennett AK, Beye M, Camara F, Childers CP, de Graaf DC, Debyser G, Deng J, Devreese B, Elhaik E, Evans JD, Foster LJ, Graur D, Guigo R, Hoff KJ, Holder ME, Hudson ME, Hunt GJ, Jiang H, Joshi V, Khetani RS, Kosarev P, Kovar CL, Ma J, Maleszka R, Moritz RFA, Munoz-Torres MC, Murphy TD, Muzny DM, Newsham IF, Reese JT, Robertson HM, Robinson GE, Rueppell O, Solovyev V, Stanke M, Stolle E, Tsuruda JM, Vaerenbergh MV, Waterhouse RM, Weaver DB, Whitfield CW, Wu Y, Zdobnov EM, Zhang L, Zhu D, Gibbs RA. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 2014; 15:86. [PMID: 24479613 PMCID: PMC4028053 DOI: 10.1186/1471-2164-15-86] [Citation(s) in RCA: 280] [Impact Index Per Article: 28.0] [Received: 09/11/2013] [Accepted: 01/27/2014] [Indexed: 11/21/2022]
Abstract
Background The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
Affiliation(s)
- Christine G Elsik
- Division of Animal Sciences, Division of Plant Sciences, and MU Informatics Institute, University of Missouri, Columbia, MO 65211, USA.
90
LaRue BL, Lagacé R, Chang CW, Holt A, Hennessy L, Ge J, King JL, Chakraborty R, Budowle B. Characterization of 114 insertion/deletion (INDEL) polymorphisms, and selection for a global INDEL panel for human identification. Leg Med (Tokyo) 2014; 16:26-32. [DOI: 10.1016/j.legalmed.2013.10.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 02/08/2013] [Revised: 08/19/2013] [Accepted: 10/22/2013] [Indexed: 11/15/2022]
91
Huang XC, Rong J, Liu Y, Zhang MH, Wan Y, Ouyang S, Zhou CH, Wu XP. The complete maternally and paternally inherited mitochondrial genomes of the endangered freshwater mussel Solenaia carinatus (Bivalvia: Unionidae) and implications for Unionidae taxonomy. PLoS One 2013; 8:e84352. [PMID: 24358356 PMCID: PMC3866145 DOI: 10.1371/journal.pone.0084352] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 07/25/2013] [Accepted: 11/14/2013] [Indexed: 11/30/2022]
Abstract
Doubly uniparental inheritance (DUI) is an exception to the typical maternal inheritance of mitochondrial (mt) DNA in Metazoa, and is found only in some bivalves. In species with DUI, there are two highly divergent gender-associated mt genomes: maternal (F) and paternal (M), which transmit independently and show different tissue localization. Solenaia carinatus is an endangered freshwater mussel species exclusive to the Poyang Lake basin, China. Anthropogenic events in the watershed greatly threaten the survival of this species. Nevertheless, the taxonomy of S. carinatus based on shell morphology is confusing, and the subfamilial placement of the genus Solenaia remains unclear. In order to clarify the taxonomic status and discuss the phylogenetic implications of family Unionidae, the entire F and M mt genomes of S. carinatus were sequenced and compared with the mt genomes of diverse freshwater mussel species. The complete F and M mt genomes of S. carinatus are 16716 bp and 17102 bp in size, respectively. The F and M mt genomes of S. carinatus diverge by about 40% in nucleotide sequence and 48% in amino acid sequence. Compared to F counterparts, the M genome shows a more compact structure. Different gene arrangements are found in these two gender-associated mt genomes. Among these, the F genome cox2-rrnS gene order is considered to be a genome-level synapomorphy for female lineage of the subfamily Gonideinae. From maternal and paternal mtDNA perspectives, the phylogenetic analyses of Unionoida indicate that S. carinatus belongs to Gonideinae. The F and M clades in freshwater mussels are reciprocally monophyletic. The phylogenetic trees advocate the classification of sampled Unionidae species into four subfamilies: Gonideinae, Ambleminae, Anodontinae, and Unioninae, which is supported by the morphological characteristics of glochidia.
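The abstract reports that the F and M genomes diverge by about 40% in nucleotide and 48% in amino acid sequence, but it does not say which distance measure was used. The simplest such measure is the uncorrected p-distance, sketched below under the assumption that columns with a gap in either sequence are skipped (pairwise deletion); model-corrected distances would give larger values for sequences this divergent.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites between two aligned sequences.

    Alignment columns with a gap ('-') in either sequence are skipped (pairwise deletion).
    Works for nucleotide or amino acid alignments alike.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


# Hypothetical toy alignment, not real F/M mitochondrial sequence
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```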
Affiliation(s)
- Xiao-Chen Huang
- Center for Watershed Ecology, Institute of Life Science, Nanchang University, Nanchang, P. R. China
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
- Jun Rong
- Center for Watershed Ecology, Institute of Life Science, Nanchang University, Nanchang, P. R. China
- Yong Liu
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
- Ming-Hua Zhang
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
- Yuan Wan
- Center for Watershed Ecology, Institute of Life Science, Nanchang University, Nanchang, P. R. China
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
- Shan Ouyang
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
- Chun-Hua Zhou
- Center for Watershed Ecology, Institute of Life Science, Nanchang University, Nanchang, P. R. China
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
- Xiao-Ping Wu
- Center for Watershed Ecology, Institute of Life Science, Nanchang University, Nanchang, P. R. China
- School of Life Sciences and Food Engineering, Nanchang University, Nanchang, P. R. China
92
Rico C, Normandeau E, Dion-Côté AM, Rico MI, Côté G, Bernatchez L. Combining next-generation sequencing and online databases for microsatellite development in non-model organisms. Sci Rep 2013; 3:3376. [PMID: 24296905 PMCID: PMC3847856 DOI: 10.1038/srep03376] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 08/22/2013] [Accepted: 10/30/2013] [Indexed: 12/02/2022]
Abstract
Next-generation sequencing (NGS) is revolutionising marker development, and the rapidly increasing number of transcriptomes published across a wide variety of taxa is providing valuable sequence databases for the identification of genetic markers without the need to generate new sequences. Microsatellites are still the most important source of polymorphic markers in ecology and evolution. Motivated by our long-term interest in the adaptive radiation of a non-model species complex of whitefishes (Coregonus spp.), in this study, we focus on microsatellite characterisation and multiplex optimisation using transcriptome sequences generated by Illumina® and Roche-454, as well as online databases of Expressed Sequence Tags (EST) for the study of whitefish evolution and demographic history. We identified and optimised 40 polymorphic loci in multiplex PCR reactions and validated the robustness of our analyses by testing several population genetics and phylogeographic predictions using 494 fish from five lakes and two distinct ecotypes.
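The abstract describes mining transcriptome and EST sequences for microsatellites but does not name the screening tool or thresholds. A minimal sketch of the core idea, assuming perfect repeats only, 2 to 6 nt units and at least five copies (all illustrative choices, and `find_microsatellites` is a hypothetical helper, not the authors' pipeline), can use a regular-expression backreference:

```python
import re
from typing import List, Tuple


def find_microsatellites(seq: str, min_unit: int = 2, max_unit: int = 6,
                         min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect tandem repeats.

    The unit-length range and minimum repeat count are illustrative defaults,
    not thresholds taken from any particular study. Imperfect or compound
    repeats are not handled.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # a k-mer followed by at least (min_repeats - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip motifs that are themselves repeats of a shorter unit (AA, ATAT, ...)
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_microsatellites("GGT" + "AC" * 6 + "TTG"))  # [(3, 'AC', 6)]
```

Dedicated tools add imperfect-repeat tolerance and primer design on the flanking sequence; the regex form is only meant to show how a tandem repeat is recognised.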
Affiliation(s)
- Ciro Rico
- [1] Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas (EBD, CSIC), C/Américo Vespucio s/n, 41092 Sevilla, Spain [2] School of Marine Studies, University of the South Pacific, Lower Laucala Campus, Suva, Fiji Islands [3] Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, Pavillon Charles-Eugène-Marchand, Québec G1V 0A6, Canada
93
Graves CJ, Ros VID, Stevenson B, Sniegowski PD, Brisson D. Natural selection promotes antigenic evolvability. PLoS Pathog 2013; 9:e1003766. [PMID: 24244173 PMCID: PMC3828179 DOI: 10.1371/journal.ppat.1003766] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 02/19/2013] [Accepted: 09/30/2013] [Indexed: 01/16/2023]
Abstract
The hypothesis that evolvability - the capacity to evolve by natural selection - is itself the object of natural selection is highly intriguing but remains controversial due in large part to a paucity of direct experimental evidence. The antigenic variation mechanisms of microbial pathogens provide an experimentally tractable system to test whether natural selection has favored mechanisms that increase evolvability. Many antigenic variation systems consist of paralogous unexpressed 'cassettes' that recombine into an expression site to rapidly alter the expressed protein. Importantly, the magnitude of antigenic change is a function of the genetic diversity among the unexpressed cassettes. Thus, evidence that selection favors among-cassette diversity is direct evidence that natural selection promotes antigenic evolvability. We used the Lyme disease bacterium, Borrelia burgdorferi, as a model to test the prediction that natural selection favors amino acid diversity among unexpressed vls cassettes and thereby promotes evolvability in a primary surface antigen, VlsE. The hypothesis that diversity among vls cassettes is favored by natural selection was supported in each B. burgdorferi strain analyzed using both classical (dN/dS ratios) and Bayesian population genetic analyses of genetic sequence data. This hypothesis was also supported by the conservation of highly mutable tandem-repeat structures across B. burgdorferi strains despite a near complete absence of sequence conservation. Diversification among vls cassettes due to natural selection and mutable repeat structures promotes long-term antigenic evolvability of VlsE. These findings provide a direct demonstration that molecular mechanisms that enhance evolvability of surface antigens are an evolutionary adaptation. The molecular evolutionary processes identified here can serve as a model for the evolution of antigenic evolvability in many pathogens which utilize similar strategies to establish chronic infections.
Affiliation(s)
- Vera I. D. Ros
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Brian Stevenson
- University of Kentucky, Lexington, Kentucky, United States of America
- Paul D. Sniegowski
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Dustin Brisson
- University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
94
Doi K, Monjo T, Hoang PH, Yoshimura J, Yurino H, Mitsui J, Ishiura H, Takahashi Y, Ichikawa Y, Goto J, Tsuji S, Morishita S. Rapid detection of expanded short tandem repeats in personal genomics using hybrid sequencing. Bioinformatics 2013; 30:815-22. [PMID: 24215022 PMCID: PMC3957077 DOI: 10.1093/bioinformatics/btt647] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Indexed: 11/20/2022]
Abstract
Motivation: Long expansions of short tandem repeats (STRs), i.e. DNA repeats of 2–6 nt, are associated with some genetic diseases. Cost-efficient high-throughput sequencing can quickly produce billions of short reads that would be useful for uncovering disease-associated STRs. However, enumerating STRs in short reads remains largely unexplored because of the difficulty in elucidating STRs much longer than 100 bp, the typical length of short reads. Results: We propose ab initio procedures for sensing and locating long STRs promptly by using the frequency distribution of all STRs and paired-end read information. We validated the reproducibility of this method using biological replicates and used it to locate an STR associated with a brain disease (SCA31). Subsequently, we sequenced this STR site in 11 SCA31 samples using SMRT™ sequencing (Pacific Biosciences), determined 2.3–3.1 kb sequences at nucleotide resolution and revealed that (TGGAA)- and (TAAAATAGAA)-repeat expansions determined the instability of the repeat expansions associated with SCA31. Our method could also identify common STRs, (AAAG)- and (AAAAG)-repeat expansions, which are remarkably expanded at four positions in an SCA31 sample. This is the first proposed method for rapidly finding disease-associated long STRs in personal genomes using hybrid sequencing of short and long reads. Availability and implementation: Our TRhist software is available at http://trhist.gi.k.u-tokyo.ac.jp/. Contact: moris@cb.k.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
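TRhist itself combines the frequency distribution of all STRs with paired-end information; the sketch below does not reproduce it. It only illustrates the underlying intuition, under stated assumptions: a short read made entirely of one 2 to 6 nt repeat unit implies an STR at least as long as the read, so tallying such reads per unit can flag candidate expansions like the (TGGAA)n repeat mentioned for SCA31. The function names are hypothetical, and reverse-complement strands are not merged.

```python
from collections import Counter
from typing import Iterable, Optional


def repeat_unit(read: str, min_unit: int = 2, max_unit: int = 6) -> Optional[str]:
    """Shortest unit (min_unit..max_unit nt) whose tandem copies span the entire read, else None.

    At least two full copies are required, and homopolymers are ignored.
    """
    read = read.upper()
    for k in range(min_unit, max_unit + 1):
        if len(read) < 2 * k:
            break
        unit = read[:k]
        if (unit * (len(read) // k + 1))[:len(read)] == read:
            return unit if len(set(unit)) > 1 else None
    return None


def canonical(unit: str) -> str:
    """Lexicographically smallest rotation, so phase-shifted reads (TGGAA, GGAAT, ...) pool together."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def count_full_repeat_reads(reads: Iterable[str]) -> Counter:
    """Tally reads consisting entirely of one short tandem repeat, keyed by canonical unit."""
    tally: Counter = Counter()
    for read in reads:
        unit = repeat_unit(read)
        if unit is not None:
            tally[canonical(unit)] += 1
    return tally


# Hypothetical 100 bp reads
reads = ["TGGAA" * 20, "GGAAT" * 20, "AC" * 50, "A" * 100, "ACGTTGCAAGTC" * 8]
print(count_full_repeat_reads(reads))  # Counter({'AATGG': 2, 'AC': 1})
```

Comparing such per-unit counts against a control genome, rather than reading them in isolation, is what would make an unusually high count suggestive of an expansion.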
Affiliation(s)
- Koichiro Doi
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Department of Information and Communication Engineering, Faculty of Engineering and Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo 113-8655, Japan
95
Parkinson N, Bryant R, Bew J, Conyers C, Stones R, Alcock M, Elphinstone J. Application of variable-number tandem-repeat typing to discriminate Ralstonia solanacearum strains associated with English watercourses and disease outbreaks. Appl Environ Microbiol 2013; 79:6016-22. [PMID: 23892739 PMCID: PMC3811358 DOI: 10.1128/aem.01219-13] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 04/26/2013] [Accepted: 07/17/2013] [Indexed: 11/20/2022]
Abstract
Variable-number tandem-repeat (VNTR) analysis was used for high-resolution discrimination among Ralstonia solanacearum phylotype IIB sequevar 1 (PIIB-1) isolates and further evaluated for use in source tracing. Five tandem-repeat-containing loci (comprising six tandem repeats) discriminated 17 different VNTR profiles among 75 isolates from potato, geranium, bittersweet (Solanum dulcamara), tomato, and the environment. R. solanacearum isolates from crops at three unrelated outbreak sites where river water had been used for irrigation had distinct VNTR profiles that were shared with PIIB-1 isolates from infected bittersweet growing upriver of each site. The VNTR profiling results supported the implication that the source of R. solanacearum at each outbreak was contaminated river water. Analysis of 51 isolates from bittersweet growing in river water at different locations provided a means to evaluate the technique for studying the epidemiology of the pathogen in the environment. Ten different VNTR profiles were identified among bittersweet PIIB-1 isolates from the River Thames. Repeated findings of contiguous river stretches that produced isolates that shared single VNTR profiles supported the hypothesis that the pathogen had disseminated from infected bittersweet plants located upriver. VNTR profiles shared between bittersweet isolates from two widely separated Thames tributaries (River Ray and River Colne) suggested they were independently contaminated with the same clonal type. Some bittersweet isolates had VNTR profiles that were shared with potato isolates collected outside the United Kingdom. It was concluded that VNTR profiling could contribute to further understanding of R. solanacearum epidemiology and assist in control of future disease outbreaks.
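The profiling logic described above (isolates with identical repeat counts share a VNTR profile) can be sketched by treating each profile as a tuple of repeat counts across the six tandem repeats and grouping isolates by that tuple. The Hunter-Gaston discriminatory index is added as a standard measure of a typing method's resolving power; the abstract does not say whether the authors computed it. Isolate names and repeat counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # repeat count at each tandem repeat, in a fixed order


def group_by_profile(isolates: Dict[str, Profile]) -> Dict[Profile, List[str]]:
    """Map each distinct VNTR profile to the isolates that share it."""
    groups: Dict[Profile, List[str]] = defaultdict(list)
    for name, profile in isolates.items():
        groups[profile].append(name)
    return dict(groups)


def hunter_gaston_index(groups: Dict[Profile, List[str]]) -> float:
    """Hunter-Gaston discriminatory index: probability that two isolates drawn
    without replacement have different profiles."""
    n = sum(len(members) for members in groups.values())
    if n < 2:
        raise ValueError("need at least two isolates")
    same = sum(len(m) * (len(m) - 1) for m in groups.values())
    return 1 - same / (n * (n - 1))


# Hypothetical repeat counts for six tandem repeats, not data from the study
isolates = {
    "potato_1":      (3, 5, 2, 7, 4, 9),
    "bittersweet_A": (3, 5, 2, 7, 4, 9),
    "geranium_1":    (4, 5, 2, 6, 4, 9),
    "river_X":       (3, 6, 2, 7, 4, 8),
}
groups = group_by_profile(isolates)
print(len(groups))                            # 3
print(round(hunter_gaston_index(groups), 3))  # 0.833
```

Shared profiles between crop and river isolates, as in `potato_1` and `bittersweet_A` here, are the kind of match the study used to support source tracing.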
Affiliation(s)
- Neil Parkinson
- Food and Environment Research Agency (FERA), Sand Hutton, York, United Kingdom
- Ruth Bryant
- John Innes Centre, Norwich Research Park, Norwich, United Kingdom
- Janice Bew
- Food and Environment Research Agency (FERA), Sand Hutton, York, United Kingdom
- Christine Conyers
- Food and Environment Research Agency (FERA), Sand Hutton, York, United Kingdom
- Robert Stones
- Food and Environment Research Agency (FERA), Sand Hutton, York, United Kingdom
- Michael Alcock
- Food and Environment Research Agency (FERA), Sand Hutton, York, United Kingdom
- John Elphinstone
- Food and Environment Research Agency (FERA), Sand Hutton, York, United Kingdom
96
Light S, Sagit R, Sachenkova O, Ekman D, Elofsson A. Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions. Mol Biol Evol 2013; 30:2645-53. [DOI: 10.1093/molbev/mst157] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 12/16/2022]
97
Razzaghian HR, Forsberg LA, Prakash KR, Przerada S, Paprocka H, Zywicka A, Westerman MP, Pedersen NL, O'Hanlon TP, Rider LG, Miller FW, Srutek E, Jankowski M, Zegarski W, Piotrowski A, Absher D, Dumanski JP. Post-zygotic and inter-individual structural genetic variation in a presumptive enhancer element of the locus between the IL10Rβ and IFNAR1 genes. PLoS One 2013; 8:e67752. [PMID: 24023707 PMCID: PMC3762855 DOI: 10.1371/journal.pone.0067752] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/29/2013] [Accepted: 05/21/2013] [Indexed: 12/26/2022]
Abstract
Although historically considered junk DNA, tandemly repeated sequence motifs can affect human phenotype. For example, variable number tandem repeats (VNTR) with embedded enhancers have been shown to regulate gene transcription. Post-zygotic variation is the presence of genetically distinct populations of cells in an individual derived from a single zygote, and this is an understudied aspect of genome biology. We report a somatically variable VNTR with sequence properties of an enhancer, located upstream of IFNAR1. Initially, SNP genotyping of 63 monozygotic twin pairs and multiple tissues from 21 breast cancer patients suggested frequent post-zygotic mosaicism. The VNTR displayed a repeated 32 bp core motif in the center of the repeat, which was flanked by similar variable motifs. A total of 14 alleles were characterized based on combinations of segments, which showed post-zygotic and inter-individual variation, with up to 6 alleles in a single subject. Somatic variation occurred in ∼24% of cases. In this hypervariable region, we found a clustering of transcription factor binding sites with strongest sequence similarity to mouse Foxg1 transcription factor binding motif. This study describes a VNTR with sequence properties of an enhancer that displays post-zygotic and inter-individual genetic variation. This element is within a locus containing four related cytokine receptors: IFNAR2, IL10Rβ, IFNAR1 and IFNGR2, and we hypothesize that it might function in transcriptional regulation of several genes in this cluster. Our findings add another level of complexity to the variation among VNTR-based enhancers. Further work may unveil the normal function of this VNTR in transcriptional control and its possible involvement in diseases connected with these receptors, such as autoimmune conditions and cancer.
Affiliation(s)
- Hamid Reza Razzaghian
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Lars A. Forsberg
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Szymon Przerada
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Hanna Paprocka
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Anna Zywicka
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Maxwell P. Westerman
- Hematology Research, Mount Sinai Hospital Medical Center, Chicago, Illinois, United States of America
- Nancy L. Pedersen
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Terrance P. O'Hanlon
- Environmental Autoimmunity Group, National Institute of Environmental Health Sciences, National Institutes of Health Clinical Research Center, Bethesda, Maryland, United States of America
- Lisa G. Rider
- Environmental Autoimmunity Group, National Institute of Environmental Health Sciences, National Institutes of Health Clinical Research Center, Bethesda, Maryland, United States of America
- Frederick W. Miller
- Environmental Autoimmunity Group, National Institute of Environmental Health Sciences, National Institutes of Health Clinical Research Center, Bethesda, Maryland, United States of America
- Ewa Srutek
- Surgical Oncology Clinic, Collegium Medicum, Oncology Center, Nicolaus Copernicus University, Bydgoszcz, Poland
- Michal Jankowski
- Surgical Oncology Clinic, Collegium Medicum, Oncology Center, Nicolaus Copernicus University, Bydgoszcz, Poland
- Wojciech Zegarski
- Surgical Oncology Clinic, Collegium Medicum, Oncology Center, Nicolaus Copernicus University, Bydgoszcz, Poland
- Arkadiusz Piotrowski
- Department of Biology and Pharmaceutical Botany, Medical University of Gdansk, Gdansk, Poland
- Devin Absher
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America
- Jan P. Dumanski
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
98
Ernest HB, Well JA, Kurushima JD. Development of 10 microsatellite loci for Yellow-billed Magpies (Pica nuttalli) and corvid ecology and West Nile virus studies. Mol Ecol Resour 2013; 8:196-8. [PMID: 21585754 DOI: 10.1111/j.1471-8286.2007.01921.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
Abstract
We developed 10 polymorphic microsatellite loci for Yellow-billed Magpies (Pica nuttalli). The primers were tested across a population of 57 Central California Yellow-billed Magpies and displayed an average of 3.9 alleles per locus. Forty-one American Crows (Corvus brachyrhynchos) from California were polymorphic for seven of the loci with an average of 2.9 alleles per locus. One additional microsatellite-containing locus displayed diagnostic allele sizes and may be useful to distinguish between the two species. These corvid specific microsatellites will aid ecological studies of the population-level effects of diseases, such as West Nile virus.
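Summary figures such as the 3.9 and 2.9 alleles per locus quoted above are computed from a genotype table. The sketch below assumes diploid genotypes stored as pairs of allele sizes and treats failed amplifications as missing (`None`), which is a simplifying assumption; it averages over every locus supplied. Locus names and allele sizes are hypothetical, not the published Pica nuttalli data.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Diploid genotype as two allele sizes (bp); None marks a sample that failed to amplify.
Genotype = Optional[Tuple[int, int]]


def alleles_per_locus(genotypes: Dict[str, List[Genotype]]) -> Dict[str, int]:
    """Number of distinct allele sizes observed at each locus, ignoring missing calls."""
    return {
        locus: len({allele for g in calls if g is not None for allele in g})
        for locus, calls in genotypes.items()
    }


def summarize(genotypes: Dict[str, List[Genotype]]) -> Tuple[float, List[str]]:
    """Mean alleles per locus and the loci that are polymorphic (two or more alleles)."""
    counts = alleles_per_locus(genotypes)
    polymorphic = [locus for locus, k in counts.items() if k >= 2]
    return mean(counts.values()), polymorphic


# Hypothetical loci and allele sizes
data = {
    "locusA": [(150, 152), (152, 152), (150, 154)],
    "locusB": [(200, 200), (200, 200), None],
    "locusC": [(98, 102), (100, 102), (98, 98)],
}
avg, poly = summarize(data)
print(round(avg, 2), poly)  # 2.33 ['locusA', 'locusC']
```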
Affiliation(s)
- Holly B Ernest
- Wildlife and Ecology Unit, Veterinary Genetics Laboratory, School of Veterinary Medicine, and Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, One Shields Avenue, Davis, CA 95616, USA
99
Isolation and characterization of microsatellite DNA markers in the Greater Roadrunner (Geococcyx californianus). Conserv Genet Resour 2013. [DOI: 10.1007/s12686-012-9793-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
100
de Ridder C, Kourie D, Watson B, Fourie T, Reyneke P. Fine-tuning the search for microsatellites. J Discrete Algorithms 2013. [DOI: 10.1016/j.jda.2012.12.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]