1
Li X, Li H, Yang Z, Wang L. Distribution rules of 8-mer spectra and characterization of evolution state in animal genome sequences. BMC Genomics 2024; 25:855. [PMID: 39266973 PMCID: PMC11391722 DOI: 10.1186/s12864-024-10786-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 09/09/2024] [Indexed: 09/14/2024] Open
Abstract
BACKGROUND Studying the composition rules and evolution mechanisms of genome sequences is a core issue in the post-genomic era, and k-mer spectrum analysis of genome sequences is an effective means of addressing it. RESULT We divided all 8-mers of genome sequences into 16 XY-types according to the number of XY dinucleotides in each 8-mer. Previous work found that independent unimodal distributions are observed only in the three CG-type 8-mer spectra, whereas non-CG type 8-mer spectra do not show this universal phenomenon from prokaryotes to eukaryotes. On this basis, we analyzed the distribution variation of non-CG type 8-mer spectra across 889 animal genome sequences. Following the evolutionary order of animals from primitive to more complex, we found that the spectrum distributions gradually transition from unimodal to tri-modal. The relative distance from the average frequency of each non-CG type 8-mer set to the center frequency differs both within a species and among species. The 8-mers containing CG dinucleotides were further divided into 16 subsets, in which each 8-mer contains both CG and XY dinucleotides, called XY1_CG1 subsets. We found that the separability values of XY1_CG1 spectra are closely related to the evolution and specificity of animals. Considering the constraint of Chargaff's second parity rule, we finally obtained 10 separability values as the feature set to characterize the evolution state of genome sequences. To verify the rationality of the feature set, we used 14 common classification algorithms to perform binary classification tests. The results showed that the accuracy (Acc) ranged from 83.88% to 98.70% among birds, other vertebrates and mammals. CONCLUSION We proposed a credible feature set to characterize the evolution state of genomes and obtained satisfactory results with it in large-scale classification of animals.
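The grouping step described in this abstract, counting 8-mers and sorting them by how many copies of a given XY dinucleotide they contain, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, and capping the count at 2 (so that the last group means "two or more") is an assumption made here to yield three groups per dinucleotide.

```python
from collections import Counter

def count_kmers(seq, k=8):
    """Count every k-mer in seq with a sliding window, skipping windows
    that contain characters other than A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def dinucleotide_occurrences(kmer, xy):
    """Number of (possibly overlapping) occurrences of dinucleotide xy in kmer."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == xy)

def split_by_dinucleotide(counts, xy="CG", cap=2):
    """Group k-mer counts by how many xy dinucleotides each k-mer contains.
    Counts above `cap` are merged into the last group (an assumption of this sketch)."""
    subsets = {n: {} for n in range(cap + 1)}
    for kmer, freq in counts.items():
        n = min(dinucleotide_occurrences(kmer, xy), cap)
        subsets[n][kmer] = freq
    return subsets
```

For example, `split_by_dinucleotide(count_kmers(genome), "CG")` would return three dictionaries keyed 0, 1 and 2, each of which could then be turned into a frequency spectrum.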
Affiliation(s)
- Xiaolong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Hong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Zhenhua Yang
  - School of Economics and Management, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Lu Wang
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
2
Molitor C, Kurowski TJ, Fidalgo de Almeida PM, Kevei Z, Spindlow DJ, Chacko Kaitholil SR, Iheanyichi JU, Prasanna HC, Thompson AJ, Mohareb FR. A chromosome-level genome assembly of Solanum chilense, a tomato wild relative associated with resistance to salinity and drought. FRONTIERS IN PLANT SCIENCE 2024; 15:1342739. [PMID: 38525148 PMCID: PMC10957597 DOI: 10.3389/fpls.2024.1342739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 02/12/2024] [Indexed: 03/26/2024]
Abstract
Introduction Solanum chilense is a wild relative of tomato reported to exhibit resistance to biotic and abiotic stresses. There is potential to improve tomato cultivars via breeding with wild relatives, a process greatly accelerated by suitable genomic and genetic resources. Methods In this study we generated a high-quality, chromosome-level, de novo assembly for the S. chilense accession LA1972 using a hybrid assembly strategy with ~180 Gbp of Illumina short reads and ~50 Gbp long PacBio reads. Further scaffolding was performed using Bionano optical maps and 10x Chromium reads. Results The resulting sequences were arranged into 12 pseudomolecules using Hi-C sequencing. This resulted in a 901 Mbp assembly, with a completeness of 95%, as determined by Benchmarking with Universal Single-Copy Orthologs (BUSCO). Sequencing of RNA from multiple tissues resulting in ~219 Gbp of reads was used to annotate the genome assembly with an RNA-Seq guided gene prediction, and for a de novo transcriptome assembly. This chromosome-level, high-quality reference genome for S. chilense accession LA1972 will support future breeding efforts for more sustainable tomato production. Discussion Gene sequences related to drought and salt resistance were compared between S. chilense and S. lycopersicum to identify amino acid variations with high potential for functional impact. These variants were subsequently analysed in 84 resequenced tomato lines across 12 different related species to explore the variant distributions. We identified a set of 7 putative impactful amino acid variants some of which may also impact on fruit development for example the ethylene-responsive transcription factor WIN1 and ethylene-insensitive protein 2. These variants could be tested for their ability to confer functional phenotypes to cultivars that have lost these variants.
Affiliation(s)
- Corentin Molitor
  - The Bioinformatics Group, School of Water, Energy and Environment, Cranfield University, Wharley End, United Kingdom
- Tomasz J. Kurowski
  - The Bioinformatics Group, School of Water, Energy and Environment, Cranfield University, Wharley End, United Kingdom
- Zoltan Kevei
  - Soil, Agrifood and Biosciences, Cranfield University, Wharley End, United Kingdom
- Daniel J. Spindlow
  - The Bioinformatics Group, School of Water, Energy and Environment, Cranfield University, Wharley End, United Kingdom
- Steffimol R. Chacko Kaitholil
  - The Bioinformatics Group, School of Water, Energy and Environment, Cranfield University, Wharley End, United Kingdom
- Justice U. Iheanyichi
  - The Bioinformatics Group, School of Water, Energy and Environment, Cranfield University, Wharley End, United Kingdom
- H. C. Prasanna
  - Division of Vegetable Crops, ICAR-Indian Institute of Horticultural Research, Bangalore, India
- Andrew J. Thompson
  - Soil, Agrifood and Biosciences, Cranfield University, Wharley End, United Kingdom
- Fady R. Mohareb
  - The Bioinformatics Group, School of Water, Energy and Environment, Cranfield University, Wharley End, United Kingdom
3
Chancharoenthana W, Kamolratanakul S, Schultz MJ, Leelahavanichkul A. The leaky gut and the gut microbiome in sepsis - targets in research and treatment. Clin Sci (Lond) 2023; 137:645-662. [PMID: 37083032 PMCID: PMC10133873 DOI: 10.1042/cs20220777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/25/2023] [Accepted: 04/05/2023] [Indexed: 04/22/2023]
Abstract
Both a leaky gut (a barrier defect of the intestinal surface) and gut dysbiosis (a change in the intestinal microbial population) are intrinsic to sepsis. While sepsis itself can cause dysbiosis, dysbiosis can worsen sepsis. Leaky gut syndrome refers to a state of increased intestinal permeability that allows the translocation of microbial molecules from the gut into the blood circulation. It is not just a symptom of gastrointestinal involvement, but also an underlying cause that develops independently, and its presence can be recognized by the detection, in blood, of lipopolysaccharides and (1→3)-β-D-glucan (major components of gut microbiota). Gut dysbiosis is the consequence of a reduction in some bacterial species in the gut microbiome, resulting from a defect in intestinal mucosal immunity caused by intestinal hypoperfusion, immune cell apoptosis, and a variety of enteric neuro-humoral-immunity responses. A reduction in bacteria that produce short-chain fatty acids could change the intestinal barriers, leading to the translocation of pathogen molecules into the circulation, where they cause systemic inflammation. Gut fungi might also be increased in human patients with sepsis, even though this has not been consistently observed in murine models of sepsis, probably because of the longer duration of sepsis and antibiotic use in patients. The gut virobiome, which partly consists of bacteriophages, is also detectable in gut contents and might differ between sepsis and normal hosts. Together, these alterations in gut dysbiosis could be an interesting target for sepsis adjuvant therapies, e.g., faecal transplantation or probiotic therapy. Here, current information on leaky gut and gut dysbiosis, along with potential biomarkers, new treatment strategies, and future research topics, is discussed.
Affiliation(s)
- Wiwat Chancharoenthana
  - Department of Clinical Tropical Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
  - Tropical Immunology and Translational Research Unit (TITRU), Department of Clinical Tropical Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Supitcha Kamolratanakul
  - Department of Clinical Tropical Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
  - Tropical Immunology and Translational Research Unit (TITRU), Department of Clinical Tropical Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Marcus J Schultz
  - Department of Intensive Care and Laboratory of Experimental Intensive Care and Anesthesiology (L.E.I.C.A), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, Oxford University, Oxford, United Kingdom
  - Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Asada Leelahavanichkul
  - Department of Microbiology, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Center of Excellence on Translational Research in Inflammation and Immunology (CETRII), Department of Microbiology, Chulalongkorn University, Bangkok 10330, Thailand
4
Hesse U. K-Mer-Based Genome Size Estimation in Theory and Practice. Methods Mol Biol 2023; 2672:79-113. [PMID: 37335470 DOI: 10.1007/978-1-0716-3226-0_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2023]
Abstract
Recent advances in sequencing technologies have made genome sequencing of non-model organisms with very large and complex genomes possible. The data can be used to estimate diverse genome characteristics, including genome size, repeat content, and levels of heterozygosity. K-mer analysis is a powerful biocomputational approach with a wide range of applications, including estimation of genome sizes. However, interpretation of the results is not always straightforward. Here, I review k-mer-based genome size estimation, focusing specifically on k-mer theory and peak calling in k-mer frequency histograms. I highlight common pitfalls in data analysis and result interpretation, and provide a comprehensive overview on current methods and programs developed to conduct these analyses.
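As a rough illustration of the peak-calling idea this abstract refers to, the sketch below applies the widely used rule of thumb that genome size is approximately the total number of k-mers divided by the depth of the main peak in the k-mer frequency histogram. It is a deliberately naive sketch under stated assumptions: the function names are hypothetical, the histogram is assumed to be a mapping from depth to the number of distinct k-mers seen at that depth, and the `min_depth` cutoff for discarding low-depth (likely erroneous) k-mers is an arbitrary choice. Real tools handle heterozygosity, repeats and error modelling far more carefully.

```python
def find_main_peak(histogram, min_depth=5):
    """Return the depth with the most distinct k-mers, ignoring depths
    below min_depth (assumed here to be dominated by sequencing errors)."""
    candidates = {d: n for d, n in histogram.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no histogram entries at or above min_depth")
    return max(candidates, key=candidates.get)

def estimate_genome_size(histogram, min_depth=5):
    """Naive estimate: total k-mer occurrences divided by the main peak depth."""
    peak = find_main_peak(histogram, min_depth)
    total_kmers = sum(d * n for d, n in histogram.items() if d >= min_depth)
    return total_kmers / peak
```

The choice of `min_depth` and the identification of the correct peak are exactly the kind of decisions where misinterpretation can arise, so the output should be treated as a first approximation only.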
Affiliation(s)
- Uljana Hesse
  - Department of Biotechnology, University of the Western Cape, Bellville, South Africa
5
Yamashiro T, Shiraishi A, Nakayama K, Satake H. Draft Genome of Tanacetum Coccineum: Genomic Comparison of Closely Related Tanacetum-Family Plants. Int J Mol Sci 2022; 23:7039. [PMID: 35806039 PMCID: PMC9267051 DOI: 10.3390/ijms23137039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/10/2022] [Accepted: 06/22/2022] [Indexed: 11/16/2022] Open
Abstract
The plant Tanacetum coccineum (painted daisy) is closely related to Tanacetum cinerariifolium (pyrethrum daisy). However, T. cinerariifolium produces large amounts of pyrethrins, a class of natural insecticides, whereas T. coccineum produces much smaller amounts of these compounds. Thus, comparative genomic analysis is expected to contribute a great deal to investigating the differences in biological defense systems, including pyrethrin biosynthesis. Here, we elucidated the 9.4 Gb draft genome of T. coccineum, consisting of 2,836,647 scaffolds and 103,680 genes. Comparative analyses of the draft genome of T. coccineum and that of T. cinerariifolium, generated in our previous study, revealed distinct features of T. coccineum genes. While the T. coccineum genome contains more numerous ribosome-inactivating protein (RIP)-encoding genes, the number of higher-toxicity type-II RIP-encoding genes is larger in T. cinerariifolium. Furthermore, the number of histidine kinases encoded by the T. coccineum genome is smaller than that of T. cinerariifolium, suggesting a biological correlation with pyrethrin biosynthesis. Moreover, the flanking regions of pyrethrin biosynthesis-related genes are also distinct between these two plants. These results provide clues to the elucidation of species-specific biodefense systems, including the regulatory mechanisms underlying pyrethrin production.
Affiliation(s)
- Takanori Yamashiro
  - Dainihon Jochugiku Co., Ltd., 1-1-11 Daikoku-cho, Toyonaka, Osaka 561-0827, Japan
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, Hyogo 657-8501, Japan
- Akira Shiraishi
  - Bioorganic Research Institute, Suntory Foundation for Life Sciences, 8-1-1 Seikadai, Seika-cho, Souraku, Kyoto 619-0284, Japan
- Koji Nakayama
  - Dainihon Jochugiku Co., Ltd., 1-1-11 Daikoku-cho, Toyonaka, Osaka 561-0827, Japan
- Honoo Satake
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, Hyogo 657-8501, Japan
  - Bioorganic Research Institute, Suntory Foundation for Life Sciences, 8-1-1 Seikadai, Seika-cho, Souraku, Kyoto 619-0284, Japan
6
Neutrophil Extracellular Traps in Severe SARS-CoV-2 Infection: A Possible Impact of LPS and (1→3)-β-D-glucan in Blood from Gut Translocation. Cells 2022; 11:1103. [PMID: 35406667 PMCID: PMC8997739 DOI: 10.3390/cells11071103] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/14/2022] [Revised: 03/15/2022] [Accepted: 03/22/2022] [Indexed: 02/01/2023] Open
Abstract
Due to limited data on the link between gut barrier defects (leaky gut) and neutrophil extracellular traps (NETs) in coronavirus disease 2019 (COVID-19), blood samples of COVID-19 cases, classified as mild (upper respiratory tract symptoms without pneumonia; n = 27), moderate (pneumonia without hypoxia; n = 28), and severe (pneumonia with hypoxia; n = 20), versus healthy controls (n = 15) were evaluated, together with in vitro experiments. Accordingly, neutrophil counts, serum cytokines (IL-6 and IL-8), lipopolysaccharide (LPS), bacteria-free DNA, and NETs parameters (fluorescent-stained nuclear morphology, dsDNA, neutrophil elastase, histone–DNA complex, and myeloperoxidase–DNA complex) were found to differentiate COVID-19 severity, whereas serum (1→3)-β-D-glucan (BG) was different between the control and COVID-19 cases. Despite non-detectable bacteria-free DNA in the blood of healthy volunteers, using blood bacteriome analysis, proteobacterial DNA was similarly predominant in both control and COVID-19 cases (all severities). In parallel, only COVID-19 samples from moderate and severe cases, but not mild cases, activated NETs in vitro, as determined by supernatant dsDNA, Peptidyl Arginine Deiminase 4, and nuclear morphology. In neutrophil experiments, LPS plus BG (LPS + BG) more prominently induced NETs, cytokines, NFκB, and reactive oxygen species, when compared with the activation by each molecule alone. In conclusion, pathogen molecules (LPS and BG) from gut translocation, along with neutrophilia and cytokinemia in COVID-19, activated NETs-induced hyperinflammation.
7
Blood Bacteria-Free DNA in Septic Mice Enhances LPS-Induced Inflammation in Mice through Macrophage Response. Int J Mol Sci 2022; 23:1907. [PMID: 35163830 PMCID: PMC8836862 DOI: 10.3390/ijms23031907] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/06/2022] [Revised: 02/03/2022] [Accepted: 02/04/2022] [Indexed: 02/06/2023] Open
Abstract
Although bacteria-free DNA in blood during systemic infection is mainly derived from bacterial death, translocation of the DNA from the gut into the blood circulation (gut translocation) is also possible. Hence, several mouse models with experiments on macrophages were conducted to explore the sources, influences, and impacts of bacteria-free DNA in sepsis. First, bacteria-free DNA and bacteriome in blood were demonstrated in cecal ligation and puncture (CLP) sepsis mice. Second, administration of bacterial lysate (a source of bacterial DNA) in dextran sulfate solution (DSS)-induced mucositis mice elevated blood bacteria-free DNA without bacteremia, supporting gut translocation of free DNA. The absence of blood bacteria-free DNA in DSS mice without bacterial lysate implies an impact of the abundance of bacterial DNA in intestinal contents on the translocation of free DNA. Third, higher serum cytokines in mice after injection of combined bacterial DNA with lipopolysaccharide (LPS), when compared to LPS injection alone, supported an influence of blood bacteria-free DNA on systemic inflammation. The synergistic effects of free DNA and LPS on macrophage pro-inflammatory responses, as indicated by supernatant cytokines (TNF-α, IL-6, and IL-10), pro-inflammatory genes (NFκB, iNOS, and IL-1β), and profound energy alteration (enhanced glycolysis with reduced mitochondrial functions), which was neutralized by TLR-9 inhibition (chloroquine), were demonstrated. In conclusion, the presence of bacteria-free DNA in sepsis mice is partly due to gut translocation of bacteria-free DNA into the systemic circulation, which would enhance sepsis severity. Inhibition of the responses against bacterial DNA by TLR-9 inhibition could attenuate LPS-DNA synergy in macrophages and might help improve sepsis hyper-inflammation in some situations.
8
Sarmashghi S, Balaban M, Rachtman E, Touri B, Mirarab S, Bafna V. Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT. PLoS Comput Biol 2021; 17:e1009449. [PMID: 34780468 PMCID: PMC8629397 DOI: 10.1371/journal.pcbi.1009449] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2021] [Revised: 11/29/2021] [Accepted: 09/13/2021] [Indexed: 01/26/2023] Open
Abstract
The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome-skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show using a mix of theoretical and empirical analysis that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, and that has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation compared to 27% error previously achieved. In shotgun sequenced read samples with contaminants, RESPECT length estimates had median error 4%, in contrast to other methods that had median error 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.
The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome skims) could be transformative for genomic ecology. Analyzing genome skims, mostly based on statistics of small oligomers, remains challenging, but recent results have shown the advantage of this approach for the identification and phylogenetic placement of eukaryotic species. In this paper, we present a method, RESPECT, to estimate genomic properties such as genome length and repetitiveness from low-coverage genome skims. We trained RESPECT using assembled genomes and tested it on low-coverage simulated and real reads. Benchmarking results reveal that RESPECT has excellent accuracy in estimating the genome length compared to other methods, and can provide critical information regarding the repeat structure of the genome.
Affiliation(s)
- Shahab Sarmashghi
  - Department of Electrical & Computer Engineering, University of California, San Diego, La Jolla, California, United States of America
- Metin Balaban
  - Bioinformatics & Systems Biology Graduate Program, University of California, San Diego, La Jolla, California, United States of America
- Eleonora Rachtman
  - Bioinformatics & Systems Biology Graduate Program, University of California, San Diego, La Jolla, California, United States of America
- Behrouz Touri
  - Department of Electrical & Computer Engineering, University of California, San Diego, La Jolla, California, United States of America
- Siavash Mirarab
  - Department of Electrical & Computer Engineering, University of California, San Diego, La Jolla, California, United States of America
- Vineet Bafna
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California, United States of America
9
Smith SR, Normandeau E, Djambazian H, Nawarathna PM, Berube P, Muir AM, Ragoussis J, Penney CM, Scribner KT, Luikart G, Wilson CC, Bernatchez L. A chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush). Mol Ecol Resour 2021; 22:679-694. [PMID: 34351050 PMCID: PMC9291852 DOI: 10.1111/1755-0998.13483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/23/2021] [Revised: 07/25/2021] [Accepted: 07/28/2021] [Indexed: 01/23/2023]
Abstract
Here, we present an annotated, chromosome‐anchored, genome assembly for Lake Trout (Salvelinus namaycush) – a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long‐read sequencing, paired‐end Illumina sequencing, proximity ligation (Hi‐C) sequencing, and a previously published linkage map to produce a highly contiguous assembly composed of 7378 contigs (contig N50 = 1.8 Mb) assigned to 4120 scaffolds (scaffold N50 = 44.975 Mb). Long read sequencing data were generated using DNA from a female double haploid individual. 84.7% of the genome was assigned to 42 chromosome‐sized scaffolds and 93.2% of Benchmarking Universal Single Copy Orthologues were recovered, putting this assembly on par with the best currently available salmonid genomes. Estimates of genome size based on k‐mer frequency analysis were highly similar to the total size of the finished genome, suggesting that the entirety of the genome was recovered. A mitochondrial genome assembly was also produced. Self‐versus‐self synteny analysis allowed us to identify homeologs resulting from the salmonid specific autotetraploid event (Ss4R) as well as regions exhibiting delayed rediploidization. Alignment with three other salmonid genomes and the Northern Pike (Esox lucius) genome also allowed us to identify homologous chromosomes in related taxa. We also generated multiple resources useful for future genomic research on Lake Trout, including a repeat library and a sex‐averaged recombination map. A novel RNA sequencing data set for liver tissue was also generated in order to produce a publicly available set of annotations for 49,668 genes and pseudogenes. Potential applications of these resources to population genetics and the conservation of native populations are discussed.
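The contig and scaffold N50 figures quoted in this abstract follow the usual definition: the length L such that sequences of length at least L together cover at least half of the total assembly length. A minimal sketch of that calculation is given below; the function name is hypothetical and the input is assumed to be a plain list of sequence lengths in base pairs.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths: the length of the
    sequence at which the running total, taken from longest to shortest,
    first reaches half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

Applied separately to contig lengths and scaffold lengths, the same function would give the two statistics reported for an assembly.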
Affiliation(s)
- Seth R Smith
  - Department of Integrative Biology, Michigan State University, East Lansing, MI, USA
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
- Eric Normandeau
  - Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, QC, Canada
- Haig Djambazian
  - McGill Genome Centre, Department of Human Genetics, Montreal, QC, Canada
- Pubudu M Nawarathna
  - Department of Human Genetics, Canadian Centre for Computational Genomics (C3G), McGill University, Montréal, QC, Canada
- Pierre Berube
  - McGill Genome Centre, Department of Human Genetics, Montreal, QC, Canada
- Jiannis Ragoussis
  - McGill Genome Centre, Department of Human Genetics, Montreal, QC, Canada
- Chantelle M Penney
  - Environmental and Life Sciences Graduate Program, Trent University, Peterborough, ON, Canada
- Kim T Scribner
  - Department of Integrative Biology, Michigan State University, East Lansing, MI, USA
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA
- Gordon Luikart
  - Fish and Wildlife Genomics Group, University of Montana, Missoula, MT, USA
  - Flathead Lake Biological Station, Division of Biological Sciences, University of Montana, Polson, MT, USA
- Chris C Wilson
  - Aquatic Research and Monitoring Section, Ontario Ministry of Natural Resources and Forestry, Peterborough, ON, Canada
- Louis Bernatchez
  - Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, QC, Canada
10
Estimation of Genome Size in the Endemic Species Reseda pentagyna and the Locally Rare Species Reseda lutea Using Comparative Analyses of Flow Cytometry and K-Mer Approaches. PLANTS 2021; 10:1362. [PMID: 34371565 PMCID: PMC8309327 DOI: 10.3390/plants10071362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2021] [Revised: 07/01/2021] [Accepted: 07/01/2021] [Indexed: 11/17/2022]
Abstract
Genome size is one of the fundamental cytogenetic features of a species; it is critical for the design and initiation of any genome sequencing project and can provide essential insights for taxonomic, cytogenetic, phylogenetic, and evolutionary studies. However, this key cytogenetic information is almost entirely lacking for the endemic species Reseda pentagyna and the locally rare species Reseda lutea in Saudi Arabia. Therefore, genome size was analyzed by propidium iodide (PI) flow cytometry and compared to k-mer analysis methods. Using flow cytometry, the standard method for genome size measurement, with MB01 nuclei isolation buffer, the genome sizes of R. lutea and R. pentagyna were estimated to be 1.91 ± 0.02 and 2.09 ± 0.03 pg/2C, respectively, corresponding approximately to haploid genome sizes of 934 and 1,022 Mbp, respectively. For validation, k-mer analysis was performed on Illumina paired-end sequencing data from both species. Five k-mer analysis approaches were examined for biocomputational estimation of genome size: a general formula and four well-known programs (CovEST, Kmergenie, FindGSE, and GenomeScope). The parameter preferences had a significant impact on GenomeScope and Kmergenie estimates, while the general formula estimations did not differ considerably, with an average genome size of 867.7 and 896. Mbp. The differences between flow cytometry and biocomputational predictions may be due to the high repeat content, particularly long repetitive regions in both genomes, 71% and 57%, which interfered with k-mer analysis. GenomeScope allowed quantification of high heterozygosity levels (1.04 and 1.37%) of R. lutea and R. pentagyna genomes, respectively. Based on our observations, R. lutea may have a tetraploid genome or higher. Our results revealed fundamental cytogenetic information for R. lutea and R. pentagyna, which should be used in future taxonomic studies and whole-genome sequencing.
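The step from a flow cytometry 2C value in picograms to a haploid genome size in Mbp can be reproduced with a short calculation. The sketch below assumes the commonly used conversion factor of 1 pg of DNA = 978 Mbp; the abstract does not state which factor the authors used, so this is an assumption, and the function name is hypothetical.

```python
MBP_PER_PG = 978  # commonly used conversion factor (assumed, not stated in the abstract)

def haploid_mbp_from_2c_pg(pg_2c):
    """Convert a 2C DNA content in picograms to a haploid (1C) genome size in Mbp."""
    return pg_2c / 2 * MBP_PER_PG

# 2C values reported in the abstract for R. lutea and R. pentagyna
for species, pg in [("R. lutea", 1.91), ("R. pentagyna", 2.09)]:
    print(species, round(haploid_mbp_from_2c_pg(pg)), "Mbp")
```

With this factor, 1.91 pg/2C gives about 934 Mbp and 2.09 pg/2C gives about 1022 Mbp.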
11
Measuring Genome Sizes Using Read-Depth, k-mers, and Flow Cytometry: Methodological Comparisons in Beetles (Coleoptera). G3-GENES GENOMES GENETICS 2020; 10:3047-3060. [PMID: 32601059 PMCID: PMC7466995 DOI: 10.1534/g3.120.401028] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 11/25/2022]
Abstract
Measuring genome size across different species can yield important insights into evolution of the genome and allow for more informed decisions when designing next-generation genomic sequencing projects. New techniques for estimating genome size using shallow genomic sequence data have emerged which have the potential to augment our knowledge of genome sizes, yet these methods have only been used in a limited number of empirical studies. In this project, we compare estimation methods using next-generation sequencing (k-mer methods and average read depth of single-copy genes) to measurements from flow cytometry, a standard method for genome size measures, using ground beetles (Carabidae) and other members of the beetle suborder Adephaga as our test system. We also present a new protocol for using read-depth of single-copy genes to estimate genome size. Additionally, we report flow cytometry measurements for five previously unmeasured carabid species, as well as 21 new draft genomes and six new draft transcriptomes across eight species of adephagan beetles. No single sequence-based method performed well on all species, and all tended to underestimate the genome sizes, although only slightly in most samples. For one species, Bembidion sp. nr. transversale, most sequence-based methods yielded estimates half the size suggested by flow cytometry.
12
Plasmid-mediated metronidazole resistance in Clostridioides difficile. Nat Commun 2020; 11:598. [PMID: 32001686 PMCID: PMC6992631 DOI: 10.1038/s41467-020-14382-1] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 07/17/2019] [Accepted: 12/24/2019] [Indexed: 12/17/2022] Open
Abstract
Metronidazole was until recently used as a first-line treatment for potentially life-threatening Clostridioides difficile (CD) infection. Although cases of metronidazole resistance have been documented, no clear mechanism for metronidazole resistance or a role for plasmids in antimicrobial resistance has been described for CD. Here, we report genome sequences of seven susceptible and sixteen resistant CD isolates from human and animal sources, including isolates from a patient with recurrent CD infection by a PCR ribotype (RT) 020 strain, which developed resistance to metronidazole over the course of treatment (minimal inhibitory concentration [MIC] = 8 mg L−1). Metronidazole resistance correlates with the presence of a 7-kb plasmid, pCD-METRO. pCD-METRO is present in toxigenic and non-toxigenic resistant (n = 23), but not susceptible (n = 563), isolates from multiple countries. Introduction of a pCD-METRO-derived vector into a susceptible strain increases the MIC 25-fold. Our finding of plasmid-mediated resistance can impact diagnostics and treatment of CD infections. Cases of C. difficile (CD) resistant to metronidazole have been reported but the mechanism remains enigmatic. Here the authors identify a plasmid, which correlates with metronidazole resistance status in a large international collection of CD isolates, and demonstrate that the plasmid can confer metronidazole resistance.
|
13
|
Harrand AS, Kovac J, Carroll LM, Guariglia-Oropeza V, Kent DJ, Wiedmann M. Assembly and Characterization of a Pathogen Strain Collection for Produce Safety Applications: Pre-growth Conditions Have a Larger Effect on Peroxyacetic Acid Tolerance Than Strain Diversity. Front Microbiol 2019; 10:1223. [PMID: 31231329 PMCID: PMC6558390 DOI: 10.3389/fmicb.2019.01223] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/02/2019] [Accepted: 05/16/2019] [Indexed: 12/21/2022]
Abstract
Effective control of foodborne pathogens on produce requires science-based validation of interventions and control strategies, which typically involves challenge studies with a set of bacterial strains representing the target pathogens or appropriate surrogates. In order to facilitate these types of studies, a produce-relevant strain collection was assembled to represent strains from produce outbreaks or pre-harvest environments, including Listeria monocytogenes (n = 11), Salmonella enterica (n = 23), shiga-toxin producing Escherichia coli (STEC) (n = 13), and possible surrogate organisms (n = 8); all strains were characterized by whole genome sequencing (WGS). Strain diversity was assured by including the 10 most common S. enterica serotypes, L. monocytogenes lineages I-IV, and E. coli O157 as well as selected "non-O157" STEC serotypes. As it has previously been shown that strains and genetic lineages of a pathogen may differ in their ability to survive different stress conditions, a subset of representative strains for each "pathogen group" (e.g., Salmonella, STEC) was selected and assessed for survival of exposure to peroxyacetic acid (PAA) using strains pre-grown under different conditions including (i) low pH, (ii) high salt, (iii) reduced water activity, (iv) different growth phases, (v) minimal medium, and (vi) different temperatures (21°C, 37°C). The results showed that across the three pathogen groups pre-growth conditions had a larger effect on bacterial reduction after PAA exposure as compared to strain diversity. Interestingly, bacteria exposed to salt stress (4.5% NaCl) consistently showed the least reduction after exposure to PAA; however, for STEC, strains pre-grown at 21°C were as tolerant to PAA exposure as strains pre-grown under salt stress. 
Overall, our data suggest that challenge studies conducted with multi-strain cocktails (pre-grown under a single specific condition) may not necessarily reflect the relevant phenotypic range needed to appropriately assess different intervention strategies.
Affiliation(s)
- Jasna Kovac
- Department of Food Science, Pennsylvania State University, University Park, PA, United States
- Laura M. Carroll
- Department of Food Science, Cornell University, Ithaca, NY, United States
- David J. Kent
- Department of Statistical Science, Cornell University, Ithaca, NY, United States
- Martin Wiedmann
- Department of Food Science, Cornell University, Ithaca, NY, United States
|
14
|
Acuña-Amador L, Primot A, Cadieu E, Roulet A, Barloy-Hubler F. Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains. BMC Genomics 2018; 19:54. [PMID: 29338683 PMCID: PMC5771137 DOI: 10.1186/s12864-017-4429-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2017] [Accepted: 12/29/2017] [Indexed: 12/15/2022]
Abstract
BACKGROUND Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation. RESULTS We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains. CONCLUSIONS In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. 
We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.
Affiliation(s)
- Luis Acuña-Amador
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France; Laboratorio de Investigación en Bacteriología Anaerobia, Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Aline Primot
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Edouard Cadieu
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Alain Roulet
- GenoToul Genome & Transcriptome (GeT-PlaGe), INRA, US1426, Castanet-Tolosan, France
- Frédérique Barloy-Hubler
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
|
15
|
Gamble T, McKenna E, Meyer W, Nielsen SV, Pinto BJ, Scantlebury DP, Higham TE. XX/XY Sex Chromosomes in the South American Dwarf Gecko (Gonatodes humeralis). J Hered 2017; 109:462-468. [DOI: 10.1093/jhered/esx112] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/26/2017] [Accepted: 12/19/2017] [Indexed: 11/13/2022]
Affiliation(s)
- Tony Gamble
- Department of Biological Sciences, Marquette University, Milwaukee, WI
- Bell Museum of Natural History, University of Minnesota, Saint Paul, MN
- Milwaukee Public Museum, Milwaukee, WI
- Wyatt Meyer
- Department of Biological Sciences, Marquette University, Milwaukee, WI
- Stuart V Nielsen
- Department of Biological Sciences, Marquette University, Milwaukee, WI
- Brendan J Pinto
- Department of Biological Sciences, Marquette University, Milwaukee, WI
- Timothy E Higham
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, CA
|
16
|
Danilowicz C, Hermans L, Coljee V, Prévost C, Prentiss M. ATP hydrolysis provides functions that promote rejection of pairings between different copies of long repeated sequences. Nucleic Acids Res 2017; 45:8448-8462. [PMID: 28854739 PMCID: PMC5737215 DOI: 10.1093/nar/gkx582] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/17/2017] [Accepted: 07/05/2017] [Indexed: 01/30/2023]
Abstract
During DNA recombination and repair, RecA family proteins must promote rapid joining of homologous DNA. Repeated sequences with >100 base pair lengths occupy more than 1% of bacterial genomes; however, commitment to strand exchange was believed to occur after testing ∼20-30 bp. If that were true, pairings between different copies of long repeated sequences would usually become irreversible. Our experiments reveal that in the presence of ATP hydrolysis even 75 bp sequence-matched strand exchange products remain quite reversible. Experiments also indicate that when ATP hydrolysis is present, flanking heterologous dsDNA regions increase the reversibility of sequence matched strand exchange products with lengths up to ∼75 bp. Results of molecular dynamics simulations provide insight into how ATP hydrolysis destabilizes strand exchange products. These results inspired a model that shows how pairings between long repeated sequences could be efficiently rejected even though most homologous pairings form irreversible products.
Affiliation(s)
- Laura Hermans
- Department of Physics, Harvard University, Cambridge, MA 02138, USA
- Vincent Coljee
- Department of Physics, Harvard University, Cambridge, MA 02138, USA
- Chantal Prévost
- Laboratoire de Biochimie Théorique, CNRS UMR 9080, IBPC, Paris, France
- Mara Prentiss
- Department of Physics, Harvard University, Cambridge, MA 02138, USA
|
17
|
Draft Genome Sequence of Actinomyces succiniciruminis Strain Am4T, Isolated from Cow Rumen Fluid. GENOME ANNOUNCEMENTS 2017; 5(29):e01587-16. [PMID: 28729282 PMCID: PMC5522949 DOI: 10.1128/genomea.01587-16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
Actinomyces succiniciruminis strain Am4T, isolated from cow rumen fluid, can metabolize a range of substrates including complex carbohydrates to organic acids. Here, we report a 3.33-Mbp draft genome of Actinomyces succiniciruminis.
|
18
|
Hiraki H, Kagoshima H, Kraus C, Schiffer PH, Ueta Y, Kroiher M, Schierenberg E, Kohara Y. Genome analysis of Diploscapter coronatus: insights into molecular peculiarities of a nematode with parthenogenetic reproduction. BMC Genomics 2017; 18:478. [PMID: 28646875 PMCID: PMC5483258 DOI: 10.1186/s12864-017-3860-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 12/09/2016] [Accepted: 06/13/2017] [Indexed: 12/03/2022]
Abstract
BACKGROUND Sexual reproduction involving the fusion of egg and sperm prevails among eukaryotes. In contrast, the nematode Diploscapter coronatus, a close relative of the model Caenorhabditis elegans, reproduces parthenogenetically. Neither males nor sperm have been observed and some steps of meiosis are apparently skipped in this species. To uncover the genomic changes associated with the evolution of parthenogenesis in this nematode, we carried out a genome analysis. RESULTS We obtained a 170 Mbp draft genome in only 511 scaffolds with an N50 length of 1 Mbp. Nearly 90% of these scaffolds constitute homologous pairs with a 5.7% heterozygosity on average and inversions and translocations, meaning that the 170 Mbp sequences correspond to the diploid genome. Fluorescent staining shows that the D. coronatus genome consists of two chromosomes (2n = 2). In our genome annotation, we found orthologs of 59% of the C. elegans genes. However, a number of genes were missing or very divergent. These include genes involved in sex determination (e.g. xol-1, tra-2) and meiosis (e.g. the kleisins rec-8 and coh-3/4), giving a possible explanation for the absence of males and the second meiotic division. The high degree of heterozygosity allowed us to analyze the expression level of individual alleles. Most of the homologous pairs show very similar expression levels but others exhibit a 2-5-fold difference. CONCLUSIONS Our high-quality draft genome of D. coronatus reveals the peculiarities of the genome of parthenogenesis and provides some clues to the genetic basis for parthenogenetic reproduction. This draft genome should be the basis to elucidate fundamental questions related to parthenogenesis such as its origin and mechanisms through comparative analyses with other nematodes. Furthermore, being the closest outgroup to the genus Caenorhabditis, the draft genome will help to disclose many idiosyncrasies of the model C. elegans and its congeners in future studies.
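The N50 statistic quoted in this abstract can be computed as sketched below, using the common definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths in the usage note are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two longest scaffolds already cover 16 of the 32 total bases.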
Affiliation(s)
- Hideaki Hiraki
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Japan
- Hiroshi Kagoshima
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Japan
- Transdisciplinary Research Integration Center, Research Organization of Information and Systems, Tokyo, Japan
- Yumiko Ueta
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Japan
- Michael Kroiher
- Zoologisches Institut, Universität zu Köln, Cologne, NRW, Germany
- Yuji Kohara
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Japan
|
19
|
Liu S, Zheng J, Migeon P, Ren J, Hu Y, He C, Liu H, Fu J, White FF, Toomajian C, Wang G. Unbiased K-mer Analysis Reveals Changes in Copy Number of Highly Repetitive Sequences During Maize Domestication and Improvement. Sci Rep 2017; 7:42444. [PMID: 28186206 PMCID: PMC5301235 DOI: 10.1038/srep42444] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/24/2016] [Accepted: 01/10/2017] [Indexed: 12/15/2022]
Abstract
The major component of complex genomes is repetitive elements, which remain recalcitrant to characterization. Using maize as a model system, we analyzed whole genome shotgun (WGS) sequences for the two maize inbred lines B73 and Mo17 using k-mer analysis to quantify the differences between the two genomes. Significant differences were identified in highly repetitive sequences, including centromere, 45S ribosomal DNA (rDNA), knob, and telomere repeats. Genotype-specific 45S rDNA sequences were discovered. The B73 and Mo17 polymorphic k-mers were used to examine allele-specific expression of 45S rDNA in the hybrids. Although Mo17 contains a higher copy number than B73, equivalent levels of overall 45S rDNA expression indicate that transcriptional or post-transcriptional regulation mechanisms operate for the 45S rDNA in the hybrids. Using WGS sequences of B73xMo17 doubled haploids, genomic locations showing differential repetitive contents were genetically mapped, which displayed different organization of highly repetitive sequences in the two genomes. In an analysis of WGS sequences of HapMap2 lines, including maize wild progenitor, landraces, and improved lines, decreases and increases in abundance of additional sets of k-mers associated with centromere, 45S rDNA, knob, and retrotransposons were found among groups, revealing global evolutionary trends of genomic repeats during maize domestication and improvement.
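One simple way to compare k-mer abundance between two samples, in the spirit of the B73 versus Mo17 comparison described here, is a log2 ratio of depth-normalized counts. The sketch below is a generic illustration, not the authors' pipeline; the function name and the `pseudocount` parameter are assumptions.

```python
import math


def log2_abundance_ratio(counts_a, counts_b, pseudocount=1):
    """Return log2(normalized abundance in A / normalized abundance in B) per k-mer.

    counts_a and counts_b map k-mer strings to raw counts. Each count is divided
    by its sample's total to correct for sequencing depth. pseudocount is a
    hypothetical smoothing term that avoids division by zero for k-mers absent
    from one sample.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples need at least one counted k-mer")
    ratios = {}
    for kmer in set(counts_a) | set(counts_b):
        freq_a = (counts_a.get(kmer, 0) + pseudocount) / total_a
        freq_b = (counts_b.get(kmer, 0) + pseudocount) / total_b
        ratios[kmer] = math.log2(freq_a / freq_b)
    return ratios
```

Positive values mark k-mers enriched in the first sample and negative values mark k-mers enriched in the second. In practice, k-mers from repeats such as rDNA or knob sequences would be looked up against an annotation afterwards.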
Affiliation(s)
- Sanzhen Liu
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
- Jun Zheng
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, P.R. China
- Pierre Migeon
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
- Jie Ren
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
- Ying Hu
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
- Cheng He
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, P.R. China
- Hongjun Liu
- State Key Laboratory of Crop Biology, Shandong Key Laboratory of Crop Biology, Taian 271018, P.R. China; College of Life Sciences, Shandong Agricultural University, Taian 271018, P.R. China
- Junjie Fu
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, P.R. China
- Frank F White
- Department of Plant Pathology, University of Florida, Gainesville, FL, 32611, USA
- Guoying Wang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, P.R. China
|
20
|
Plaza Onate F, Batto JM, Juste C, Fadlallah J, Fougeroux C, Gouas D, Pons N, Kennedy S, Levenez F, Dore J, Ehrlich SD, Gorochov G, Larsen M. Quality control of microbiota metagenomics by k-mer analysis. BMC Genomics 2015; 16:183. [PMID: 25887914 PMCID: PMC4373121 DOI: 10.1186/s12864-015-1406-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/20/2014] [Accepted: 02/26/2015] [Indexed: 01/04/2023]
Abstract
Background The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue. Results We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets. Conclusions We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. 
Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.
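A minimal sketch of a k-mer based first-pass check on raw reads is given below. It reports the share of all k-mer occurrences taken up by the most abundant k-mers, on the assumption that a few dominant k-mers can hint at contaminants such as adapter-derived sequence. This metric and the names `kmer_spectrum`, `top_kmer_fraction`, and `top_n` are illustrative choices, not the statistic defined in the paper.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every overlapping k-mer across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def top_kmer_fraction(counts, top_n=10):
    """Return the fraction of k-mer occurrences owned by the top_n k-mers."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total
```

A value near 1 for a small `top_n` means the sample is dominated by a handful of sequences, which would usually justify a closer look before filtering and gene mapping.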
Affiliation(s)
- Florian Plaza Onate
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France
- Catherine Juste
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France; UMR1319 Micalis, INRA, Jouy-en-Josas, France
- Jehane Fadlallah
- Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l'Hôpital, 75013, Paris, France; Département d'Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013, Paris, France
- Cyrielle Fougeroux
- Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l'Hôpital, 75013, Paris, France
- Doriane Gouas
- Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l'Hôpital, 75013, Paris, France; Inserm UMR-S1135, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013, Paris, France
- Nicolas Pons
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France
- Sean Kennedy
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France
- Florence Levenez
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France; UMR1319 Micalis, INRA, Jouy-en-Josas, France
- Joel Dore
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France; UMR1319 Micalis, INRA, Jouy-en-Josas, France
- S Dusko Ehrlich
- INRA, Institut National de la Recherche Agronomique, US1367 MetaGenoPolis, 78350, Jouy en Josas, France; UMR1319 Micalis, INRA, Jouy-en-Josas, France
- Guy Gorochov
- Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l'Hôpital, 75013, Paris, France; Département d'Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013, Paris, France; Inserm UMR-S1135, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013, Paris, France
- Martin Larsen
- Sorbonne Universités, UPMC Univ Paris 06, CR7, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), Hôpital Pitié-Salpêtrière, 83 bd. de l'Hôpital, 75013, Paris, France; Département d'Immunologie, AP-HP, Groupement Hospitalier Pitié-Salpêtrière, F-75013, Paris, France; Inserm UMR-S1135, Centre d'Immunologie et des Maladies Infectieuses (CIMI-Paris), F-75013, Paris, France
|
21
|
Dominant short repeated sequences in bacterial genomes. Genomics 2015; 105:175-81. [PMID: 25561351 DOI: 10.1016/j.ygeno.2014.12.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/10/2014] [Revised: 12/22/2014] [Accepted: 12/29/2014] [Indexed: 11/22/2022]
Abstract
We use a novel multidimensional searching approach to present the first exhaustive search for all possible repeated sequences in 166 genomes selected to cover the bacterial domain. We found an overrepresentation of repeated sequences in all but one of the genomes. The most prevalent repeats by far were related to clustered regularly interspaced short palindromic repeats (CRISPRs), which confer bacterial adaptive immunity. We identified a deep branching clade of thermophilic Firmicutes containing the highest number of CRISPR repeats. We also identified a high prevalence of tandem repeated heptamers. In addition, we identified GC-rich repeats that could potentially be involved in recombination events. Finally, we identified repeats in a 16322 amino acid mega protein (involved in biofilm formation) and inverted repeats flanking miniature transposable elements (MITEs). In conclusion, the exhaustive search for repeated sequences identified new elements and their distribution, which has implications for understanding both the ecology and evolution of bacteria.
|
22
|
How Big is that Genome? Estimating Genome Size and Coverage from k-mer Abundance Spectra. STRING PROCESSING AND INFORMATION RETRIEVAL 2015. [DOI: 10.1007/978-3-319-23826-5_20] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/07/2022]
|