1
Karikari B, Lemay MA, Belzile F. k-mer-Based Genome-Wide Association Studies in Plants: Advances, Challenges, and Perspectives. Genes (Basel) 2023; 14:1439. [PMID: 37510343 PMCID: PMC10379394 DOI: 10.3390/genes14071439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open
Abstract
Genome-wide association studies (GWAS) have allowed the discovery of marker-trait associations in crops over recent decades. However, their power is hampered by a number of limitations, with the key one among them being an overreliance on single-nucleotide polymorphisms (SNPs) as molecular markers. Indeed, SNPs represent only one type of genetic variation and are usually derived from alignment to a single genome assembly that may be poorly representative of the population under study. To overcome this, k-mer-based GWAS approaches have recently been developed. k-mer-based GWAS provide a universal way to assess variation due to SNPs, insertions/deletions, and structural variations without having to specifically detect and genotype these variants. In addition, k-mer-based analyses can be used in species that lack a reference genome. However, the use of k-mers for GWAS presents challenges such as data size and complexity, lack of standard tools, and potential detection of false associations. Nevertheless, efforts are being made to overcome these challenges and a general analysis workflow has started to emerge. We identify the priorities for k-mer-based GWAS in years to come, notably in the development of user-friendly programs for their analysis and approaches for linking significant k-mers to sequence variation.
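To illustrate why k-mers can capture variants without explicit variant calling, here is a minimal sketch in Python. The two sequences, the choice of k = 4, and the helper name `kmers` are illustrative assumptions and are not taken from the article; real analyses work on sequencing reads and use much longer k-mers.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Two hypothetical haplotypes that differ by a single SNP (G -> T at index 5).
ref = "ACGTAGCTAGGA"
alt = "ACGTATCTAGGA"

k = 4  # toy value chosen for readability
ref_only = kmers(ref, k) - kmers(alt, k)  # k-mers present only in ref
alt_only = kmers(alt, k) - kmers(ref, k)  # k-mers present only in alt

print(sorted(ref_only))
print(sorted(alt_only))
```

In this toy case each allele is tagged by its own set of k-mers overlapping the variant site, so a presence/absence table of k-mers across samples can stand in for a genotype table in an association test.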
Affiliation(s)
- Benjamin Karikari
  - Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
  - Department of Agricultural Biotechnology, Faculty of Agriculture, Food and Consumer Sciences, University for Development Studies, Tamale P.O. Box TL 1882, Ghana
- Marc-André Lemay
  - Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- François Belzile
  - Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
2
Xie S, Wang C, Zeng T, Wang H, Suo H. Whole-genome and comparative genome analysis of Mucor racemosus C isolated from Yongchuan Douchi. Int J Biol Macromol 2023; 234:123397. [PMID: 36739051 DOI: 10.1016/j.ijbiomac.2023.123397] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/09/2022] [Revised: 01/10/2023] [Accepted: 01/18/2023] [Indexed: 02/05/2023]
Abstract
Mucor racemosus is the predominant fungus in the zhiqu stage of Yongchuan Douchi (Mucor-type) fermentation and plays an important role in the fermentation process. However, genetic information on M. racemosus is lacking. In this study, we isolated and identified M. racemosus C (accession no. JAPEHQ000000000) from Yongchuan Douchi and analyzed its physiological indicators and genomic information to comprehensively assess its fermentation capacity and safety. M. racemosus C had neutral protease activity of up to 68.051 U/mL at 30 °C and alkaline protease activity of up to 57.367 U/mL at 25 °C. In addition, comparison of the genomic data with the COGs database (NCBI) predicted that M. racemosus C undergoes extensive amino acid metabolism, making the strain suitable for the production of fermented foods (e.g., Douchi, Syoyu, and sufu). Finally, we evaluated the safety of M. racemosus C through virulence and resistance gene analysis, a hemolysis experiment, an aflatoxin assay, and an antibiotic resistance assay; the results showed that M. racemosus C was safe, non-toxin-producing, and non-hemolytic.
Affiliation(s)
- Shicai Xie
  - College of Food Science, Southwest University, Chongqing 400715, China; Food Industry Innovation Research Institute of Modern Sichuan Cuisine & Chongqing Flavor, Chongqing 400715, China
- Chen Wang
  - College of Food Science, Southwest University, Chongqing 400715, China; Food Industry Innovation Research Institute of Modern Sichuan Cuisine & Chongqing Flavor, Chongqing 400715, China
- Tao Zeng
  - College of Food Science, Southwest University, Chongqing 400715, China; Food Industry Innovation Research Institute of Modern Sichuan Cuisine & Chongqing Flavor, Chongqing 400715, China
- Hongwei Wang
  - College of Food Science, Southwest University, Chongqing 400715, China; Food Industry Innovation Research Institute of Modern Sichuan Cuisine & Chongqing Flavor, Chongqing 400715, China
- Huayi Suo
  - College of Food Science, Southwest University, Chongqing 400715, China; Food Industry Innovation Research Institute of Modern Sichuan Cuisine & Chongqing Flavor, Chongqing 400715, China
3
Kropochev AI, Lashin SA, Matushkin YG, Klimenko AI. Trait-Based Method of Quantitative Assessment of Ecological Functional Groups in the Human Intestinal Microbiome. Biology 2023; 12:115. [PMID: 36671807 PMCID: PMC9855786 DOI: 10.3390/biology12010115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 12/15/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
We propose a trait-based method for quantifying the activity of functional groups in the human gut microbiome from metatranscriptomic data. It allows one to assess structural changes in a microbial community composed of the following functional groups: butyrate producers, acetogens, sulfate reducers, and mucin-decomposing bacteria. It offers another way to perform functional analysis of metatranscriptomic data, focusing on the ecological level of the community under study. To develop the method, we used published data obtained in a carefully controlled environment and from a synthetic microbial community, where the problem of ambiguity between functionality and taxonomy is absent. The method was validated using RNA-seq data and 16S rRNA amplicon sequencing data from a simplified community. This successful verification opens prospects for applying the method to natural communities of the human intestinal microbiota.
Affiliation(s)
- Andrew I. Kropochev
  - Institute of Cytology and Genetics, Novosibirsk 630090, Russia
  - Kurchatov Genomic Center of ICG SB RAS, Novosibirsk 630090, Russia
- Sergey A. Lashin
  - Institute of Cytology and Genetics, Novosibirsk 630090, Russia
  - Kurchatov Genomic Center of ICG SB RAS, Novosibirsk 630090, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk 630090, Russia
- Yury G. Matushkin
  - Institute of Cytology and Genetics, Novosibirsk 630090, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk 630090, Russia
- Alexandra I. Klimenko
  - Institute of Cytology and Genetics, Novosibirsk 630090, Russia
  - Kurchatov Genomic Center of ICG SB RAS, Novosibirsk 630090, Russia
4
Cattaneo G, Ferraro Petrillo U, Giancarlo R, Palini F, Romualdi C. The power of word-frequency-based alignment-free functions: a comprehensive large-scale experimental analysis. Bioinformatics 2022; 38:925-932. [PMID: 34718420 DOI: 10.1093/bioinformatics/btab747] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/02/2021] [Revised: 10/07/2021] [Accepted: 10/26/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Alignment-free (AF) distance/similarity functions are a key tool for sequence analysis. Experimental studies on real datasets abound and, to some extent, there are also studies regarding their control of the false positive rate (Type I error). However, assessment of their power, i.e. their ability to identify true similarity, has been limited to some members of the D2 family. The corresponding experimental studies have concentrated on short sequences, a scenario no longer adequate for current applications, where sequence lengths may vary considerably. Such a state of the art is methodologically problematic, since information regarding a key feature such as power is either missing or limited. RESULTS: By concentrating on a representative set of word-frequency-based AF functions, we perform the first coherent and uniform evaluation of their power, also covering Type I error for completeness. Two alternative models of important genomic features (CIS Regulatory Modules and Horizontal Gene Transfer), a wide range of sequence lengths from a few thousand to millions, and different values of k have been used. As a result, we provide a characterization of those AF functions that is novel and informative. Indeed, we identify weak and strong points of each function considered, which may be used as a guide to choose one for analysis tasks. Remarkably, of the 15 functions that we have considered, only four stand out, with small differences between short and long sequence length scenarios. Finally, to encourage the use of our methodology for validation of future AF functions, the Big Data platform supporting it is public. AVAILABILITY AND IMPLEMENTATION: The software is available at: https://github.com/pipp8/power_statistics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
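As a concrete instance of a word-frequency-based AF function, the basic D2 statistic is the inner product of the k-mer count vectors of two sequences. The sketch below is a minimal Python illustration under that standard definition; the function names `kmer_counts` and `d2` are illustrative and are not taken from the authors' software.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(x: str, y: str, k: int) -> int:
    """Basic D2 statistic: sum over words w of count_x(w) * count_y(w)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for words absent from y, so those terms vanish.
    return sum(c * cy[w] for w, c in cx.items())

print(d2("ACGT", "ACGT", 2))
print(d2("AAAA", "CCCC", 2))
```

Larger values indicate more shared words; variants of this statistic normalize the counts to account for sequence length and background composition.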
Affiliation(s)
- Giuseppe Cattaneo
  - Dipartimento di Informatica, Università di Salerno, Fisciano, SA 84084, Italy
- Raffaele Giancarlo
  - Dipartimento di Matematica ed Informatica, Università di Palermo, 90133 Palermo, Italy
- Francesco Palini
  - Dipartimento di Scienze Statistiche, Università di Roma-La Sapienza, 00185 Rome, Italy
- Chiara Romualdi
  - Dipartimento di Biologia, Università di Padova, 35131 Padova, Italy
5
Bize A, Midoux C, Mariadassou M, Schbath S, Forterre P, Da Cunha V. Exploring short k-mer profiles in cells and mobile elements from Archaea highlights the major influence of both the ecological niche and evolutionary history. BMC Genomics 2021; 22:186. [PMID: 33726663 PMCID: PMC7962313 DOI: 10.1186/s12864-021-07471-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 02/24/2021] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: K-mer-based methods have greatly advanced in recent years, largely driven by the realization of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged, and they seem a priori well suited to the patchy gene distribution and the lack of universal marker genes of viruses and plasmids. To provide a framework for the interpretation of results from k-mer-based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors. RESULTS: For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile.
CONCLUSION: This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.
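For readers unfamiliar with k-mer profiles, the following is a minimal Python sketch of how a 5-mer DNA profile can be computed as a vector of relative frequencies over all 4^5 = 1024 possible words. The function name `kmer_profile`, the handling of ambiguous bases, and the normalization are illustrative assumptions; the authors' actual pipeline may differ (for example, in how strands are treated).

```python
from itertools import product

def kmer_profile(seq: str, k: int = 5) -> list[float]:
    """Relative frequency of every possible k-mer over ACGT, in lexicographic order."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in sorted(counts)]

profile = kmer_profile("ACGTACGTAC")  # toy sequence, not real data
print(len(profile))
```

Profiles computed this way for many genomes form a matrix with one row per genome, which is the kind of input that multivariate analyses operate on.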
Affiliation(s)
- Ariane Bize
  - Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France
- Cédric Midoux
  - Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France; Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France
- Mahendra Mariadassou
  - Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France
- Sophie Schbath
  - Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France
- Patrick Forterre
  - Institut Pasteur, Unité de Virologie des Archées, Département de Microbiologie, 25 Rue du Docteur Roux, 75015, Paris, France; Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Violette Da Cunha
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France