Reference Citation Analysis

For: Borile C, Labarre M, Franz S, Sola C, Refrégier G. Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis. BMC Bioinformatics 2011;12:224. [PMID: 21635750 PMCID: PMC3126747 DOI: 10.1186/1471-2105-12-224] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 01/18/2011] [Accepted: 06/02/2011] [Indexed: 12/26/2022] Open

Cited by Other Article(s)

Prioritized candidate causal haplotype blocks in plant genome-wide association studies. PLoS Genet 2022;18:e1010437. [PMID: 36251695 PMCID: PMC9612827 DOI: 10.1371/journal.pgen.1010437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 10/27/2022] [Accepted: 09/20/2022] [Indexed: 11/05/2022] Open

Abstract

Genome-wide association studies (GWAS) can play an essential role in understanding the genetic basis of complex traits in plants and animals. Conventional SNP-based linear mixed models (LMM) that marginally test single nucleotide polymorphisms (SNPs) have successfully identified many loci with major and minor effects in many GWAS. In plants, the relatively small population size in GWAS and the high genetic diversity found in many species can impede mapping efforts on complex traits. Here we present a novel haplotype-based trait fine-mapping framework, HapFM, to supplement current GWAS methods. HapFM uses genotype data to partition the genome into haplotype blocks, identifies haplotype clusters within each block, and then performs genome-wide haplotype fine-mapping to prioritize the candidate causal haplotype blocks for a trait. We benchmarked HapFM, GEMMA, BSLMM, GMMAT, and BLINK on both simulated and real plant GWAS datasets. HapFM consistently resulted in higher mapping power than the other GWAS methods in the high-polygenicity simulation setting. Moreover, it resulted in smaller mapping intervals, especially in regions of high linkage disequilibrium (LD), achieved by prioritizing small candidate causal blocks within the larger haplotype blocks. In the Arabidopsis flowering time (FT10) datasets, HapFM identified four novel loci compared to GEMMA’s results, and the average mapping interval of HapFM was 9.6 times smaller than that of GEMMA. In conclusion, HapFM is tailored for plant GWAS to achieve high mapping power on complex traits and improved mapping resolution to facilitate crop improvement.

Genome-wide association studies (GWAS) are commonly used in human and plant studies to identify genetic variants responsible for the phenotype of interest and provide foundations for studying disease mechanisms and crop improvement. Most GWAS models are developed and optimized using human datasets. However, the difference between human and plant datasets essentially limits their applications in plant studies, especially when mapping complex traits such as drought resistance and yield. In this study, we present a novel GWAS method, HapFM, tailored for plant datasets to overcome the difficulties of many conventional GWAS methods. HapFM resulted in higher statistical power than conventional GWAS methods for mapping complex traits in our simulation and real dataset analyses. In addition, HapFM reduced the mapping interval by prioritizing candidate causal regions in the genome, which benefits the downstream experimental studies. Last but not least, HapFM can incorporate biological annotations to increase statistical power further. Overall, HapFM balances statistical power, result interpretability, and downstream experimental verifiability.

Couvin D, Segretier W, Stattner E, Rastogi N. Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families. Database (Oxford) 2020;2020:baaa108. [PMID: 33320180 PMCID: PMC7737520 DOI: 10.1093/database/baaa108] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/02/2020] [Revised: 11/12/2020] [Accepted: 11/20/2020] [Indexed: 11/18/2022]

Busch A, Homeier-Bachmann T, Abdel-Glil MY, Hackbart A, Hotzel H, Tomaso H. Using affinity propagation clustering for identifying bacterial clades and subclades with whole-genome sequences of Francisella tularensis. PLoS Negl Trop Dis 2020;14:e0008018. [PMID: 32991594 PMCID: PMC7523947 DOI: 10.1371/journal.pntd.0008018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/08/2019] [Accepted: 12/27/2019] [Indexed: 12/31/2022] Open

Abstract

By combining a reference-independent SNP analysis and average nucleotide identity (ANI) with affinity propagation clustering (APC), we developed a significantly improved methodology for resolving phylogenetic relationships based on objective criteria. These bioinformatics tools can be used as a general ruler to determine phylogenetic relationships and clustering of bacteria, as exemplified here with Francisella (F.) tularensis. Molecular epidemiology of F. tularensis is currently assessed mostly based on laboratory methods and molecular analysis. The high evolutionary stability and the clonal nature make Francisella ideal for subtyping with single nucleotide polymorphisms (SNPs). Sequencing and real-time PCR can be used to validate the SNP analysis. We investigated whole-genome sequences of 155 F. tularensis subsp. holarctica isolates. Phylogenetic testing was based on SNPs and ANI as reference-independent, alignment-free methods that take small-scale and large-scale differences within the genomes into account. In particular, the whole-genome SNP analysis with kSNP3.0 allowed deciphering quite subtle signals of systematic differences in molecular variation. APC resulted in three clusters showing the known clades B.4, B.6, and B.12. These data correlated with the results of real-time PCR assays targeting canSNP loci. Additionally, we detected two subtle sub-clusters. SplitsTree was used with standard settings using the aligned SNPs from Parsnp. Together, APC, HierBAPS, and SplitsTree enabled us to generate hypotheses about epidemiologic relationships between bacterial clusters and to describe the distribution of isolates. Our data indicate that the choice of the typing technique can increase our understanding of the pathogenesis and transmission of diseases, which may eventually aid prevention. This opens perspectives for application to other bacterial species.
The data provide evidence that Germany might be the collision zone where the clade B.12, also known as the East European clade, overlaps with the clade B.6, also known as the Iberian clade. The described methods allow generating a new, more detailed perspective on F. tularensis subsp. holarctica phylogeny. These results may encourage determining phylogenetic relationships and clustering of other bacteria in the same way.
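
Illustrative sketch (not taken from the cited article): the abstract above applies affinity propagation clustering to SNP-derived data. The following minimal pure-Python example shows the general message-passing scheme on a toy input. The binary marker profiles, the use of negative Hamming distance as similarity, the median preference, and the damping and iteration settings are all assumptions chosen for illustration; they are not the authors' data or parameters.

```python
from statistics import median

def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def affinity_propagation(S, damping=0.8, iterations=300):
    """Return one exemplar index per point, given a similarity matrix S.

    S[k][k] holds the 'preference' of point k to become an exemplar.
    Requires at least two points.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated support for k being an exemplar.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return []  # no exemplar emerged; caller may adjust the preference
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k  # an exemplar always represents itself
    return labels

# Hypothetical marker profiles: two well-separated groups of three isolates.
profiles = ["00000000", "10000000", "01000000",
            "11111111", "11111110", "11111101"]
n = len(profiles)
S = [[-float(hamming(profiles[i], profiles[j])) for j in range(n)] for i in range(n)]
# Assumed preference: median of the off-diagonal similarities.
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = preference

labels = affinity_propagation(S)
print(labels)
```

Each entry of `labels` is the index of the exemplar chosen for that isolate. Lowering the preference tends to yield fewer clusters and raising it tends to yield more, which is the main tuning decision in this kind of analysis.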

Barrangou R, Dudley EG. CRISPR-Based Typing and Next-Generation Tracking Technologies. Annu Rev Food Sci Technol 2016;7:395-411. [PMID: 26772411 DOI: 10.1146/annurev-food-022814-015729] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 11/09/2022]

Azé J, Sola C, Zhang J, Lafosse-Marin F, Yasmin M, Siddiqui R, Kremer K, van Soolingen D, Refrégier G. Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm. PLoS One 2015;10:e0130912. [PMID: 26154264 PMCID: PMC4496040 DOI: 10.1371/journal.pone.0130912] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/17/2015] [Accepted: 05/25/2015] [Indexed: 11/18/2022] Open

Abstract

Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing and in the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity either on CRISPRs (also known as spoligotypes) or on MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all the isolates from the database were subsequently implemented into an ensemble learning approach based on the Machine Learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, it was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, efficient on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface “TBminer.” Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance.
Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensual taxonomy for human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilize it.

Sola C. Clustured regularly interspersed short palindromic repeats (CRISPR) genetic diversity studies as a mean to reconstruct the evolution of the Mycobacterium tuberculosis complex. Tuberculosis (Edinb) 2015;95 Suppl 1:S159-66. [PMID: 25748060 DOI: 10.1016/j.tube.2015.02.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/01/2022]

Vasconcellos SEG, Acosta CC, Gomes LL, Conceição EC, Lima KV, de Araujo MI, Leite MDL, Tannure F, Caldas PCDS, Gomes HM, Santos AR, Gomgnimbou MK, Sola C, Couvin D, Rastogi N, Boechat N, Suffys PN. Strain classification of Mycobacterium tuberculosis isolates in Brazil based on genotypes obtained by spoligotyping, mycobacterial interspersed repetitive unit typing and the presence of large sequence and single nucleotide polymorphism. PLoS One 2014;9:e107747. [PMID: 25314118 PMCID: PMC4196770 DOI: 10.1371/journal.pone.0107747] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 01/01/2014] [Accepted: 08/21/2014] [Indexed: 11/26/2022] Open

Affiliation(s)

Sidra E. G. Vasconcellos: Laboratory of Molecular Biology Applied to Mycobacteria, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil; Multidisciplinary Research Laboratory, University Hospital Clementino Fraga Filho – HUCFF, Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
Chyntia Carolina Acosta: Laboratory of Cellular Microbiology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil
Lia Lima Gomes: Laboratory of Molecular Biology Applied to Mycobacteria, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil
Emilyn Costa Conceição: Instituto Evandro Chagas, Section of Bacteriology and Mycology, Belém, Pará, Brazil
Karla Valéria Lima: Instituto Evandro Chagas, Section of Bacteriology and Mycology, Belém, Pará, Brazil
Marcelo Ivens de Araujo: Laboratory of Molecular Biology Applied to Mycobacteria, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil
Maria de Lourdes Leite: Hospital Municipal Rafael de Paula Souza, Municipal Secretary of Health, Rio de Janeiro, Rio de Janeiro, Brazil
Flávio Tannure: Hospital Municipal Rafael de Paula Souza, Municipal Secretary of Health, Rio de Janeiro, Rio de Janeiro, Brazil
Paulo Cesar de Souza Caldas: Centro de Referência Professor Hélio Fraga, Escola Nacional de Saúde Publica Sergio Arouca, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil
Harrison M. Gomes: Laboratory of Molecular Biology Applied to Mycobacteria, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil
Adalberto Rezende Santos: Laboratory of Molecular Biology Applied to Mycobacteria, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil
Michel K. Gomgnimbou: CNRS–Université Paris–Sud, Institut de Génétique et Microbiologie–Infection Genetics Emerging Pathogens Evolution Team, Orsay, France
Christophe Sola: CNRS–Université Paris–Sud, Institut de Génétique et Microbiologie–Infection Genetics Emerging Pathogens Evolution Team, Orsay, France
David Couvin: Supranational TB Reference Laboratory, Unité de la Tuberculose et des Mycobactéries, Institut Pasteur de Guadeloupe, Abymes, Guadeloupe, France
Nalin Rastogi: Supranational TB Reference Laboratory, Unité de la Tuberculose et des Mycobactéries, Institut Pasteur de Guadeloupe, Abymes, Guadeloupe, France
Neio Boechat: Multidisciplinary Research Laboratory, University Hospital Clementino Fraga Filho – HUCFF, Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil; Graduate Program in Clinical Medicine, Faculty of Medicine, University Hospital Clementino Fraga Filho, Rio de Janeiro, Rio de Janeiro, Brazil
Philip Noel Suffys: Laboratory of Molecular Biology Applied to Mycobacteria, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Rio de Janeiro, Brazil

Wang M, Zhang W, Ding W, Dai D, Zhang H, Xie H, Chen L, Guo Y, Xie J. Parallel clustering algorithm for large-scale biological data sets. PLoS One 2014;9:e91315. [PMID: 24705246 PMCID: PMC3976248 DOI: 10.1371/journal.pone.0091315] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/13/2013] [Accepted: 02/10/2014] [Indexed: 02/06/2023] Open

Ozcaglar C, Shabbeer A, Kurepina N, Rastogi N, Yener B, Bennett KP. Inferred spoligoforest topology unravels spatially bimodal distribution of mutations in the DR region. IEEE Trans Nanobioscience 2012;11:191-202. [PMID: 22987125 DOI: 10.1109/tnb.2012.2213265] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/07/2024]

Abstract

Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and mycobacterium interspersed repetitive unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements using MIRU patterns to disambiguate the ancestors of spoligotypes. We use a large patient dataset from the United States Centers for Disease Control and Prevention (CDC) to generate this model. Based on the contiguous deletion assumption and rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of topological attributes of the spoligoforest and number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our mutation model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit to the mutation length frequency. Moreover, we find that the total number of mutation events at consecutive spacers follows a spatially bimodal distribution. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements, and the change point is spacer 34, which is absent in most MTBC strains. Based on this observation, we built two alternative models for mutation length frequency: the Starting Point Model (SPM) and the Longest Block Model (LBM). Both models are plausibly good fits to the mutation length frequency distribution, as verified by the goodness-of-fit test. 
We also apply SPM and LBM to a dataset from Institut Pasteur de Guadeloupe and verify that these models hold for different strain datasets.
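
Illustrative sketch (not from the cited article): the abstract above builds its spoligoforest on the contiguous deletion assumption. One possible way to formalize that assumption is shown below. The encoding of a spoligotype as a string of `1` (spacer present) and `0` (spacer absent), the function name, and the exact rule are all hypothetical; in particular, the rule that spacers already missing in the parent may lie inside the deleted block is an interpretation, not the authors' definition.

```python
def is_single_contiguous_deletion(parent, child):
    """Check whether `child` could arise from `parent` by losing one
    contiguous block of spacers (assumed rule, see the note above).

    Both arguments are equal-length strings of '1' (spacer present)
    and '0' (spacer absent).
    """
    if len(parent) != len(child):
        raise ValueError("spoligotypes must have the same length")
    # Spacers are only lost, never regained, under this assumption.
    if any(p == "0" and c == "1" for p, c in zip(parent, child)):
        return False
    lost = [i for i, (p, c) in enumerate(zip(parent, child))
            if p == "1" and c == "0"]
    if not lost:
        return False  # identical patterns: no deletion event at all
    # Every position between the first and last lost spacer must be
    # absent in the child; otherwise at least two separate deletions
    # would be needed.
    return all(child[i] == "0" for i in range(lost[0], lost[-1] + 1))

# Toy 7-spacer patterns (real spoligotypes use 43 spacers).
one_block = is_single_contiguous_deletion("1111111", "1100111")
two_blocks = is_single_contiguous_deletion("1111111", "1010111")
spans_gap = is_single_contiguous_deletion("1101111", "1000111")
print(one_block, two_blocks, spans_gap)
```

A check of this kind would only identify candidate parent and child pairs; choosing among several candidate parents, which the abstract does with MIRU patterns and genetic distance measures, is outside the scope of this sketch.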

Shabbeer A, Cowan LS, Ozcaglar C, Rastogi N, Vandenberg SL, Yener B, Bennett KP. TB-Lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infect Genet Evol 2012;12:789-97. [PMID: 22406225 DOI: 10.1016/j.meegid.2012.02.010] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 08/23/2011] [Revised: 02/18/2012] [Accepted: 02/21/2012] [Indexed: 11/19/2022]

Barrangou R, Horvath P. CRISPR: new horizons in phage resistance and strain identification. Annu Rev Food Sci Technol 2011;3:143-62. [PMID: 22224556 DOI: 10.1146/annurev-food-022811-101134] [Citation(s) in RCA: 124] [Impact Index Per Article: 9.5] [Indexed: 11/09/2022]

Shabbeer A, Ozcaglar C, Yener B, Bennett KP. Web tools for molecular epidemiology of tuberculosis. Infect Genet Evol 2011;12:767-81. [PMID: 21903179 DOI: 10.1016/j.meegid.2011.08.019] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/17/2011] [Revised: 08/14/2011] [Accepted: 08/19/2011] [Indexed: 01/03/2023]

Ozcaglar C, Shabbeer A, Kurepina N, Yener B, Bennett KP. Data-driven insights into deletions of Mycobacterium tuberculosis complex chromosomal DR region using spoligoforests. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2011:75-82. [PMID: 22343484 PMCID: PMC3279189 DOI: 10.1109/bibm.2011.64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]

Abstract

Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and Mycobacterium Interspersed Repetitive Unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements using MIRU patterns to disambiguate the ancestors of spoligotypes, in a large patient dataset from the United States Centers for Disease Control and Prevention (CDC). Based on the contiguous deletion assumption and rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of topological attributes of the spoligoforest and number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our mutation model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit to the mutation length frequency. Finally, the total number of mutation events at consecutive DR loci follows a bimodal distribution, which results in accumulation of shorter deletions in the DR region. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements. The change point in the bimodal distribution is spacer 34, which is absent in most MTBC strains. This bimodal separation results in accumulation of shorter deletions, which explains why a power law distribution is not a plausible fit to the mutation length frequency.
