Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zheng H, Kingsford C, Marçais G. Improved design and analysis of practical minimizers. Bioinformatics 2020;36:i119-i127. [PMID: 32657376 PMCID: PMC8248892 DOI: 10.1093/bioinformatics/btaa472] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

For:	Zheng H, Kingsford C, Marçais G. Improved design and analysis of practical minimizers. Bioinformatics 2020;36:i119-i127. [PMID: 32657376 PMCID: PMC8248892 DOI: 10.1093/bioinformatics/btaa472] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Number

Cited by Other Article(s)

Rouzé T, Martayan I, Marchet C, Limasset A. Fractional hitting sets for efficient multiset sketching. Algorithms Mol Biol 2025;20:1. [PMID: 39923117 PMCID: PMC11807336 DOI: 10.1186/s13015-024-00268-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 12/01/2024] [Indexed: 02/10/2025] Open

Alser M, Eudine J, Mutlu O. Taming large-scale genomic analyses via sparsified genomics. Nat Commun 2025;16:876. [PMID: 39837860 PMCID: PMC11751491 DOI: 10.1038/s41467-024-55762-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2023] [Accepted: 12/20/2024] [Indexed: 01/23/2025] Open

Kille B, Groot Koerkamp R, McAdams D, Liu A, Treangen TJ. A near-tight lower bound on the density of forward sampling schemes. Bioinformatics 2024;41:btae736. [PMID: 39666942 PMCID: PMC11676336 DOI: 10.1093/bioinformatics/btae736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2024] [Revised: 11/16/2024] [Accepted: 12/10/2024] [Indexed: 12/14/2024] Open

Kille B, Koerkamp RG, McAdams D, Liu A, Treangen TJ. A near-tight lower bound on the density of forward sampling schemes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.06.611668. [PMID: 39605515 PMCID: PMC11601301 DOI: 10.1101/2024.09.06.611668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/29/2024]

Marçais G, Elder CS, Kingsford C. k-nonical space: sketching with reverse complements. Bioinformatics 2024;40:btae629. [PMID: 39432565 PMCID: PMC11549021 DOI: 10.1093/bioinformatics/btae629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 10/01/2024] [Accepted: 10/17/2024] [Indexed: 10/23/2024] Open

Ndiaye M, Prieto-Baños S, Fitzgerald LM, Yazdizadeh Kharrazi A, Oreshkov S, Dessimoz C, Sedlazeck FJ, Glover N, Majidian S. When less is more: sketching with minimizers in genomics. Genome Biol 2024;25:270. [PMID: 39402664 PMCID: PMC11472564 DOI: 10.1186/s13059-024-03414-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Accepted: 10/01/2024] [Indexed: 10/19/2024] Open

Marini S, Barquero A, Wadhwani AA, Bian J, Ruiz J, Boucher C, Prosperi M. OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585215. [PMID: 38559026 PMCID: PMC10979967 DOI: 10.1101/2024.03.15.585215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]

Marçais G, DeBlasio D, Kingsford C. Sketching Methods with Small Window Guarantee Using Minimum Decycling Sets. J Comput Biol 2024;31:597-615. [PMID: 38980804 PMCID: PMC11304339 DOI: 10.1089/cmb.2024.0544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/11/2024] Open

Abstract

Most sequence sketching methods work by selecting specific k-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Because estimating sequence similarity is much faster using sketches than using sequence alignment, sketching methods are used to reduce the computational requirements of computational biology software. Applications using sketches often rely on properties of the k-mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. Two important examples of such properties are locality and window guarantees, the latter of which ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee, implicitly or explicitly, corresponds to a decycling set of the de Bruijn graph, which is a set of unavoidable k-mers. Any long enough sequence, by definition, must contain a k-mer from any decycling set (hence, the unavoidable property). Conversely, a decycling set also defines a sketching method by choosing the k-mers from the set as representatives. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger and largely unexplored. Finding decycling sets with desirable characteristics (e.g., small remaining path length) is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their minimum size. Only two algorithms, by Mykkeltveit and Champarnaud, are previously known to generate two particular MDSs, although there are typically a vast number of alternative MDSs. We provide a simple method to enumerate MDSs. This method allows one to explore the space of MDSs and to find MDSs optimized for desirable properties. We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length. A number of conjectures and computational and theoretical evidence to support them are presented. Code available at https://github.com/Kingsford-Group/mdsscope.

Collapse

Martayan I, Cazaux B, Limasset A, Marchet C. Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets. Bioinformatics 2024;40:i48-i57. [PMID: 38940123 PMCID: PMC11211824 DOI: 10.1093/bioinformatics/btae217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2024] Open

Yu YW. On Minimizers and Convolutional Filters: Theoretical Connections and Applications to Genome Analysis. J Comput Biol 2024;31:381-395. [PMID: 38687333 DOI: 10.1089/cmb.2024.0483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/02/2024] Open

Ma J, Zhao X, Qi E, Han R, Yu T, Li G. Highly efficient clustering of long-read transcriptomic data with GeLuster. Bioinformatics 2024;40:btae059. [PMID: 38310330 PMCID: PMC10881092 DOI: 10.1093/bioinformatics/btae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 01/08/2024] [Accepted: 01/30/2024] [Indexed: 02/05/2024] Open

Hoang M, Marçais G, Kingsford C. Density and Conservation Optimization of the Generalized Masked-Minimizer Sketching Scheme. J Comput Biol 2024;31:2-20. [PMID: 37975802 PMCID: PMC10794853 DOI: 10.1089/cmb.2023.0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2023] Open

Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023;30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/01/2023] Open

Marçais G, DeBlasio D, Kingsford C. Sketching methods with small window guarantee using minimum decycling sets. ARXIV 2023:arXiv:2311.03592v1. [PMID: 37986724 PMCID: PMC10659450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]

Abstract

Most sequence sketching methods work by selecting specific k -mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Because estimating sequence similarity is much faster using sketches than using sequence alignment, sketching methods are used to reduce the computational requirements of computational biology software packages. Applications using sketches often rely on properties of the k -mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. Two important examples of such properties are locality and window guarantees, the latter of which ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee, implicitly or explicitly, corresponds to a Decycling Set, an unavoidable sets of k -mers. Any long enough sequence, by definition, must contain a k -mer from any decycling set (hence, it is unavoidable). Conversely, a decycling set also defines a sketching method by choosing the k -mers from the set as representatives. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger, and largely unexplored. Finding decycling sets with desirable characteristics (e.g., small remaining path length) is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their minimum size. Only two algorithms, by Mykkeltveit and Champarnaud, are previously known to generate two particular MDSs, although there are typically a vast number of alternative MDSs. We provide a simple method to enumerate MDSs. This method allows one to explore the space of MDSs and to find MDSs optimized for desirable properties. We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length. A number of conjectures and computational and theoretical evidence to support them are presented. Code available at https://github.com/Kingsford-Group/mdsscope.

Collapse

Greenberg G, Ravi AN, Shomorony I. LexicHash: sequence similarity estimation via lexicographic comparison of hashes. Bioinformatics 2023;39:btad652. [PMID: 37878809 PMCID: PMC10628434 DOI: 10.1093/bioinformatics/btad652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2023] [Revised: 10/11/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open

Kille B, Garrison E, Treangen TJ, Phillippy AM. Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. Bioinformatics 2023;39:btad512. [PMID: 37603771 PMCID: PMC10505501 DOI: 10.1093/bioinformatics/btad512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 07/19/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023] Open

Shaw J, Yu YW. Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic. Genome Res 2023;33:1175-1187. [PMID: 36990779 PMCID: PMC10538486 DOI: 10.1101/gr.277637.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Accepted: 03/16/2023] [Indexed: 03/31/2023]

Pellow D, Pu L, Ekim B, Kotlar L, Berger B, Shamir R, Orenstein Y. Efficient minimizer orders for large values of k using minimum decycling sets. Genome Res 2023;33:1154-1161. [PMID: 37558282 PMCID: PMC10538483 DOI: 10.1101/gr.277644.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 04/20/2023] [Indexed: 08/11/2023]

Pibiri GE, Shibuya Y, Limasset A. Locality-preserving minimal perfect hashing of k-mers. Bioinformatics 2023;39:i534-i543. [PMID: 37387137 PMCID: PMC10311298 DOI: 10.1093/bioinformatics/btad219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023] Open

Kille B, Garrison E, Treangen TJ, Phillippy AM. Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.16.540882. [PMID: 37325780 PMCID: PMC10268037 DOI: 10.1101/2023.05.16.540882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]

Hoang M, Zheng H, Kingsford C. Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer. J Comput Biol 2022;29:1288-1304. [PMID: 36095142 PMCID: PMC9807081 DOI: 10.1089/cmb.2022.0275] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open

Shaw J, Yu YW. Theory of local k-mer selection with applications to long-read alignment. Bioinformatics 2022;38:4659-4669. [PMID: 36124869 PMCID: PMC9563685 DOI: 10.1093/bioinformatics/btab790] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2021] [Revised: 11/09/2021] [Accepted: 11/16/2021] [Indexed: 01/23/2023] Open

Belbasi M, Blanca A, Harris RS, Koslicki D, Medvedev P. The minimizer Jaccard estimator is biased and inconsistent. Bioinformatics 2022;38:i169-i176. [PMID: 35758786 PMCID: PMC9235516 DOI: 10.1093/bioinformatics/btac244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open

Pibiri GE. Sparse and skew hashing of K-mers. Bioinformatics 2022;38:i185-i194. [PMID: 35758794 PMCID: PMC9235479 DOI: 10.1093/bioinformatics/btac245] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Sahlin K. Effective sequence similarity detection with strobemers. Genome Res 2021;31:2080-2094. [PMID: 34667119 PMCID: PMC8559714 DOI: 10.1101/gr.275648.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Accepted: 08/20/2021] [Indexed: 01/08/2023]

Nyström-Persson J, Keeble-Gagnère G, Zawad N. Compact and evenly distributed k-mer binning for genomic sequences. Bioinformatics 2021;37:2563-2569. [PMID: 33693556 PMCID: PMC8428581 DOI: 10.1093/bioinformatics/btab156] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 03/03/2021] [Indexed: 11/17/2022] Open

Zheng H, Kingsford C, Marçais G. Sequence-specific minimizers via polar sets. Bioinformatics 2021;37:i187-i195. [PMID: 34252928 PMCID: PMC8686682 DOI: 10.1093/bioinformatics/btab313] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Edgar R. Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences. PeerJ 2021;9:e10805. [PMID: 33604186 PMCID: PMC7869670 DOI: 10.7717/peerj.10805] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2020] [Accepted: 12/30/2020] [Indexed: 12/19/2022] Open

Baharav TZ, Kamath GM, Tse DN, Shomorony I. Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments. PATTERNS 2020;1:100081. [PMID: 33205128 PMCID: PMC7660437 DOI: 10.1016/j.patter.2020.100081] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/26/2020] [Revised: 06/09/2020] [Accepted: 07/03/2020] [Indexed: 01/02/2023]