Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Balaban M, Sarmashghi S, Mirarab S. APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments. Syst Biol 2020;69:566-578. [PMID: 31545363 PMCID: PMC7164367 DOI: 10.1093/sysbio/syz063] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 02/26/2019] [Revised: 09/05/2019] [Accepted: 09/10/2019] [Indexed: 11/14/2022] Open

Cited by Other Article(s)

Jiang Y, McDonald D, Perry D, Knight R, Mirarab S. Scaling DEPP phylogenetic placement to ultra-large reference trees: a tree-aware ensemble approach. Bioinformatics 2024;40:btae361. [PMID: 38870525 PMCID: PMC11193062 DOI: 10.1093/bioinformatics/btae361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 04/09/2024] [Accepted: 06/12/2024] [Indexed: 06/15/2024] Open

Mirarab S, Bafna V. Analyses of Nuclear Reads Obtained Using Genome Skimming. Methods Mol Biol 2024;2744:247-265. [PMID: 38683324 DOI: 10.1007/978-1-0716-3581-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]

Truszkowski J, Perrigo A, Broman D, Ronquist F, Antonelli A. Online tree expansion could help solve the problem of scalability in Bayesian phylogenetics. Syst Biol 2023;72:1199-1206. [PMID: 37498209 PMCID: PMC10627553 DOI: 10.1093/sysbio/syad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 06/22/2023] [Accepted: 07/11/2023] [Indexed: 07/28/2023] Open

Wedell E, Cai Y, Warnow T. SCAMPP: Scaling Alignment-Based Phylogenetic Placement to Large Trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:1417-1430. [PMID: 35471888 DOI: 10.1109/tcbb.2022.3170386] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/04/2023]

Rachtman E, Sarmashghi S, Bafna V, Mirarab S. Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling. Cell Syst 2022;13:817-829.e3. [PMID: 36265468 PMCID: PMC9589918 DOI: 10.1016/j.cels.2022.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 03/14/2022] [Accepted: 06/28/2022] [Indexed: 01/26/2023]

Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022;377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 01/07/2022] [Indexed: 12/20/2022] Open

Sgroi G, Iatta R, Lovreglio P, Stufano A, Laidoudi Y, Mendoza-Roldan JA, Bezerra-Santos MA, Veneziano V, Di Gennaro F, Saracino A, Chironna M, Bandi C, Otranto D. Detection of Endosymbiont Candidatus Midichloria mitochondrii and Tickborne Pathogens in Humans Exposed to Tick Bites, Italy. Emerg Infect Dis 2022;28:1824-1832. [PMID: 35997363 PMCID: PMC9423927 DOI: 10.3201/eid2809.220329] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/30/2022] Open

Banchi E, Manna V, Fonti V, Fabbro C, Celussi M. Improving environmental monitoring of Vibrionaceae in coastal ecosystems through 16S rRNA gene amplicon sequencing. Environmental Science and Pollution Research International 2022;29:67466-67482. [PMID: 36056283 PMCID: PMC9492620 DOI: 10.1007/s11356-022-22752-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/18/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]

Jiang Y, Tabaghi P, Mirarab S. Learning Hyperbolic Embedding for Phylogenetic Tree Placement and Updates. Biology 2022;11:biology11091256. [PMID: 36138735 PMCID: PMC9495508 DOI: 10.3390/biology11091256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/19/2022] [Indexed: 11/20/2022]

Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinformatics Advances 2022;2:vbac055. [PMID: 35992043 PMCID: PMC9383262 DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]

Affiliation(s)
Metin Balaban
Nishat Anjum Bristy
Ahnaf Faisal
Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
Md Shamsuzzoha Bayzid
Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
Siavash Mirarab
To whom correspondence should be addressed.

Hasan NB, Balaban M, Biswas A, Bayzid MS, Mirarab S. Distance-Based Phylogenetic Placement with Statistical Support. Biology 2022;11:biology11081212. [PMID: 36009839 PMCID: PMC9404983 DOI: 10.3390/biology11081212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 07/30/2022] [Accepted: 08/02/2022] [Indexed: 11/16/2022]

Abstract

Simple Summary

Phylogenetic placement seeks to find the optimal position for a new query species on an existing backbone tree. Fast and accurate distance-based phylogenetic placement methods lack the crucial feature of estimating the support values for various placements of a query sequence. This study presents both parametric and nonparametric methods for measuring the support values of distance-based phylogenetic placements.

Abstract

Phylogenetic identification of unknown sequences by placing them on a tree is routinely attempted in modern ecological studies. Such placements are often obtained from incomplete and noisy data, making it essential to augment the results with some notion of uncertainty. While the standard likelihood-based methods designed for placement naturally provide such measures of uncertainty, the newer and more scalable distance-based methods lack this crucial feature. Here, we adopt several parametric and nonparametric sampling methods for measuring the support of phylogenetic placements that have been obtained with the use of distances. Comparing the alternative strategies, we conclude that nonparametric bootstrapping is more accurate than the alternatives. We go on to show how bootstrapping can be performed efficiently using a linear algebraic formulation that makes it up to 30 times faster and implement this optimized version as part of the distance-based placement software APPLES. By examining a wide range of applications, we show that the relative accuracy of maximum likelihood (ML) support values as compared to distance-based methods depends on the application and the dataset. ML is advantageous for fragmentary queries, while distance-based support values are more accurate for full-length and multi-gene datasets. With the quantification of uncertainty, our work fills a crucial gap that prevents the broader adoption of distance-based placement tools.
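The nonparametric bootstrapping mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard phylogenetic bootstrap (resampling alignment columns with replacement) and, as a deliberate simplification, uses the nearest reference sequence under a Hamming distance as a stand-in for a placement on a tree. The function names `hamming_fraction` and `bootstrap_support` are hypothetical, and this is not the optimized linear-algebraic formulation implemented in APPLES.

```python
import random
from collections import Counter


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_support(query, refs, replicates=100, seed=0):
    """Toy support values: in each replicate, resample alignment columns with
    replacement, find the closest reference, and report how often each
    reference wins. 'Closest reference' is a stand-in for a tree placement."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(query)
    wins = Counter()
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]  # resampled column indices
        q = [query[i] for i in cols]
        best = min(
            refs,
            key=lambda name: hamming_fraction(q, [refs[name][i] for i in cols]),
        )
        wins[best] += 1
    return {name: wins[name] / replicates for name in refs}
```

A reference that wins in nearly every replicate would receive support close to 1, while a query sitting between two references would split its support between them.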

Xu T, Kong L, Li Q. Testing Efficacy of Assembly-Free and Alignment-Free Methods for Species Identification Using Genome Skims, with Patellogastropoda as a Test Case. Genes (Basel) 2022;13:genes13071192. [PMID: 35885975 PMCID: PMC9318368 DOI: 10.3390/genes13071192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2022] [Revised: 06/26/2022] [Accepted: 06/28/2022] [Indexed: 02/05/2023] Open

African mitochondrial haplogroup L7: a 100,000-year-old maternal human lineage discovered through reassessment and new sequencing. Sci Rep 2022;12:10747. [PMID: 35750688 PMCID: PMC9232647 DOI: 10.1038/s41598-022-13856-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Accepted: 05/30/2022] [Indexed: 11/17/2022] Open

Jiang Y, Balaban M, Zhu Q, Mirarab S. DEPP: Deep Learning Enables Extending Species Trees using Single Genes. Syst Biol 2022;72:17-34. [PMID: 35485976 DOI: 10.1093/sysbio/syac031] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2021] [Revised: 04/13/2022] [Accepted: 04/22/2022] [Indexed: 11/13/2022] Open

Ye C, Thornlow B, Kramer A, McBroome J, Hinrichs A, Corbett-Detig R, Turakhia Y. Pandemic-scale phylogenetics. bioRxiv 2021:2021.12.03.470766. [PMID: 34927180 PMCID: PMC8679213 DOI: 10.1101/2021.12.03.470766] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]

Sarmashghi S, Balaban M, Rachtman E, Touri B, Mirarab S, Bafna V. Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT. PLoS Comput Biol 2021;17:e1009449. [PMID: 34780468 PMCID: PMC8629397 DOI: 10.1371/journal.pcbi.1009449] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2021] [Revised: 11/29/2021] [Accepted: 09/13/2021] [Indexed: 01/26/2023] Open

Abstract

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome-skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show using a mix of theoretical and empirical analysis that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, and that has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation compared to 27% error previously achieved. In shotgun sequenced read samples with contaminants, RESPECT length estimates had median error 4%, in contrast to other methods that had median error 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome skims) could be transformative for genomic ecology. Analyzing genome skims, mostly based on statistics of small oligomers, remains challenging, but recent results have shown the advantage of this approach for the identification and phylogenetic placement of eukaryotic species. In this paper, we present a method, RESPECT, to estimate genomic properties such as genome length and repetitiveness from low-coverage genome skims. We trained RESPECT using assembled genomes and tested it on low-coverage simulated and real reads. Benchmarking results reveal that RESPECT has excellent accuracy in estimating the genome length compared to other methods, and can provide critical information regarding the repeat structure of the genome.

Balaban M, Jiang Y, Roush D, Zhu Q, Mirarab S. Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol Ecol Resour 2021;22:1213-1227. [PMID: 34643995 DOI: 10.1111/1755-0998.13527] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/24/2021] [Accepted: 10/05/2021] [Indexed: 01/04/2023]

Blanke M, Morgenstern B. App-SpaM: phylogenetic placement of short reads without sequence alignment. Bioinformatics Advances 2021;1:vbab027. [PMID: 36700102 PMCID: PMC9710606 DOI: 10.1093/bioadv/vbab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2021] [Revised: 09/27/2021] [Accepted: 10/11/2021] [Indexed: 01/28/2023]

Chafin TK, Regmi B, Douglas MR, Edds DR, Wangchuk K, Dorji S, Norbu P, Norbu S, Changlu C, Khanal GP, Tshering S, Douglas ME. Parallel introgression, not recurrent emergence, explains apparent elevational ecotypes of polyploid Himalayan snowtrout. Royal Society Open Science 2021;8:210727. [PMID: 34729207 PMCID: PMC8548808 DOI: 10.1098/rsos.210727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 10/01/2021] [Indexed: 06/13/2023]

Affiliation(s)

Tyler K. Chafin: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA; Department of Ecology and Evolutionary Biology, University of Colorado, Boulder 80309, USA
Binod Regmi: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA; National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS), National Institutes of Health, Bethesda, MD 20892, USA
Marlis R. Douglas: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
David R. Edds: Department of Biological Sciences, Emporia State University, Emporia, KS 66801, USA
Karma Wangchuk: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA; National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Sonam Dorji: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Pema Norbu: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Sangay Norbu: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Changlu Changlu: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Gopal Prasad Khanal: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Singye Tshering: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Michael E. Douglas: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA

Rachtman E, Bafna V, Mirarab S. CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genom Bioinform 2021;3:lqab071. [PMID: 34377979 PMCID: PMC8340999 DOI: 10.1093/nargab/lqab071] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/10/2021] [Revised: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 12/27/2022] Open

Linard B, Romashchenko N, Pardi F, Rivals E. PEWO: a collection of workflows to benchmark phylogenetic placement. Bioinformatics 2021;36:5264-5266. [PMID: 32697844 DOI: 10.1093/bioinformatics/btaa657] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/03/2020] [Revised: 07/10/2020] [Accepted: 07/16/2020] [Indexed: 11/13/2022] Open

Matsumoto H, Mimori T, Fukunaga T. Novel metric for hyperbolic phylogenetic tree embeddings. Biol Methods Protoc 2021;6:bpab006. [PMID: 33928190 PMCID: PMC8058397 DOI: 10.1093/biomethods/bpab006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/11/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 01/09/2023] Open

Smirnov V, Warnow T. Phylogeny Estimation Given Sequence Length Heterogeneity. Syst Biol 2020;70:268-282. [PMID: 32692823 PMCID: PMC7875441 DOI: 10.1093/sysbio/syaa058] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/10/2019] [Revised: 07/14/2020] [Accepted: 07/15/2020] [Indexed: 12/21/2022] Open

Bhattacharjee A, Bayzid MS. Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC Genomics 2020;21:497. [PMID: 32689946 PMCID: PMC7370488 DOI: 10.1186/s12864-020-06892-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/27/2020] [Accepted: 07/07/2020] [Indexed: 02/08/2023] Open

Bohmann K, Mirarab S, Bafna V, Gilbert MTP. Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification. Mol Ecol 2020;29:2521-2534. [PMID: 32542933 PMCID: PMC7496323 DOI: 10.1111/mec.15507] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/26/2020] [Revised: 06/03/2020] [Accepted: 06/05/2020] [Indexed: 02/06/2023]

Balaban M, Mirarab S. Phylogenetic double placement of mixed samples. Bioinformatics 2020;36:i335-i343. [PMID: 32657414 PMCID: PMC7355250 DOI: 10.1093/bioinformatics/btaa489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023] Open

Abstract

MOTIVATION

Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample. When constituents are absent from the reference set, we seek to phylogenetically position them with respect to the underlying tree of the reference species. This simple yet fundamental problem (which we call phylogenetic double-placement) has enjoyed surprisingly little attention in the literature. As genome skimming (low-pass sequencing of genomes at low coverage, precluding assembly) becomes more prevalent, this problem finds wide-ranging applications in areas as varied as biodiversity research, food production and provenance, and evolutionary reconstruction.

RESULTS

We introduce a model that relates distances between a mixed sample and reference species to the distances between constituents and reference species. Our model is based on Jaccard indices computed between each sample represented as k-mer sets. The model, built on several assumptions and approximations, allows us to formalize the phylogenetic double-placement problem as a non-convex optimization problem that decomposes mixture distances and performs phylogenetic placement simultaneously. Using a variety of techniques, we are able to solve this optimization problem numerically. We test the resulting method, called MIxed Sample Analysis tool (MISA), on a varied set of simulated and biological datasets. Despite all the assumptions used, the method performs remarkably well in practice.

AVAILABILITY AND IMPLEMENTATION

The software and data are available at https://github.com/balabanmetin/misa and https://github.com/balabanmetin/misa-data.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
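The RESULTS paragraph above rests on Jaccard indices computed between samples represented as k-mer sets. As a minimal sketch of that quantity only, and not of the MISA model or its optimization, the following uses hypothetical helper names `kmer_set` and `jaccard` and, for simplicity, ignores reverse complements and sequencing errors.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two toy sequences that share two of their 2-mers.
x = kmer_set("ACGT", 2)  # {"AC", "CG", "GT"}
y = kmer_set("ACGA", 2)  # {"AC", "CG", "GA"}
similarity = jaccard(x, y)  # 2 shared k-mers out of 4 distinct ones
```

In a mixed sample, the k-mer set is roughly the union of the constituents' k-mer sets, which is why the Jaccard index to each reference no longer reflects a single organism.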

Czech L, Barbera P, Stamatakis A. Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data. Bioinformatics 2020;36:3263-3265. [PMID: 32016344 PMCID: PMC7214027 DOI: 10.1093/bioinformatics/btaa070] [Citation(s) in RCA: 133] [Impact Index Per Article: 33.3] [Received: 05/29/2019] [Revised: 01/22/2020] [Accepted: 01/28/2020] [Indexed: 11/14/2022] Open

Röhling S, Linne A, Schellhorn J, Hosseini M, Dencker T, Morgenstern B. The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances. PLoS One 2020;15:e0228070. [PMID: 32040534 PMCID: PMC7010260 DOI: 10.1371/journal.pone.0228070] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/07/2020] [Accepted: 01/08/2020] [Indexed: 12/14/2022] Open

Rachtman E, Balaban M, Bafna V, Mirarab S. The impact of contaminants on the accuracy of genome skimming and the effectiveness of exclusion read filters. Mol Ecol Resour 2020;20. [PMID: 31943790 DOI: 10.1111/1755-0998.13135] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/18/2019] [Revised: 12/22/2019] [Accepted: 01/05/2020] [Indexed: 11/27/2022]

Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage. BMC Bioinformatics 2019;20:638. [PMID: 31842735 PMCID: PMC6916211 DOI: 10.1186/s12859-019-3205-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/10/2022] Open