Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Sarmashghi S, Bohmann K, Gilbert MTP, Bafna V, Mirarab S. Skmer: assembly-free and alignment-free sample identification using genome skims. Genome Biol 2019;20:34. [PMID: 30760303 PMCID: PMC6374904 DOI: 10.1186/s13059-019-1632-4] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 03/04/2018] [Accepted: 01/16/2019] [Indexed: 01/10/2023] Open

Cited by Other Article(s)

Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024;23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open

Wang B, Jin Y, Hu M, Zhao Y, Wang X, Yue J, Ren H. Detecting genetic gain and loss events in terms of protein domain: Method and implementation. Heliyon 2024;10:e32103. [PMID: 38867972 PMCID: PMC11168390 DOI: 10.1016/j.heliyon.2024.e32103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/08/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024] Open

Nagy NA, Tóth GE, Kurucz K, Kemenesi G, Laczkó L. The updated genome of the Hungarian population of Aedes koreicus. Sci Rep 2024;14:7545. [PMID: 38555322 PMCID: PMC10981705 DOI: 10.1038/s41598-024-58096-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 03/25/2024] [Indexed: 04/02/2024] Open

Wang F, Wang Y, Zeng X, Zhang S, Yu J, Li D, Zhang X. MIKE: an ultrafast, assembly-, and alignment-free approach for phylogenetic tree construction. Bioinformatics 2024;40:btae154. [PMID: 38547397 PMCID: PMC10990684 DOI: 10.1093/bioinformatics/btae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 02/06/2024] [Indexed: 04/05/2024] Open

Affiliation(s)

Fang Wang College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China; National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
Yibin Wang National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
Xiaofei Zeng Department of Human Cell Biology and Genetics, Joint Laboratory of Guangdong-Hong Kong Universities for Vascular Homeostasis and Diseases, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 508055, China
Shengcheng Zhang National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
Jiaxin Yu National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
Dongxi Li College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China
Xingtan Zhang National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China

Mirarab S, Bafna V. Analyses of Nuclear Reads Obtained Using Genome Skimming. Methods Mol Biol 2024;2744:247-265. [PMID: 38683324 DOI: 10.1007/978-1-0716-3581-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]

Shaw J, Yu YW. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nat Methods 2023;20:1661-1665. [PMID: 37735570 PMCID: PMC10630134 DOI: 10.1038/s41592-023-02018-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 08/22/2023] [Indexed: 09/23/2023]

Fruzangohar M, Moolhuijzen P, Bakaj N, Taylor J. CoreDetector: a flexible and efficient program for core-genome alignment of evolutionary diverse genomes. Bioinformatics 2023;39:btad628. [PMID: 37878789 PMCID: PMC10663985 DOI: 10.1093/bioinformatics/btad628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 09/20/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open

Bandaranayake PCG, Naranpanawa N, Chandrasekara CHWMRB, Samarakoon H, Lokuge S, Jayasundara S, Bandaranayake AU, Pushpakumara DKNG, Wijesundara DSA. Chloroplast genome, nuclear ITS regions, mitogenome regions, and Skmer analysis resolved the genetic relationship among Cinnamomum species in Sri Lanka. PLoS One 2023;18:e0291763. [PMID: 37729154 PMCID: PMC10511092 DOI: 10.1371/journal.pone.0291763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Accepted: 09/05/2023] [Indexed: 09/22/2023] Open

Abstract

Cinnamomum species have gained worldwide attention because of their economic benefits. Among them, C. verum (synonymous with C. zeylanicum Blume), commonly known as Ceylon Cinnamon or True Cinnamon, is mainly produced in Sri Lanka. In addition, Sri Lanka is home to seven endemic wild cinnamon species: C. capparu-coronde, C. citriodorum, C. dubium, C. litseifolium, C. ovalifolium, C. rivulorum, and C. sinharajaense. Proper identification and genetic characterization are fundamental for the conservation and commercialization of these species. While some species can be identified based on distinct morphological or chemical traits, others cannot be identified easily by either. DNA barcoding using the rbcL, matK, and trnH-psbA regions also could not resolve the identification of Cinnamomum species in Sri Lanka. Therefore, we generated Illumina HiSeq data of about 20x coverage for each identified species and a C. verum sample (India), assembled the chloroplast genome, nuclear ITS regions, and several mitochondrial genes, and conducted Skmer analysis. Chloroplast genomes of all eight species were assembled using a seed-based method. According to the Bayesian phylogenomic tree constructed with the complete chloroplast genomes, C. verum (Sri Lanka) is sister to previously sequenced C. verum (NC_035236.1, KY635878.1), C. dubium, and C. rivulorum. The C. verum sample from India is sister to C. litseifolium and C. ovalifolium. According to the ITS regions studied, C. verum (Sri Lanka) is sister to C. verum (NC_035236.1), C. dubium, and C. rivulorum. Cinnamomum verum (India) shares an identical ITS region with C. ovalifolium, C. litseifolium, C. citriodorum, and C. capparu-coronde. According to the Skmer analysis, C. verum (Sri Lanka) is sister to C. dubium and C. rivulorum, whereas C. verum (India) is sister to C. ovalifolium and C. litseifolium.
The chloroplast gene ycf1 was identified as a chloroplast barcode for the identification of Cinnamomum species. We identified an 18 bp indel region in the ycf1 gene that could differentiate the C. verum (India) and C. verum (Sri Lanka) samples tested.

Mo ZQ, Wang J, Möller M, Yang JB, Gao LM. Phylogenetic Relationships and Next-Generation Barcodes in the Genus Torreya Reveal a High Proportion of Misidentified Cultivated Plants. Int J Mol Sci 2023;24:13216. [PMID: 37686021 PMCID: PMC10487542 DOI: 10.3390/ijms241713216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023] Open

Abstract

Accurate species identification is key to conservation and phylogenetic inference. Living plant collections from botanical gardens and arboreta are important resources for scientific research, but the proportion of misidentified cultivated plants has not been tested using DNA barcodes. Here, we assembled the next-generation barcode (complete plastid genome and complete nrDNA cistron) and mitochondrial genes from genome skimming data of Torreya species, with multiple accessions for each species, to test species discrimination and the proportion of misidentified cultivated plants used in Torreya studies. A total of 38 accessions were included for analyses, representing all nine recognized species of genus Torreya. The plastid phylogeny showed that all 21 wild samples formed species-specific clades, except T. jiulongshanensis. Disregarding this putative hybrid, seven recognized species sampled here were successfully discriminated by the plastid genome. Only the T. nucifera accessions grouped into two grades. The species identification rate of the nrDNA cistron was 62.5%. The Skmer analysis based on nuclear reads from genome skims showed promise for species identification, with seven species discriminated. The proportion of misidentified cultivated plants from arboreta/botanical gardens was relatively high, with four accessions (23.5%) representing three species. Interspecific relationships within Torreya were fully resolved with maximum support by plastomes, where Torreya jackii was on the earliest diverging branch, though sister to T. grandis in the nrDNA cistron tree, suggesting that this is likely a hybrid species between T. grandis and an extinct Torreya ancestor lineage. The findings here provide quantitative insights into the usage of cultivated samples for phylogenetic study.

Pezzini FF, Ferrari G, Forrest LL, Hart ML, Nishii K, Kidner CA. Target capture and genome skimming for plant diversity studies. Appl Plant Sci 2023;11:e11537. [PMID: 37601316 PMCID: PMC10439825 DOI: 10.1002/aps3.11537] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/17/2022] [Revised: 06/16/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023]

Shaw J, Yu YW. Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic. Genome Res 2023;33:1175-1187. [PMID: 36990779 PMCID: PMC10538486 DOI: 10.1101/gr.277637.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 03/16/2023] [Indexed: 03/31/2023]

Pouchon C, Boluda CG. REFMAKER: make your own reference to target nuclear loci in low coverage genome skimming libraries. Phylogenomic application in Sapotaceae. Mol Phylogenet Evol 2023:107826. [PMID: 37257798 DOI: 10.1016/j.ympev.2023.107826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 04/24/2023] [Accepted: 05/25/2023] [Indexed: 06/02/2023]

Paula DP, Andow DA. DNA High-Throughput Sequencing for Arthropod Gut Content Analysis to Evaluate Effectiveness and Safety of Biological Control Agents. Neotrop Entomol 2023;52:302-332. [PMID: 36478343 DOI: 10.1007/s13744-022-01011-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 11/20/2022] [Indexed: 06/17/2023]

Raiyemo DA, Bobadilla LK, Tranel PJ. Genomic profiling of dioecious Amaranthus species provides novel insights into species relatedness and sex genes. BMC Biol 2023;21:37. [PMID: 36804015 PMCID: PMC9940365 DOI: 10.1186/s12915-023-01539-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/21/2022] [Accepted: 02/08/2023] [Indexed: 02/21/2023] Open

Abstract

BACKGROUND

Amaranthus L. is a diverse genus consisting of domesticated, weedy, and non-invasive species distributed around the world. Nine species are dioecious, of which Amaranthus palmeri S. Watson and Amaranthus tuberculatus (Moq.) J.D. Sauer are troublesome weeds of agronomic crops in the USA and elsewhere. Shallow relationships among the dioecious Amaranthus species and the conservation of candidate genes within previously identified A. palmeri and A. tuberculatus male-specific regions of the Y (MSYs) in other dioecious species are poorly understood. In this study, seven genomes of dioecious amaranths were obtained by paired-end short-read sequencing and combined with short reads of seventeen species in the family Amaranthaceae from NCBI database. The species were phylogenomically analyzed to understand their relatedness. Genome characteristics for the dioecious species were evaluated and coverage analysis was used to investigate the conservation of sequences within the MSY regions.

RESULTS

We provide genome size, heterozygosity, and ploidy level inference for seven newly sequenced dioecious Amaranthus species and two additional dioecious species from the NCBI database. We report a pattern of transposable element proliferation in the species, in which seven species had more Ty3 elements than copia elements while A. palmeri and A. watsonii had more copia elements than Ty3 elements, similar to the TE pattern in some monoecious amaranths. Using a Mash-based phylogenomic analysis, we accurately recovered taxonomic relationships among the dioecious Amaranthus species that were previously identified based on comparative morphology. Coverage analysis revealed eleven candidate gene models within the A. palmeri MSY region with male-enriched coverages, as well as regions on scaffold 19 with female-enriched coverage, based on A. watsonii read alignments. A previously reported FLOWERING LOCUS T (FT) within A. tuberculatus MSY contig was also found to exhibit male-enriched coverages for three species closely related to A. tuberculatus but not for A. watsonii reads. Additional characterization of the A. palmeri MSY region revealed that 78% of the region is made of repetitive elements, typical of a sex determination region with reduced recombination.

CONCLUSIONS

The results of this study further increase our understanding of the relationships among the dioecious species of the Amaranthus genus and reveal genes with potential roles in sex function in the species.

Anjum N, Nabil RL, Rafi RI, Bayzid MS, Rahman MS. CD-MAWS: An Alignment-Free Phylogeny Estimation Method Using Cosine Distance on Minimal Absent Word Sets. IEEE/ACM Trans Comput Biol Bioinform 2023;20:196-205. [PMID: 34928803 DOI: 10.1109/tcbb.2021.3136792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Rachtman E, Sarmashghi S, Bafna V, Mirarab S. Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling. Cell Syst 2022;13:817-829.e3. [PMID: 36265468 PMCID: PMC9589918 DOI: 10.1016/j.cels.2022.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 03/14/2022] [Accepted: 06/28/2022] [Indexed: 01/26/2023]

Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinform Adv 2022;2:vbac055. [PMID: 35992043 PMCID: PMC9383262 DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]

Affiliation(s)
Metin Balaban
Nishat Anjum Bristy
Ahnaf Faisal
Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
Md Shamsuzzoha Bayzid
Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
Siavash Mirarab
To whom correspondence should be addressed.

Paula DP, Timbó RV, Togawa RC, Vogler AP, Andow DA. Quantitative prey species detection in predator guts across multiple trophic levels by mapping unassembled shotgun reads. Mol Ecol Resour 2022;23:64-80. [DOI: 10.1111/1755-0998.13690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 06/11/2022] [Accepted: 07/05/2022] [Indexed: 11/29/2022]

Xu T, Kong L, Li Q. Testing Efficacy of Assembly-Free and Alignment-Free Methods for Species Identification Using Genome Skims, with Patellogastropoda as a Test Case. Genes (Basel) 2022;13:1192. [PMID: 35885975 PMCID: PMC9318368 DOI: 10.3390/genes13071192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2022] [Revised: 06/26/2022] [Accepted: 06/28/2022] [Indexed: 02/05/2023] Open

Schmidt A, Schneider C, Decker P, Hohberg K, Römbke J, Lehmitz R, Bálint M. Shotgun metagenomics of soil invertebrate communities reflects taxonomy, biomass, and reference genome properties. Ecol Evol 2022;12:e8991. [PMID: 35784064 PMCID: PMC9170594 DOI: 10.1002/ece3.8991] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 05/11/2022] [Accepted: 05/17/2022] [Indexed: 12/03/2022] Open

Abstract

Metagenomics - shotgun sequencing of all DNA fragments from a community DNA extract - is routinely used to describe the composition, structure, and function of microorganism communities. Advances in DNA sequencing and the availability of genome databases increasingly allow the use of shotgun metagenomics on eukaryotic communities. Metagenomics offers major advances in the recovery of biomass relationships in a sample, in comparison to taxonomic marker gene-based approaches (metabarcoding). However, little is known about the factors which influence metagenomics data from eukaryotic communities, such as differences among organism groups, the properties of reference genomes, and genome assemblies. We evaluated how shotgun metagenomics records composition and biomass in artificial soil invertebrate communities at different sequencing efforts. We generated mock communities of controlled biomass ratios from 28 species from all major soil mesofauna groups: mites, springtails, nematodes, tardigrades, and potworms. We shotgun sequenced these communities and taxonomically assigned them with a database of over 270 soil invertebrate genomes. We recovered over 95% of the species, and observed relatively high false-positive detection rates. We found strong differences in reads assigned to different taxa, with some groups (e.g., springtails) consistently attracting more hits than others (e.g., enchytraeids). Original biomass could be predicted from read counts after considering these taxon-specific differences. Species with larger genomes, and with more complete assemblies, consistently attracted more reads than species with smaller genomes. The GC content of the genome assemblies had no effect on the biomass-read relationships. Results were similar among different sequencing efforts. The results show considerable differences in taxon recovery and taxon specificity of biomass recovery from metagenomic sequence data.
The properties of reference genomes and genome assemblies also influence biomass recovery, and they should be considered in metagenomic studies of eukaryotes. We show that low- and high-sequencing efforts yield similar results, suggesting high cost-efficiency of metagenomics for eukaryotic communities. We provide a brief roadmap for investigating factors which influence metagenomics-based eukaryotic community reconstructions. Understanding these factors is timely as accessibility of DNA sequencing and momentum for reference genomes projects show a future where the taxonomic assignment of DNA from any community sample becomes a reality.

Belbasi M, Blanca A, Harris RS, Koslicki D, Medvedev P. The minimizer Jaccard estimator is biased and inconsistent. Bioinformatics 2022;38:i169-i176. [PMID: 35758786 PMCID: PMC9235516 DOI: 10.1093/bioinformatics/btac244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022] Open

Liu S, Koslicki D. CMash: fast, multi-resolution estimation of k-mer-based Jaccard and containment indices. Bioinformatics 2022;38:i28-i35. [PMID: 35758788 PMCID: PMC9235470 DOI: 10.1093/bioinformatics/btac237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open

Cay SB, Cinar YU, Kuralay SC, Inal B, Zararsiz G, Ciftci A, Mollman R, Obut O, Eldem V, Bakir Y, Erol O. Genome skimming approach reveals the gene arrangements in the chloroplast genomes of the highly endangered Crocus L. species: Crocus istanbulensis (B.Mathew) Rukšāns. PLoS One 2022;17:e0269747. [PMID: 35704623 PMCID: PMC9200356 DOI: 10.1371/journal.pone.0269747] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/17/2021] [Accepted: 05/27/2022] [Indexed: 11/19/2022] Open

Li X, Wang X, Huang R, Stucky A, Chen X, Sun L, Wen Q, Zeng Y, Fletcher H, Wang C, Xu Y, Cao H, Sun F, Li SC, Zhang X, Zhong JF. The Machine-Learning-Mediated Interface of Microbiome and Genetic Risk Stratification in Neuroblastoma Reveals Molecular Pathways Related to Patient Survival. Cancers (Basel) 2022;14:2874. [PMID: 35740540 PMCID: PMC9220810 DOI: 10.3390/cancers14122874] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/15/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 02/01/2023] Open

Abstract

Simple Summary

Neuroblastoma is a highly heterogeneous malignancy with a wide range of outcomes from spontaneous regression to fatal chemoresistant disease, as currently treated according to the risk stratification of the Children’s Oncology Group (COG), resulting in some high COG risk patients receiving excessive treatment, due to lacking predictors for treatment response. Here, we sought to complement COG risk classification by using the tumor intracellular microbiome, which is part of the tumor’s molecular signature. We determine that an intra-tumor microbial gene abundance score, namely M-score, separates the high COG-risk patients into two subpopulations (M_high and M_low) with higher accuracy in risk stratification than the current COG risk assessment, thus sparing a subset of high COG-risk patients from being subjected to traditional high-risk therapies.

Abstract

Currently, most neuroblastoma patients are treated according to the Children’s Oncology Group (COG) risk group assignment; however, neuroblastoma’s heterogeneity leaves only a few predictors for treatment response, resulting in excessive treatment. Here, we sought to couple COG risk classification with the tumor intracellular microbiome, which is part of the molecular signature of a tumor. We determine that an intra-tumor microbial gene abundance score, namely M-score, separates the high COG-risk patients into two subpopulations (M_high and M_low) with higher accuracy in risk stratification than the current COG risk assessment, thus sparing a subset of high COG-risk patients from being subjected to traditional high-risk therapies. Mechanistically, the classification power of M-scores implies the effect of CREB over-activation, which may influence the critical genes involved in cellular proliferation, anti-apoptosis, and angiogenesis, affecting tumor cell proliferation, survival, and metastasis. Thus, intracellular microbiota abundance in neuroblastoma regulates intracellular signals to affect patients’ survival.

Affiliation(s)

Xin Li Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
Xiaoqi Wang Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
Ruihao Huang Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
Andres Stucky Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
Xuelian Chen Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
Lan Sun Department of Oncology, Bishan Hospital of Chongqing Medical University, the People’s Hospital of Bishan District, Chongqing 400037, China
Qin Wen Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
Yunjing Zeng Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China
Hansel Fletcher Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
Charles Wang Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA
Yi Xu Divisions of Hematology and Oncology and Regenerative Medicine, Department of Medicine, Loma Linda University, Loma Linda, CA 92350, USA; Cancer Center of Loma Linda University, Loma Linda, CA 92350, USA
Huynh Cao Divisions of Hematology and Oncology and Regenerative Medicine, Department of Medicine, Loma Linda University, Loma Linda, CA 92350, USA; Cancer Center of Loma Linda University, Loma Linda, CA 92350, USA
Fengzhu Sun Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA 90089, USA
Shengwen Calvin Li CHOC Children’s Research Institute, Children’s Hospital of Orange County (CHOC), 1201 La Veta Ave., Orange, CA 92868-3874, USA; Department of Neurology, University of California, Irvine, School of Medicine, 200 S. Manchester Ave. Ste. 206, Orange, CA 92868, USA. Corresponding author.
Xi Zhang Medical Center of Hematology, Xinqiao Hospital, State Key Laboratory of Trauma, Burn and Combined Injury, Army Medical University, Chongqing 400037, China. Corresponding author.
Jiang F. Zhong Department of Basic Science, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA; Cancer Center of Loma Linda University, Loma Linda, CA 92350, USA. Corresponding author.

Javadzadeh S, Rajkumar U, Nguyen N, Sarmashghi S, Luebeck J, Shang J, Bafna V. FastViFi: Fast and accurate detection of (Hybrid) Viral DNA and RNA. NAR Genom Bioinform 2022;4:lqac032. [PMID: 35493723 PMCID: PMC9041341 DOI: 10.1093/nargab/lqac032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Revised: 03/04/2022] [Accepted: 03/06/2022] [Indexed: 11/13/2022] Open

Paula DP, Barros SKA, Pitta RM, Barreto MR, Togawa RC, Andow DA. Metabarcoding versus mapping unassembled shotgun reads for identification of prey consumed by arthropod epigeal predators. Gigascience 2022;11:6554098. [PMID: 35333301 PMCID: PMC8952265 DOI: 10.1093/gigascience/giac020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Revised: 12/07/2021] [Accepted: 02/09/2022] [Indexed: 12/19/2022] Open

Van Dam AR, Covas Orizondo JO, Lam AW, McKenna DD, Van Dam MH. Metagenomic clustering reveals microbial contamination as an essential consideration in ultraconserved element design for phylogenomics with insect museum specimens. Ecol Evol 2022;12:e8625. [PMID: 35342556 PMCID: PMC8932080 DOI: 10.1002/ece3.8625] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2021] [Revised: 01/03/2022] [Accepted: 01/17/2022] [Indexed: 11/30/2022] Open

Blanca A, Harris RS, Koslicki D, Medvedev P. The Statistics of k-mers from a Sequence Undergoing a Simple Mutation Process Without Spurious Matches. J Comput Biol 2022;29:155-168. [PMID: 35108101 DOI: 10.1089/cmb.2021.0431] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 01/22/2023] Open

Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022;2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]

Sarmashghi S, Balaban M, Rachtman E, Touri B, Mirarab S, Bafna V. Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT. PLoS Comput Biol 2021;17:e1009449. [PMID: 34780468 PMCID: PMC8629397 DOI: 10.1371/journal.pcbi.1009449] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2021] [Revised: 11/29/2021] [Accepted: 09/13/2021] [Indexed: 01/26/2023] Open

Abstract

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome-skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show using a mix of theoretical and empirical analysis that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, and that has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation compared to 27% error previously achieved. In shotgun sequenced read samples with contaminants, RESPECT length estimates had median error 4%, in contrast to other methods that had median error 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome skims) could be transformative for genomic ecology. Analyzing genome skims, mostly based on statistics of small oligomers, remains challenging, but recent results have shown the advantage of this approach for the identification and phylogenetic placement of eukaryotic species. In this paper, we present a method, RESPECT, to estimate genomic properties such as genome length and repetitiveness from low-coverage genome skims. We trained RESPECT using assembled genomes and tested it on low-coverage simulated and real reads. Benchmarking results reveal that RESPECT has excellent accuracy in estimating the genome length compared to other methods, and can provide critical information regarding the repeat structure of the genome.

Costa L, Marques A, Buddenhagen C, Thomas WW, Huettel B, Schubert V, Dodsworth S, Houben A, Souza G, Pedrosa-Harand A. Aiming off the target: recycling target capture sequencing reads for investigating repetitive DNA. Ann Bot 2021;128:835-848. [PMID: 34050647 PMCID: PMC8577205 DOI: 10.1093/aob/mcab063] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/02/2021] [Accepted: 05/26/2021] [Indexed: 05/28/2023]

Blanke M, Morgenstern B. App-SpaM: phylogenetic placement of short reads without sequence alignment. Bioinform Adv 2021;1:vbab027. [PMID: 36700102 PMCID: PMC9710606 DOI: 10.1093/bioadv/vbab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2021] [Revised: 09/27/2021] [Accepted: 10/11/2021] [Indexed: 01/28/2023]

Oliveira MAS, Nunes T, Dos Santos MA, Ferreira Gomes D, Costa I, Van-Lume B, Marques Da Silva SS, Oliveira RS, Simon MF, Lima GSA, Gissi DS, Almeida CCDS, Souza G, Marques A. High-Throughput Genomic Data Reveal Complex Phylogenetic Relationships in Stylosanthes Sw (Leguminosae). Front Genet 2021;12:727314. [PMID: 34630521 PMCID: PMC8495327 DOI: 10.3389/fgene.2021.727314] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/18/2021] [Accepted: 09/08/2021] [Indexed: 11/22/2022] Open

Abstract

Allopolyploidy is widely present across plant lineages, though estimating the correct phylogenetic relationships and origin of allopolyploids can be a hard task. In the genus Stylosanthes Sw. (Leguminosae), an important legume crop, allopolyploidy is a key speciation force. This complicates adequate species recognition and breeding efforts in the genus. Based on comparative analysis of nine high-throughput sequencing (HTS) samples, including three allopolyploids (S. capitata Vogel cv. “Campo Grande,” S. capitata “RS024” and S. scabra Vogel) and six diploids (S. hamata Taub, S. viscosa (L.) Sw., S. macrocephala M. B. Ferreira and Sousa Costa, S. guianensis (Aubl.) Sw., S. pilosa M. B. Ferreira and Sousa Costa and S. seabrana B. L. Maass & 't Mannetje), we provide a working pipeline to identify organelle and nuclear genome signatures that allowed us to trace the origin and parental genome recognition of allopolyploids. First, organelle genomes were de novo assembled and used to identify maternal genome donors by alignment-based phylogenies and synteny analysis. Second, nuclear-derived reads were subjected to repetitive DNA identification with RepeatExplorer2. Identified repeats were compared based on abundance and presence in diploids in relation to allopolyploids by comparative repeat analysis. Third, reads were extracted and grouped into the following categories: chloroplast, mitochondrial, satellite DNA, ribosomal DNA, repeat-clustered, and total genomic reads. These sets of reads were then subjected to alignment- and assembly-free phylogenetic analyses and were compared to classical alignment-based phylogenetic methods. Comparative analysis of shared and unique satellite repeats also allowed the tracing of allopolyploid origin in Stylosanthes, especially those with high abundance such as the StyloSat1 in the Scabra complex. This satellite was in situ mapped in the proximal region of the chromosomes and made it possible to identify its previously proposed parents. Hence, with simple genome skimming data we were able to provide evidence for the recognition of parental genomes and understand genome evolution of two Stylosanthes allopolyploids.

Chafin TK, Regmi B, Douglas MR, Edds DR, Wangchuk K, Dorji S, Norbu P, Norbu S, Changlu C, Khanal GP, Tshering S, Douglas ME. Parallel introgression, not recurrent emergence, explains apparent elevational ecotypes of polyploid Himalayan snowtrout. R Soc Open Sci 2021;8:210727. [PMID: 34729207 PMCID: PMC8548808 DOI: 10.1098/rsos.210727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 10/01/2021] [Indexed: 06/13/2023]

Affiliation(s)

Tyler K. Chafin: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA; Department of Ecology and Evolutionary Biology, University of Colorado, Boulder 80309, USA
Binod Regmi: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA; National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS), National Institutes of Health, Bethesda, MD 20892, USA
Marlis R. Douglas: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
David R. Edds: Department of Biological Sciences, Emporia State University, Emporia, KS 66801, USA
Karma Wangchuk: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA; National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Sonam Dorji: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Pema Norbu: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Sangay Norbu: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Changlu Changlu: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Gopal Prasad Khanal: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Singye Tshering: National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
Michael E. Douglas: Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA

Agrawal N, Gupta M, Atri C, Akhatar J, Kumar S, Heslop-Harrison PJS, Banga SS. Anchoring alien chromosome segment substitutions bearing gene(s) for resistance to mustard aphid in Brassica juncea-B. fruticulosa introgression lines and their possible disruption through gamma irradiation. Theor Appl Genet 2021;134:3209-3224. [PMID: 34160642 DOI: 10.1007/s00122-021-03886-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/08/2020] [Accepted: 06/08/2021] [Indexed: 05/18/2023]

Abstract

KEY MESSAGE

Heavy doses of gamma irradiation can reduce linkage drag by disrupting large sized alien translocations and promoting exchanges between crop and wild genomes. Resistance to mustard aphid (Lipaphis erysimi) infestation was significantly improved in Brassica juncea through B. juncea-B. fruticulosa introgression. However, linkage drag caused by introgressed chromatin fragments has so far prevented the deployment of this resistance source in commercial cultivars. We investigated the patterns of donor chromatin segment substitutions in the introgression lines (ILs) through genomic in situ hybridization (GISH) coupled with B. juncea chromosome-specific oligonucleotide probes. These allowed identification of large chromosome translocations from B. fruticulosa in the terminal regions of chromosomes A05, B02, B03 and B04 in three founder ILs (AD-64, 101 and 104). Only AD-101 carried an additional translocation at the sub-terminal to intercalary position in both homologues of chromosome A01. We validated these translocations with a reciprocal blast hit analysis using shotgun sequencing of three ILs and species-specific contigs/scaffolds (kb sized) from a de novo assembly of B. fruticulosa. Alien segment substitution on chromosome A05 could not be validated. Current studies also endeavoured to break linkage drag by exposing seeds to a heavy dose (200kR) of gamma radiation. Reduction in the size of introgressed chromatin fragments was observed in many M3 plants. There was a complete loss of the alien chromosome fragment in one instance. A few M3 plants with novel patterns of chromosome segment substitutions displayed improved agronomic performance coupled with resistance to mustard aphid. SNPs in such genomic spaces should aid the development of markers to track introgressed DNA and allow application in plant breeding.

Rachtman E, Bafna V, Mirarab S. CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genom Bioinform 2021;3:lqab071. [PMID: 34377979 PMCID: PMC8340999 DOI: 10.1093/nargab/lqab071] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/10/2021] [Revised: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 12/27/2022] Open

Lu YY, Bai J, Wang Y, Wang Y, Sun F. CRAFT: Compact genome Representation toward large-scale Alignment-Free daTabase. Bioinformatics 2021;37:155-161. [PMID: 32766810 DOI: 10.1093/bioinformatics/btaa699] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Revised: 03/11/2020] [Accepted: 07/28/2020] [Indexed: 01/02/2023] Open

Sequence Comparison Without Alignment: The SpaM Approaches. Methods Mol Biol 2021;2231:121-134. [PMID: 33289890 DOI: 10.1007/978-1-0716-1036-7_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]

Girgis HZ, James BT, Luczak BB. Identity: rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models. NAR Genom Bioinform 2021;3:lqab001. [PMID: 33554117 PMCID: PMC7850047 DOI: 10.1093/nargab/lqab001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2019] [Revised: 12/07/2020] [Accepted: 01/08/2021] [Indexed: 11/12/2022] Open

Criscuolo A. On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Res 2020;9:1309. [PMID: 33335719 PMCID: PMC7713896 DOI: 10.12688/f1000research.26930.1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 10/12/2020] [Indexed: 12/29/2022] Open

Klötzl F, Haubold B. Phylonium: fast estimation of evolutionary distances from large samples of similar genomes. Bioinformatics 2020;36:2040-2046. [PMID: 31790149 PMCID: PMC7141870 DOI: 10.1093/bioinformatics/btz903] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/06/2019] [Revised: 11/01/2019] [Accepted: 11/28/2019] [Indexed: 11/13/2022] Open

Baharav TZ, Kamath GM, Tse DN, Shomorony I. Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments. Patterns 2020;1:100081. [PMID: 33205128 PMCID: PMC7660437 DOI: 10.1016/j.patter.2020.100081] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/26/2020] [Revised: 06/09/2020] [Accepted: 07/03/2020] [Indexed: 01/02/2023]

Wang Y, Chen Q, Deng C, Zheng Y, Sun F. KmerGO: A Tool to Identify Group-Specific Sequences With k-mers. Front Microbiol 2020;11:2067. [PMID: 32983048 PMCID: PMC7477287 DOI: 10.3389/fmicb.2020.02067] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/07/2020] [Accepted: 08/06/2020] [Indexed: 01/24/2023] Open

Bohmann K, Mirarab S, Bafna V, Gilbert MTP. Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification. Mol Ecol 2020;29:2521-2534. [PMID: 32542933 PMCID: PMC7496323 DOI: 10.1111/mec.15507] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/26/2020] [Revised: 06/03/2020] [Accepted: 06/05/2020] [Indexed: 02/06/2023]

Balaban M, Mirarab S. Phylogenetic double placement of mixed samples. Bioinformatics 2020;36:i335-i343. [PMID: 32657414 PMCID: PMC7355250 DOI: 10.1093/bioinformatics/btaa489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023] Open

Abstract

MOTIVATION

Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample. When constituents are absent from the reference set, we seek to phylogenetically position them with respect to the underlying tree of the reference species. This simple yet fundamental problem (which we call phylogenetic double-placement) has enjoyed surprisingly little attention in the literature. As genome skimming (low-pass sequencing of genomes at low coverage, precluding assembly) becomes more prevalent, this problem finds wide-ranging applications in areas as varied as biodiversity research, food production and provenance, and evolutionary reconstruction.

RESULTS

We introduce a model that relates distances between a mixed sample and reference species to the distances between constituents and reference species. Our model is based on Jaccard indices computed between each sample represented as k-mer sets. The model, built on several assumptions and approximations, allows us to formalize the phylogenetic double-placement problem as a non-convex optimization problem that decomposes mixture distances and performs phylogenetic placement simultaneously. Using a variety of techniques, we are able to solve this optimization problem numerically. We test the resulting method, called MIxed Sample Analysis tool (MISA), on a varied set of simulated and biological datasets. Despite all the assumptions used, the method performs remarkably well in practice.

AVAILABILITY AND IMPLEMENTATION

The software and data are available at https://github.com/balabanmetin/misa and https://github.com/balabanmetin/misa-data.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
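
The abstract above states that the model is based on Jaccard indices computed between samples represented as k-mer sets. As a minimal sketch of that quantity only (not the MISA implementation), the Python below builds k-mer sets from two short sequences and computes their Jaccard index. The function names `kmer_set` and `jaccard` and the toy sequences are hypothetical, and the sketch makes simplifying assumptions: it ignores reverse complements, sequencing error, and coverage effects, which real genome-skim methods have to account for.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq.

    Assumption: only the forward strand is used; canonical k-mers
    (merging a k-mer with its reverse complement) are not handled.
    """
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy sequences (hypothetical) that differ at a single position.
x = kmer_set("ACGTACGTGA", 3)
y = kmer_set("ACGTTCGTGA", 3)
print(jaccard(x, y))
```

A single substitution changes up to k overlapping k-mers, which is why the Jaccard index drops quickly as sequences diverge and why it can serve as a signal for distance between samples.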

Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. Entropy (Basel) 2020;22:E627. [PMID: 33286399 PMCID: PMC7517167 DOI: 10.3390/e22060627] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/30/2020] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 12/30/2022]

Balaban M, Sarmashghi S, Mirarab S. APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments. Syst Biol 2020;69:566-578. [PMID: 31545363 PMCID: PMC7164367 DOI: 10.1093/sysbio/syz063] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 02/26/2019] [Revised: 09/05/2019] [Accepted: 09/10/2019] [Indexed: 11/14/2022] Open

Dencker T, Leimeister CA, Gerth M, Bleidorn C, Snir S, Morgenstern B. 'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees. NAR Genom Bioinform 2020;2:lqz013. [PMID: 33575565 PMCID: PMC7671388 DOI: 10.1093/nargab/lqz013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2019] [Revised: 07/31/2019] [Accepted: 10/13/2019] [Indexed: 02/03/2023] Open

Röhling S, Linne A, Schellhorn J, Hosseini M, Dencker T, Morgenstern B. The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances. PLoS One 2020;15:e0228070. [PMID: 32040534 PMCID: PMC7010260 DOI: 10.1371/journal.pone.0228070] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/07/2020] [Accepted: 01/08/2020] [Indexed: 12/14/2022] Open

Garrido-Sanz L, Senar MÀ, Piñol J. Estimation of the relative abundance of species in artificial mixtures of insects using low-coverage shotgun metagenomics. Metabarcoding and Metagenomics 2020. [DOI: 10.3897/mbmg.4.48281] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022] Open