Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: James BT, Luczak BB, Girgis HZ. MeShClust: an intelligent tool for clustering DNA sequences. Nucleic Acids Res 2018;46:e83. [PMID: 29718317 PMCID: PMC6101578 DOI: 10.1093/nar/gky315] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 02/04/2018] [Accepted: 04/13/2018] [Indexed: 11/13/2022] Open

Cited by Other Article(s)

Ben Shabat D, Hadad A, Boruchovsky A, Yaakobi E. GradHC: highly reliable gradual hash-based clustering for DNA storage systems. Bioinformatics 2024;40:btae274. [PMID: 38648049 DOI: 10.1093/bioinformatics/btae274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/27/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024]

Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024;43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024] Open

Xu C, Li J, Song LY, Guo ZJ, Song SW, Zhang LD, Zheng HL. PlantC2U: deep learning of cross-species sequence landscapes predicts plastid C-to-U RNA editing in plants. J Exp Bot 2024;75:2266-2279. [PMID: 38190348 DOI: 10.1093/jxb/erae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 01/07/2024] [Indexed: 01/10/2024]

Wright E. Accurately clustering biological sequences in linear time by relatedness sorting. Nat Commun 2024;15:3047. [PMID: 38589369 PMCID: PMC11001989 DOI: 10.1038/s41467-024-47371-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 03/28/2024] [Indexed: 04/10/2024] Open

Alipour F, Holmes C, Lu YY, Hill KA, Kari L. Leveraging machine learning for taxonomic classification of emerging astroviruses. Front Mol Biosci 2024;10:1305506. [PMID: 38274100 PMCID: PMC10808839 DOI: 10.3389/fmolb.2023.1305506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024] Open

Abstract

Astroviruses are a family of genetically diverse viruses associated with disease in humans and birds with significant health effects and economic burdens. Astrovirus taxonomic classification includes two genera, Avastrovirus and Mamastrovirus. However, with next-generation sequencing, broader interspecies transmission has been observed necessitating a reexamination of the current host-based taxonomic classification approach. In this study, a novel taxonomic classification method is presented for emergent and as yet unclassified astroviruses, based on whole genome sequence k-mer composition in addition to host information. An optional component responsible for identifying recombinant sequences was added to the method's pipeline, to counteract the impact of genetic recombination on viral classification. The proposed three-pronged classification method consists of a supervised machine learning method, an unsupervised machine learning method, and the consideration of host species. Using this three-pronged approach, we propose genus labels for 191 as yet unclassified astrovirus genomes. Genus labels are also suggested for an additional eight as yet unclassified astrovirus genomes for which incompatibility was observed with the host species, suggesting cross-species infection. Lastly, our machine learning-based approach augmented by a principal component analysis (PCA) analysis provides evidence supporting the hypothesis of the existence of human astrovirus (HAstV) subgenus of the genus Mamastrovirus, and a goose astrovirus (GoAstV) subgenus of the genus Avastrovirus. Overall, this multipronged machine learning approach provides a fast, reliable, and scalable prediction method of taxonomic labels, able to keep pace with emerging viruses and the exponential increase in the output of modern genome sequencing technologies.

Han R, Qi J, Xue Y, Sun X, Zhang F, Gao X, Li G. HycDemux: a hybrid unsupervised approach for accurate barcoded sample demultiplexing in nanopore sequencing. Genome Biol 2023;24:222. [PMID: 37798751 PMCID: PMC10552309 DOI: 10.1186/s13059-023-03053-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2022] [Accepted: 09/08/2023] [Indexed: 10/07/2023] Open

Millan Arias P, Hill KA, Kari L. iDeLUCS: a deep learning interactive tool for alignment-free clustering of DNA sequences. Bioinformatics 2023;39:btad508. [PMID: 37589603 PMCID: PMC10483029 DOI: 10.1093/bioinformatics/btad508] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/13/2022] [Revised: 07/18/2023] [Accepted: 08/16/2023] [Indexed: 08/18/2023] Open

Wei ZG, Chen X, Zhang XD, Zhang H, Fan XG, Gao HY, Liu F, Qian Y. Comparison of Methods for Biological Sequence Clustering. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2874-2888. [PMID: 37028305 DOI: 10.1109/tcbb.2023.3253138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]

Wang P, Cao B, Ma T, Wang B, Zhang Q, Zheng P. DUHI: Dynamically updated hash index clustering method for DNA storage. Comput Biol Med 2023;164:107244. [PMID: 37453377 DOI: 10.1016/j.compbiomed.2023.107244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/28/2023] [Revised: 06/08/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]

Johnson MS, Venkataram S, Kryazhimskiy S. Best Practices in Designing, Sequencing, and Identifying Random DNA Barcodes. J Mol Evol 2023;91:263-280. [PMID: 36651964 PMCID: PMC10276077 DOI: 10.1007/s00239-022-10083-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 12/15/2022] [Indexed: 01/19/2023]

Xu X, Yin Z, Yan L, Zhang H, Xu B, Wei Y, Niu B, Schmidt B, Liu W. RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches. Genome Biol 2023;24:121. [PMID: 37198663 PMCID: PMC10190105 DOI: 10.1186/s13059-023-02961-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 05/05/2023] [Indexed: 05/19/2023] Open

Luan T, Muralidharan HS, Alshehri M, Mittra I, Pop M. SCRAPT: an iterative algorithm for clustering large 16S rRNA gene data sets. Nucleic Acids Res 2023;51:e46. [PMID: 36912074 PMCID: PMC10164572 DOI: 10.1093/nar/gkad158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 02/01/2023] [Accepted: 02/28/2023] [Indexed: 03/14/2023] Open

Rubio A, Sprang M, Garzón A, Moreno-Rodriguez A, Pachón-Ibáñez ME, Pachón J, Andrade-Navarro MA, Pérez-Pulido AJ. Analysis of bacterial pangenomes reduces CRISPR dark matter and reveals strong association between membranome and CRISPR-Cas systems. Sci Adv 2023;9:eadd8911. [PMID: 36961900 PMCID: PMC10038342 DOI: 10.1126/sciadv.add8911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 02/17/2023] [Indexed: 06/18/2023]

Neupane A, Chariker JH, Rouchka EC. Structural and Functional Classification of G-Quadruplex Families within the Human Genome. Genes (Basel) 2023;14:genes14030645. [PMID: 36980918 PMCID: PMC10048163 DOI: 10.3390/genes14030645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 02/22/2023] [Accepted: 03/02/2023] [Indexed: 03/08/2023] Open

Federated learning review: Fundamentals, enabling technologies, and future applications. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.103061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/25/2022]

Molo MS, White JB, Cornish V, Gell RM, Baars O, Singh R, Carbone MA, Isakeit T, Wise KA, Woloshuk CP, Bluhm BH, Horn BW, Heiniger RW, Carbone I. Asymmetrical lineage introgression and recombination in populations of Aspergillus flavus: Implications for biological control. PLoS One 2022;17:e0276556. [PMID: 36301851 PMCID: PMC9620740 DOI: 10.1371/journal.pone.0276556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 10/08/2022] [Indexed: 11/23/2022] Open

Affiliation(s)

Megan S. Molo Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
James B. White Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
Vicki Cornish Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
Richard M. Gell Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America Program of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America
Oliver Baars Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
Rakhi Singh Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
Mary Anna Carbone Center for Integrated Fungal Research and Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, United States of America
Thomas Isakeit Department of Plant Pathology and Microbiology, Texas AgriLife Extension Service, Texas A&M University, College Station, TX, United States of America
Kiersten A. Wise Department of Plant Pathology, University of Kentucky, Princeton, KY, United States of America
Charles P. Woloshuk Department of Plant Pathology and Botany, Purdue University, West Lafayette, IN, United States of America
Burton H. Bluhm University of Arkansas Division of Agriculture, Department of Entomology and Plant Pathology, Fayetteville, AR, United States of America
Bruce W. Horn United States Department of Agriculture, Agriculture Research Service, Dawson, GA, United States of America
Ron W. Heiniger Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, United States of America
Ignazio Carbone Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America Program of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America

Qu G, Yan Z, Wu H. Clover: tree structure-based efficient DNA clustering for DNA-based data storage. Brief Bioinform 2022;23:6668252. [PMID: 35975958 DOI: 10.1093/bib/bbac336] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2022] [Revised: 07/21/2022] [Accepted: 07/22/2022] [Indexed: 11/12/2022] Open

Swain MT, Vickers M. Interpreting alignment-free sequence comparison: what makes a score a good score? NAR Genom Bioinform 2022;4:lqac062. [PMID: 36071721 PMCID: PMC9442500 DOI: 10.1093/nargab/lqac062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 07/01/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open

Girgis HZ. MeShClust v3.0: high-quality clustering of DNA sequences using the mean shift algorithm and alignment-free identity scores. BMC Genomics 2022;23:423. [PMID: 35668366 PMCID: PMC9171953 DOI: 10.1186/s12864-022-08619-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/15/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022] Open

Abstract

Background

Tools for accurately clustering biological sequences are among the most important tools in computational biology. Two pioneering tools for clustering sequences are CD-HIT and UCLUST, both of which are fast and consume reasonable amounts of memory; however, there is a big room for improvement in terms of cluster quality. Motivated by this opportunity for improving cluster quality, we applied the mean shift algorithm in MeShClust v1.0. The mean shift algorithm is an instance of unsupervised learning. Its strong theoretical foundation guarantees the convergence to the true cluster centers. Our implementation of the mean shift algorithm in MeShClust v1.0 was a step forward. In this work, we scale up the algorithm by adapting an out-of-core strategy while utilizing alignment-free identity scores in a new tool: MeShClust v3.0.

Results

We evaluated CD-HIT, MeShClust v1.0, MeShClust v3.0, and UCLUST on 22 synthetic sets and five real sets. These data sets were designed or selected for testing the tools in terms of scalability and different similarity levels among sequences comprising clusters. On the synthetic data sets, MeShClust v3.0 outperformed the related tools on all sets in terms of cluster quality. On two real data sets obtained from human microbiome and maize transposons, MeShClust v3.0 outperformed the related tools by wide margins, achieving 55%–300% improvement in cluster quality. On another set that includes degenerate viral sequences, MeShClust v3.0 came third. On two bacterial sets, MeShClust v3.0 was the only applicable tool because of the long sequences in these sets. MeShClust v3.0 requires more time and memory than the related tools; almost all personal computers at the time of this writing can accommodate such requirements. MeShClust v3.0 can estimate an important parameter that controls cluster membership with high accuracy.

Conclusions

These results demonstrate the high quality of clusters produced by MeShClust v3.0 and its ability to apply the mean shift algorithm to large data sets and long sequences. Because clustering tools are utilized in many studies, providing high-quality clusters will help with deriving accurate biological knowledge.

Supplementary Information

The online version contains supplementary material available at (10.1186/s12864-022-08619-0).

Aunin E, Berriman M, Reid AJ. Characterising genome architectures using genome decomposition analysis. BMC Genomics 2022;23:398. [PMID: 35610562 PMCID: PMC9131526 DOI: 10.1186/s12864-022-08616-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/15/2021] [Accepted: 05/10/2022] [Indexed: 12/14/2022] Open

Kioukis A, Pourjam M, Neuhaus K, Lagkouvardos I. Taxonomy Informed Clustering, an Optimized Method for Purer and More Informative Clusters in Diversity Analysis and Microbiome Profiling. Front Bioinform 2022;2:864597. [PMID: 36304326 PMCID: PMC9580952 DOI: 10.3389/fbinf.2022.864597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open

Chiu JKH, Ong RTH. Clustering biological sequences with dynamic sequence similarity threshold. BMC Bioinformatics 2022;23:108. [PMID: 35354426 PMCID: PMC8969259 DOI: 10.1186/s12859-022-04643-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] Open

Furuta Y, Miura F, Ichise T, Nakayama SMM, Ikenaka Y, Zorigt T, Tsujinouchi M, Ishizuka M, Ito T, Higashi H. A GCDGC-specific DNA (cytosine-5) methyltransferase that methylates the GCWGC sequence on both strands and the GCSGC sequence on one strand. PLoS One 2022;17:e0265225. [PMID: 35312710 PMCID: PMC8936443 DOI: 10.1371/journal.pone.0265225] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Accepted: 02/24/2022] [Indexed: 11/18/2022] Open

Diversity of Pseudomonas aeruginosa Temperate Phages. mSphere 2022;7:e0101521. [PMID: 35196122 PMCID: PMC8865926 DOI: 10.1128/msphere.01015-21] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 02/08/2023] Open

Abstract

Modern sequencing technologies have provided insight into the genetic diversity of numerous species, including the human pathogen Pseudomonas aeruginosa. Bacterial genomes often harbor bacteriophage genomes (prophages), which can account for upwards of 20% of the genome. Prior studies have found P. aeruginosa prophages that contribute to their host’s pathogenicity and fitness. These advantages come in many different forms, including the production of toxins, promotion of biofilm formation, and displacement of other P. aeruginosa strains. While several different genera and species of P. aeruginosa prophages have been studied, there has not been a comprehensive study of the overall diversity of P. aeruginosa-infecting prophages. Here, we present the results of just such an analysis. A total of 6,852 high-confidence prophages were identified from 5,383 P. aeruginosa genomes from strains isolated from the human body and other environments. In total, 3,201 unique prophage sequences were identified. While 53.1% of these prophage sequences displayed sequence similarity to publicly available phage genomes, novel and highly mosaic prophages were discovered. Among these prophages, there is extensive diversity, including diversity within the functionally conserved integrase and C repressor coding regions, two genes responsible for prophage entering and persisting through the lysogenic life cycle. Analysis of integrase, C repressor, and terminase coding regions revealed extensive reassortment among P. aeruginosa prophages. This catalog of P. aeruginosa prophages provides a resource for future studies into the evolution of the species.

IMPORTANCE Prophages play a critical role in the evolution of their host species and can also contribute to the virulence and fitness of pathogenic species. Here, we conducted a comprehensive investigation of prophage sequences from 5,383 publicly available Pseudomonas aeruginosa genomes from human as well as environmental isolates. We identified a diverse population of prophages, including tailed phages, inoviruses, and microviruses; 46.9% of the prophage sequences found share no significant sequence similarity with characterized phages, representing a vast array of novel P. aeruginosa-infecting phages. Our investigation into these prophages found substantial evidence of reassortment. In producing this, the first catalog of P. aeruginosa prophages, we uncovered both novel prophages as well as genetic content that have yet to be explored.

Millán Arias P, Alipour F, Hill KA, Kari L. DeLUCS: Deep learning for unsupervised clustering of DNA sequences. PLoS One 2022;17:e0261531. [PMID: 35061715 PMCID: PMC8782307 DOI: 10.1371/journal.pone.0261531] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/03/2021] [Accepted: 12/06/2021] [Indexed: 11/25/2022] Open

Cao M, Peng Q, Wei ZG, Liu F, Hou YF. EdClust: A heuristic sequence clustering method with higher sensitivity. J Bioinform Comput Biol 2021;20:2150036. [PMID: 34939905 DOI: 10.1142/s0219720021500360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

Melnyk A, Mohebbi F, Knyazev S, Sahoo B, Hosseini R, Skums P, Zelikovsky A, Patterson M. From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering. J Comput Biol 2021;28:1113-1129. [PMID: 34698508 DOI: 10.1089/cmb.2021.0302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022] Open

Analysis of SINE Families B2, Dip, and Ves with Special Reference to Polyadenylation Signals and Transcription Terminators. Int J Mol Sci 2021;22:ijms22189897. [PMID: 34576060 PMCID: PMC8466645 DOI: 10.3390/ijms22189897] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Revised: 09/05/2021] [Accepted: 09/06/2021] [Indexed: 01/09/2023] Open

Patin NV, Dietrich ZA, Stancil A, Quinan M, Beckler JS, Hall ER, Culter J, Smith CG, Taillefert M, Stewart FJ. Gulf of Mexico blue hole harbors high levels of novel microbial lineages. ISME J 2021;15:2206-2232. [PMID: 33612832 PMCID: PMC8319197 DOI: 10.1038/s41396-021-00917-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/01/2020] [Revised: 01/14/2021] [Accepted: 01/27/2021] [Indexed: 01/31/2023]

Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter. Sci Rep 2021;11:13701. [PMID: 34211040 PMCID: PMC8249421 DOI: 10.1038/s41598-021-93154-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/11/2021] [Accepted: 06/07/2021] [Indexed: 02/06/2023] Open

Girgis HZ, James BT, Luczak BB. Identity: rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models. NAR Genom Bioinform 2021;3:lqab001. [PMID: 33554117 PMCID: PMC7850047 DOI: 10.1093/nargab/lqab001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2019] [Revised: 12/07/2020] [Accepted: 01/08/2021] [Indexed: 11/12/2022] Open

Blokh D, Gitarts J, Stambler I. An information-theoretical analysis of gene nucleotide sequence structuredness for a selection of aging and cancer-related genes. Genomics Inform 2020;18:e41. [PMID: 33412757 PMCID: PMC7808870 DOI: 10.5808/gi.2020.18.4.e41] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2020] [Accepted: 11/27/2020] [Indexed: 12/02/2022] Open

Patin NV, Peña-Gonzalez A, Hatt JK, Moe C, Kirby A, Konstantinidis KT. The Role of the Gut Microbiome in Resisting Norovirus Infection as Revealed by a Human Challenge Study. mBio 2020;11:e02634-20. [PMID: 33203758 PMCID: PMC7683401 DOI: 10.1128/mbio.02634-20] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/22/2020] [Accepted: 10/16/2020] [Indexed: 12/11/2022] Open

Abstract

Norovirus infections take a heavy toll on worldwide public health. While progress has been made toward understanding host responses to infection, the role of the gut microbiome in determining infection outcome is unknown. Moreover, data are lacking on the nature and duration of the microbiome response to norovirus infection, which has important implications for diagnostics and host recovery. Here, we characterized the gut microbiomes of subjects enrolled in a norovirus challenge study. We analyzed microbiome features of asymptomatic and symptomatic individuals at the genome (population) and gene levels and assessed their response over time in symptomatic individuals. We show that the preinfection microbiomes of subjects with asymptomatic infections were enriched in Bacteroidetes and depleted in Clostridia relative to the microbiomes of symptomatic subjects. These compositional differences were accompanied by differences in genes involved in the metabolism of glycans and sphingolipids that may aid in host resilience to infection. We further show that microbiomes shifted in composition following infection and that recovery times were variable among human hosts. In particular, Firmicutes increased immediately following the challenge, while Bacteroidetes and Proteobacteria decreased over the same time. Genes enriched in the microbiomes of symptomatic subjects, including the adenylyltransferase glgC, were linked to glycan metabolism and cell-cell signaling, suggesting as-yet unknown roles for these processes in determining infection outcome. These results provide important context for understanding the gut microbiome role in host susceptibility to symptomatic norovirus infection and long-term health outcomes.

IMPORTANCE The role of the human gut microbiome in determining whether an individual infected with norovirus will be symptomatic is poorly understood. This study provides important data on microbes that distinguish asymptomatic from symptomatic microbiomes and links these features to infection responses in a human challenge study. The results have implications for understanding resistance to and treatment of norovirus infections.

Paul T, Vainio S, Roning J. Clustering and classification of virus sequence through music communication protocol and wavelet transform. Genomics 2020;113:778-784. [PMID: 33069829 PMCID: PMC7561519 DOI: 10.1016/j.ygeno.2020.10.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/02/2020] [Accepted: 10/13/2020] [Indexed: 01/19/2023]

Review of Hepatitis E Virus in Rats: Evident Risk of Species Orthohepevirus C to Human Zoonotic Infection and Disease. Viruses 2020;12:v12101148. [PMID: 33050353 PMCID: PMC7600399 DOI: 10.3390/v12101148] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 08/13/2020] [Revised: 09/29/2020] [Accepted: 10/07/2020] [Indexed: 12/13/2022] Open

Abrouk M, Ahmed HI, Cubry P, Šimoníková D, Cauet S, Pailles Y, Bettgenhaeuser J, Gapa L, Scarcelli N, Couderc M, Zekraoui L, Kathiresan N, Čížková J, Hřibová E, Doležel J, Arribat S, Bergès H, Wieringa JJ, Gueye M, Kane NA, Leclerc C, Causse S, Vancoppenolle S, Billot C, Wicker T, Vigouroux Y, Barnaud A, Krattinger SG. Fonio millet genome unlocks African orphan crop diversity for agriculture in a changing climate. Nat Commun 2020;11:4488. [PMID: 32901040 DOI: 10.1101/2020.04.11.037671] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/13/2020] [Accepted: 08/16/2020] [Indexed: 05/28/2023] Open

Affiliation(s)

Michael Abrouk: Center for Desert Agriculture, Biological and Environmental Science & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Hanin Ibrahim Ahmed: Center for Desert Agriculture, Biological and Environmental Science & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Philippe Cubry: DIADE, Univ Montpellier, IRD, Montpellier, France
Denisa Šimoníková: Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Hana for Biotechnological and Agricultural Research, Olomouc, Czech Republic
Stéphane Cauet: CNRGV Plant Genomics Center, INRAE, Toulouse, France
Yveline Pailles: Center for Desert Agriculture, Biological and Environmental Science & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Jan Bettgenhaeuser: Center for Desert Agriculture, Biological and Environmental Science & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Liubov Gapa: Center for Desert Agriculture, Biological and Environmental Science & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Nora Scarcelli: DIADE, Univ Montpellier, IRD, Montpellier, France
Marie Couderc: DIADE, Univ Montpellier, IRD, Montpellier, France
Leila Zekraoui: DIADE, Univ Montpellier, IRD, Montpellier, France
Nagarajan Kathiresan: Supercomputing Core Lab, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Jana Čížková: Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Hana for Biotechnological and Agricultural Research, Olomouc, Czech Republic
Eva Hřibová: Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Hana for Biotechnological and Agricultural Research, Olomouc, Czech Republic
Jaroslav Doležel: Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Hana for Biotechnological and Agricultural Research, Olomouc, Czech Republic
Sandrine Arribat: CNRGV Plant Genomics Center, INRAE, Toulouse, France
Hélène Bergès: CNRGV Plant Genomics Center, INRAE, Toulouse, France; Inari Agriculture, One Kendall Square Building 600/700, Cambridge, MA, 02139, USA
Jan J Wieringa: Naturalis Biodiversity Center, Leiden, the Netherlands
Mathieu Gueye: Laboratoire de Botanique, Département de Botanique et Géologie, IFAN Ch. A. Diop/UCAD, Dakar, Senegal
Ndjido A Kane: Senegalese Agricultural Research Institute, Dakar, Senegal; Laboratoire Mixte International LAPSE, Dakar, Senegal
Christian Leclerc: CIRAD, UMR AGAP, Montpellier, France; AGAP, Université de Montpellier, Cirad, INRAE, Institut Agro, Montpellier, France
Sandrine Causse: CIRAD, UMR AGAP, Montpellier, France; AGAP, Université de Montpellier, Cirad, INRAE, Institut Agro, Montpellier, France
Sylvie Vancoppenolle: CIRAD, UMR AGAP, Montpellier, France; AGAP, Université de Montpellier, Cirad, INRAE, Institut Agro, Montpellier, France
Claire Billot: CIRAD, UMR AGAP, Montpellier, France; AGAP, Université de Montpellier, Cirad, INRAE, Institut Agro, Montpellier, France
Thomas Wicker: Department of Plant and Microbial Biology, University of Zurich, Zürich, Switzerland
Yves Vigouroux: DIADE, Univ Montpellier, IRD, Montpellier, France
Adeline Barnaud: DIADE, Univ Montpellier, IRD, Montpellier, France; Laboratoire Mixte International LAPSE, Dakar, Senegal
Simon G Krattinger: Center for Desert Agriculture, Biological and Environmental Science & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia

Abrouk M, Ahmed HI, Cubry P, Šimoníková D, Cauet S, Pailles Y, Bettgenhaeuser J, Gapa L, Scarcelli N, Couderc M, Zekraoui L, Kathiresan N, Čížková J, Hřibová E, Doležel J, Arribat S, Bergès H, Wieringa JJ, Gueye M, Kane NA, Leclerc C, Causse S, Vancoppenolle S, Billot C, Wicker T, Vigouroux Y, Barnaud A, Krattinger SG. Fonio millet genome unlocks African orphan crop diversity for agriculture in a changing climate. Nat Commun 2020;11:4488. [PMID: 32901040 PMCID: PMC7479619 DOI: 10.1038/s41467-020-18329-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 04/13/2020] [Accepted: 08/16/2020] [Indexed: 01/24/2023] Open

Jiang P, Luo J, Wang Y, Deng P, Schmidt B, Tang X, Chen N, Wong L, Zhao L. kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers. Bioinformatics 2020;35:4871-4878. [PMID: 31038666 DOI: 10.1093/bioinformatics/btz299] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/29/2018] [Revised: 04/02/2019] [Accepted: 04/19/2019] [Indexed: 12/25/2022] Open

Borredá C, Pérez-Román E, Ibanez V, Terol J, Talon M. Reprogramming of Retrotransposon Activity during Speciation of the Genus Citrus. Genome Biol Evol 2020;11:3478-3495. [PMID: 31710678 PMCID: PMC7145672 DOI: 10.1093/gbe/evz246] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 11/04/2019] [Indexed: 12/13/2022] Open

Sahlin K, Medvedev P. De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm. J Comput Biol 2020;27:472-484. [PMID: 32181688 DOI: 10.1089/cmb.2019.0299] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/25/2022] Open

Valencia JD, Girgis HZ. LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo. BMC Genomics 2019;20:450. [PMID: 31159720 PMCID: PMC6547461 DOI: 10.1186/s12864-019-5796-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/26/2018] [Accepted: 05/14/2019] [Indexed: 12/19/2022] Open

Abstract

BACKGROUND

Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway. Software tools are important for annotating long terminal repeat retrotransposons in these newly available genomes. However, the available tools are not very sensitive to known elements and perform inconsistently on different genomes. Some are hard to install or obsolete. They may struggle to process large plant genomes. None can be executed in parallel out of the box and very few have features to support visual review of new elements. To overcome these limitations, we developed LtrDetector, which uses techniques inspired by signal-processing.

RESULTS

We compared LtrDetector to LTR_Finder and LTRharvest, the two most successful predecessor tools, on six plant genomes. For each organism, we constructed a ground truth data set based on queries from a consensus sequence database. According to this evaluation, LtrDetector was the most sensitive tool, achieving 16-23% improvement in sensitivity over LTRharvest and 21% improvement over LTR_Finder. All three tools had low false positive rates, with LtrDetector achieving 98.2% precision, in between its two competitors. Overall, LtrDetector provides the best compromise between high sensitivity and low false positive rate while requiring moderate time and utilizing memory available on personal computers.

CONCLUSIONS

LtrDetector uses a novel methodology revolving around k-mer distributions, which allows it to produce high-quality results using relatively lightweight procedures. It is easy to install and use. It is not species specific, performing well using its default parameters on genomes of varying size and repeat content. It is automatically configured for parallel execution and runs efficiently on an ordinary personal computer. It includes a k-mer scores visualization tool to facilitate manual review of the identified elements. These features make LtrDetector an attractive tool for future annotation projects involving long terminal repeat retrotransposons.

Tight clustering for large datasets with an application to gene expression data. Sci Rep 2019;9:3053. [PMID: 30816195 PMCID: PMC6395712 DOI: 10.1038/s41598-019-39459-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/08/2018] [Accepted: 01/25/2019] [Indexed: 11/24/2022] Open