Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Woloszynek S, Zhao Z, Chen J, Rosen GL. 16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses. PLoS Comput Biol 2019;15:e1006721. [PMID: 30807567 PMCID: PMC6407789 DOI: 10.1371/journal.pcbi.1006721] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Revised: 03/08/2019] [Accepted: 12/17/2018] [Indexed: 12/26/2022] Open

For:	Woloszynek S, Zhao Z, Chen J, Rosen GL. 16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses. PLoS Comput Biol 2019;15:e1006721. [PMID: 30807567 PMCID: PMC6407789 DOI: 10.1371/journal.pcbi.1006721] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Revised: 03/08/2019] [Accepted: 12/17/2018] [Indexed: 12/26/2022] Open

Number

Cited by Other Article(s)

Meier TA, Refahi MS, Hearne G, Restifo DS, Munoz-Acuna R, Rosen GL, Woloszynek S. The Role and Applications of Artificial Intelligence in the Treatment of Chronic Pain. Curr Pain Headache Rep 2024;28:769-784. [PMID: 38822995 DOI: 10.1007/s11916-024-01264-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/28/2024] [Indexed: 06/03/2024]

Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024;10. [PMID: 38630611 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2024] Open

Wang C, Zou Q. Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE. BMC Biol 2023;21:12. [PMID: 36694239 PMCID: PMC9875434 DOI: 10.1186/s12915-023-01510-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023] Open

Zhai H, Fukuyama J. A convenient correspondence between k-mer-based metagenomic distances and phylogenetically-informed β-diversity measures. PLoS Comput Biol 2023;19:e1010821. [PMID: 36608056 PMCID: PMC9879504 DOI: 10.1371/journal.pcbi.1010821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Revised: 01/26/2023] [Accepted: 12/16/2022] [Indexed: 01/07/2023] Open

Sokhansanj BA, Zhao Z, Rosen GL. Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity. BIOLOGY 2022;11:1786. [PMID: 36552295 PMCID: PMC9774807 DOI: 10.3390/biology11121786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 11/28/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]

David MM, Tataru C, Pope Q, Baker LJ, English MK, Epstein HE, Hammer A, Kent M, Sieler MJ, Mueller RS, Sharpton TJ, Tomas F, Vega Thurber R, Fern XZ. Revealing General Patterns of Microbiomes That Transcend Systems: Potential and Challenges of Deep Transfer Learning. mSystems 2022;7:e0105821. [PMID: 35040699 PMCID: PMC8765061 DOI: 10.1128/msystems.01058-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Wang C, Ju Y, Zou Q, Lin C. DeepAc4C: a convolutional neural network model with hybrid features composed of physicochemical patterns and distributed representation information for identification of N4-acetylcytidine in mRNA. Bioinformatics 2021;38:52-57. [PMID: 34427581 DOI: 10.1093/bioinformatics/btab611] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2021] [Revised: 08/17/2021] [Accepted: 08/20/2021] [Indexed: 02/03/2023] Open

Deng Z, Zhang J, Li J, Zhang X. Application of Deep Learning in Plant-Microbiota Association Analysis. Front Genet 2021;12:697090. [PMID: 34691142 PMCID: PMC8531731 DOI: 10.3389/fgene.2021.697090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2021] [Accepted: 08/31/2021] [Indexed: 01/04/2023] Open

Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021;19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open

Zhao Z, Woloszynek S, Agbavor F, Mell JC, Sokhansanj BA, Rosen GL. Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network. PLoS Comput Biol 2021;17:e1009345. [PMID: 34550967 PMCID: PMC8496832 DOI: 10.1371/journal.pcbi.1009345] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2020] [Revised: 10/07/2021] [Accepted: 08/12/2021] [Indexed: 01/04/2023] Open

Abstract

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).

Collapse

Gupta S, Aga D, Pruden A, Zhang L, Vikesland P. Data Analytics for Environmental Science and Engineering Research. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021;55:10895-10907. [PMID: 34338518 DOI: 10.1021/acs.est.1c01026] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Gharavi E, Gu A, Zheng G, Smith JP, Cho HJ, Zhang A, Brown DE, Sheffield NC. Embeddings of genomic region sets capture rich biological associations in lower dimensions. Bioinformatics 2021;37:4299-4306. [PMID: 34156475 PMCID: PMC8652032 DOI: 10.1093/bioinformatics/btab439] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2020] [Revised: 06/07/2021] [Accepted: 06/15/2021] [Indexed: 11/12/2022] Open

Iuchi H, Matsutani T, Yamada K, Iwano N, Sumi S, Hosoda S, Zhao S, Fukunaga T, Hamada M. Representation learning applications in biological sequence analysis. Comput Struct Biotechnol J 2021;19:3198-3208. [PMID: 34141139 PMCID: PMC8190442 DOI: 10.1016/j.csbj.2021.05.039] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Revised: 05/10/2021] [Accepted: 05/20/2021] [Indexed: 12/16/2022] Open

Affiliation(s)

Hitoshi Iuchi Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
Taro Matsutani Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
Keisuke Yamada School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
Natsuki Iwano Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
Shunsuke Sumi Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan Department of Life Science Frontiers, Center for iPS Cell Research and Application, Kyoto University, Kyoto 606-8507, Japan
Shion Hosoda Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
Shitao Zhao Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
Tsukasa Fukunaga Waseda Institute for Advanced Study, Waseda University, Tokyo 169-0051, Japan Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0032, Japan
Michiaki Hamada Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan

Collapse

Wei ZG, Zhang XD, Cao M, Liu F, Qian Y, Zhang SW. Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol 2021;12:644012. [PMID: 33841367 PMCID: PMC8024490 DOI: 10.3389/fmicb.2021.644012] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2020] [Accepted: 02/17/2021] [Indexed: 12/31/2022] Open

Wang C, Zhang Y, Han S. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BIOMED RESEARCH INTERNATIONAL 2020;2020:2468789. [PMID: 32566672 PMCID: PMC7275950 DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]

Tataru CA, David MM. Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease. PLoS Comput Biol 2020;16:e1007859. [PMID: 32365061 PMCID: PMC7244183 DOI: 10.1371/journal.pcbi.1007859] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2019] [Revised: 05/22/2020] [Accepted: 04/08/2020] [Indexed: 12/16/2022] Open

Abstract

Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the number of samples ten to one hundred-fold. This discrepancy decreases the power of studies to identify meaningful differences between samples, increases the likelihood of false positive results, and subsequently limits reproducibility. Despite the vast collections of microbiome data already available, biome-specific patterns of microbial structure are not currently leveraged to inform studies. Here, we derive microbiome-level properties by applying an embedding algorithm to quantify taxon co-occurrence patterns in over 18,000 samples from the American Gut Project (AGP) microbiome crowdsourcing effort. We then compare the predictive power of models trained using properties, normalized taxonomic count data, and another commonly used dimensionality reduction method, Principal Component Analysis in categorizing samples from individuals with inflammatory bowel disease (IBD) and healthy controls. We show that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results. Furthermore, we find that properties correlate significantly with known metabolic pathways. Using these properties, we are able to extract known and new bacterial metabolic pathways associated with inflammatory bowel disease across two completely independent studies. By providing a set of pre-trained embeddings, we allow any V4 16S amplicon study to apply the publicly informed properties to increase the statistical power, reproducibility, and generalizability of analysis.

Collapse