Reference Citation Analysis

For: Gopalan P, Hao W, Blei DM, Storey JD. Scaling probabilistic models of genetic variation to millions of humans. Nat Genet 2016;48:1587-90. [PMID: 27819665 DOI: 10.1038/ng.3710] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 05/28/2015] [Accepted: 10/04/2016] [Indexed: 12/20/2022]
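The page does not define "Impact Index Per Article". However, every value listed here fits one rule: the RCA citation count divided by the number of years between publication and 2024, rounded half-up to one decimal place. For this article that is 33 / 8 = 4.125, shown as 4.1. The sketch below encodes that inferred rule purely as an assumption. The reference year 2024, the one-year floor for papers published in the reference year, and the function name are all hypothetical, not documented RCA behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def impact_index_per_article(citations: int, pub_year: int, ref_year: int = 2024) -> Decimal:
    """Citations per year since publication, rounded half-up to one decimal.

    ASSUMPTION: rule inferred from the values on this page, not documented by RCA.
    ref_year=2024 and the one-year floor are guesses.
    """
    years = max(1, ref_year - pub_year)  # avoid dividing by zero for current-year papers
    per_year = Decimal(citations) / Decimal(years)
    return per_year.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Example: the target article (33 citations, published 2016)
print(impact_index_per_article(33, 2016))  # prints 4.1
```

Take the Hashemi et al. entry (37 citations, 2020): 37 / 4 = 9.25, which ROUND_HALF_UP reports as 9.3, matching the listing. Python's built-in `round()` rounds ties to even and would give 9.2, which is why the sketch uses `decimal`.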

Cited by Other Article(s)

Subedi S, Sumida TS, Park YP. A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Sci Alliance 2024;7:e202402713. [PMID: 39107066 PMCID: PMC11303850 DOI: 10.26508/lsa.202402713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/09/2024]

Huang D, Niu S, Bai D, Zhao Z, Li C, Deng X, Wang Y. Analysis of population structure and genetic diversity of Camellia tachangensis in Guizhou based on SNP markers. Mol Biol Rep 2024;51:715. [PMID: 38824248 PMCID: PMC11144125 DOI: 10.1007/s11033-024-09632-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 05/10/2024] [Indexed: 06/03/2024]

Mantes AD, Montserrat DM, Bustamante CD, Giró-i-Nieto X, Ioannidis AG. Neural ADMIXTURE for rapid genomic clustering. Nat Comput Sci 2023;3:621-629. [PMID: 37600116 PMCID: PMC10438426 DOI: 10.1038/s43588-023-00482-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/25/2022] [Accepted: 06/06/2023] [Indexed: 08/22/2023]

Ko S, Chu BB, Peterson D, Okenwa C, Papp JC, Alexander DH, Sobel EM, Zhou H, Lange KL. Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets. Am J Hum Genet 2023;110:314-325. [PMID: 36610401 PMCID: PMC9943729 DOI: 10.1016/j.ajhg.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/12/2022] [Indexed: 01/09/2023]

Dang T, Kumaishi K, Usui E, Kobori S, Sato T, Toda Y, Yamasaki Y, Tsujimoto H, Ichihashi Y, Iwata H. Stochastic variational variable selection for high-dimensional microbiome data. Microbiome 2022;10:236. [PMID: 36566203 PMCID: PMC9789572 DOI: 10.1186/s40168-022-01439-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 11/28/2022] [Indexed: 06/17/2023]

Abstract

BACKGROUND

The rapid and accurate identification of a minimal core set of representative microbial species plays an important role in clustering microbial community data and interpreting the clustering results. However, the high dimensionality of microbial metagenomic datasets is a major challenge for existing methods such as Dirichlet multinomial mixture (DMM) models: the computational cost of identifying a small number of representative species among a large number of observed species remains high.

RESULTS

We propose a novel approach that improves the widely used DMM approach by combining three ideas: (i) an indicator variable that identifies representative operational taxonomic units contributing substantially to the differentiation among clusters; (ii) to address the computational burden of high-dimensional microbiome data, stochastic variational inference, which approximates the posterior distribution with a controllable distribution called the variational distribution and uses stochastic optimization algorithms for fast computation; and (iii) an extension of the finite DMM model to the infinite case via Dirichlet process mixtures, with the number of clusters estimated as a variational parameter. Using the proposed method, stochastic variational variable selection (SVVS), we analyzed root microbiome data collected in our soybean field experiment, human gut microbiome data from three published large-scale case-control studies, and healthy human microbiome data from the Human Microbiome Project.

CONCLUSIONS

SVVS performed better and computed significantly faster than existing methods on all test datasets. In particular, SVVS is the only method that can analyze massive high-dimensional microbial data with more than 50,000 microbial species and 1000 samples. Furthermore, SVVS identifies a core set of representative microbial species that can improve the interpretability of Bayesian mixture models for a wide range of microbiome studies.
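The SVVS abstract relies on stochastic variational inference (SVI). The sketch below illustrates only the generic SVI update, not SVVS itself. It fits a Dirichlet distribution over taxon proportions for multinomial count vectors and reads a random minibatch at each step instead of the whole dataset. It has no mixture components, no indicator variables and no Dirichlet process. The function names, the step-size schedule (`tau`, `kappa`) and the simulated data are illustrative assumptions. The toy model is fully conjugate, so the exact posterior is available for comparison.

```python
import random


def simulate_counts(theta, n_samples, depth, seed=1):
    """Hypothetical data: n_samples multinomial count vectors, each with `depth` reads."""
    rng = random.Random(seed)
    taxa = range(len(theta))
    data = []
    for _ in range(n_samples):
        reads = rng.choices(taxa, weights=theta, k=depth)
        data.append([reads.count(j) for j in taxa])
    return data


def svi_dirichlet(counts, alpha, batch_size=50, n_iter=2000, tau=1.0, kappa=0.7, seed=0):
    """Generic SVI for theta ~ Dirichlet(alpha), x_n ~ Multinomial(theta).

    Each step draws a random minibatch, forms a noisy estimate lam_hat of the
    optimal variational parameter, and moves lam a step of size rho_t toward it.
    """
    rng = random.Random(seed)
    n, k = len(counts), len(alpha)
    lam = list(alpha)  # initialise the variational Dirichlet at the prior
    for t in range(1, n_iter + 1):
        batch = [counts[rng.randrange(n)] for _ in range(batch_size)]
        scale = n / batch_size  # rescale minibatch sums to the full dataset size
        lam_hat = [alpha[j] + scale * sum(x[j] for x in batch) for j in range(k)]
        rho = (t + tau) ** (-kappa)  # Robbins-Monro step size, kappa in (0.5, 1]
        lam = [(1.0 - rho) * lam[j] + rho * lam_hat[j] for j in range(k)]
    return lam
```

Usage: `counts = simulate_counts([0.5, 0.3, 0.2], 1000, 100)` followed by `svi_dirichlet(counts, [1.0, 1.0, 1.0])` returns parameters close to the exact posterior `alpha[j] + sum(x[j] for x in counts)`. Each update touches only `batch_size` samples, which is where SVI's speed-up on large datasets comes from.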

Jha J, Hashemi M, Vattikonda AN, Wang H, Jirsa V. Fully Bayesian estimation of virtual brain parameters with self-tuning Hamiltonian Monte Carlo. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac9037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]

Gewirtz AD, Townes FW, Engelhardt BE. Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues. Life Sci Alliance 2022;5:e202101297. [PMID: 35977827 PMCID: PMC9387650 DOI: 10.26508/lsa.202101297] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 11/24/2022]

Abstract

Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene-variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure in which one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual's genotype. By using raw count data, our model avoids possible distortion introduced by normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA.
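The TBLDA abstract builds on latent Dirichlet allocation (LDA) applied to raw counts. The sketch below only simulates the standard, single-modality LDA generative process with Python's `random` module. By analogy, documents stand for expression samples and words for genes. It does not model genotypes or the one-to-many (telescoping) mapping from samples to individuals. The function names, the hyperparameters `alpha` and `eta`, and the sizes are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Dirichlet draw by normalising independent Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def simulate_lda_counts(n_docs, n_topics, vocab_size, doc_length, alpha=0.5, eta=0.5, seed=0):
    """Generate raw count vectors from the standard LDA generative process."""
    rng = random.Random(seed)
    # One distribution over the vocabulary per topic: beta_k ~ Dirichlet(eta)
    topics = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        # Per-document topic proportions: theta_d ~ Dirichlet(alpha)
        theta = sample_dirichlet([alpha] * n_topics, rng)
        counts = [0] * vocab_size
        for _ in range(doc_length):
            z = rng.choices(range(n_topics), weights=theta)[0]        # topic assignment
            w = rng.choices(range(vocab_size), weights=topics[z])[0]  # word drawn from topic z
            counts[w] += 1
        docs.append(counts)
    return topics, docs
```

Inference in LDA runs this process in reverse: given only the count matrix `docs`, it estimates `topics` and each document's `theta`. Because the model is defined directly on counts, no normalization step is needed before fitting.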

Fast and accurate population admixture inference from genotype data from a few microsatellites to millions of SNPs. Heredity (Edinb) 2022;129:79-92. [PMID: 35508539 PMCID: PMC9338324 DOI: 10.1038/s41437-022-00535-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2021] [Revised: 04/04/2022] [Accepted: 04/05/2022] [Indexed: 11/08/2022]

Chiu AM, Molloy EK, Tan Z, Talwalkar A, Sankararaman S. Inferring population structure in biobank-scale genomic data. Am J Hum Genet 2022;109:727-737. [PMID: 35298920 PMCID: PMC9069078 DOI: 10.1016/j.ajhg.2022.02.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/08/2021] [Accepted: 02/21/2022] [Indexed: 01/07/2023]

Liu J, Xie H, Lin T, Tie C, Luo H, Yang B, Xiong D. Putative variants, genetic diversity and population structure among Soybean cultivars bred at different ages in Huang-Huai-Hai region. Sci Rep 2022;12:2372. [PMID: 35149770 PMCID: PMC8837640 DOI: 10.1038/s41598-022-06447-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Accepted: 01/24/2022] [Indexed: 11/25/2022]

Abstract

Soybean cultivars bred in the Huang-Huai-Hai region (HR) are rich in pedigree information. To date, few reports have used genome-wide resequencing data to describe the genetic variants, population structure and genetic diversity of cultivars in this region. To characterize genetic variation, population structure and genetic diversity, we assembled a soybean population composed entirely of cultivars. We re-sequenced 181 soybean cultivar genomes with an average depth of 10.38×. In total, 11,185,589 single nucleotide polymorphisms (SNPs) and 2,520,208 insertion-deletions (InDels) were identified on all 20 chromosomes. A considerable number of putative variants lie in important genomic regions and may strongly influence genes involved in key biological processes. The 181 varieties were divided into five subpopulations according to their breeding years: SA (1963-1980), SB (1983-1988), SC (1991-2000), SD (2001-2011) and SE (2012-2017). PCA and population structure analysis revealed no obvious grouping. The LD half-decay distances of subpopulations SD and SE were 182 kb and 227 kb, respectively. Subpopulation SA had the highest nucleotide diversity (π). Over time, π decreased gradually in SB and SC but rose rapidly in SD and SE, indicating a sharp increase in genetic diversity over the last 20 years and suggesting that breeders in HR pursued different goals in different breeding periods. Polymorphism information content (PIC) showed a pattern very similar to π. This study analyzes the genetic variants and characterizes the structure and genetic diversity of soybean cultivars bred in different decades in HR, and provides a theoretical reference for similar studies.
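The abstract summarizes diversity with nucleotide diversity (π) and PIC but does not state which estimators were used. As a rough illustration, the sketch below computes the textbook definitions. π is the mean number of pairwise differences per site among aligned sequences, and PIC is computed from allele frequencies with Botstein's formula. The function names and toy inputs are hypothetical. Genome-scale studies often compute π in sliding windows from variant calls rather than from full sequences.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def pic(freqs):
    """Polymorphism information content from allele frequencies (Botstein's formula)."""
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
print(pic([0.5, 0.5]))  # prints 0.375
```

For a biallelic SNP, PIC reaches its maximum of 0.375 when both alleles have frequency 0.5. This bound is one reason SNP-based PIC values are lower than those from multi-allelic markers such as SSRs.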

Zhou R, Yang S, Zhang B, Qi Z, Xin D, Su A, Li S, Cheng P, Bai Y, Yin Z, Zhang B, Zhao Y, Zhao Y, Chen Q, Wu X. Analysis of the genetic diversity of grain legume germplasm resources in China and the development of universal SSR primers. Biotechnol Biotechnol Equip 2022. [DOI: 10.1080/13102818.2021.2006784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Affiliation(s)

Runnan Zhou Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Siqi Yang Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Bo Zhang Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Zhaoming Qi Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Dawei Xin Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Anyu Su Department of Land Remediation Engineering, College of Public Administration and Law, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Sinan Li Key Lab of Maize Genetics and Breeding, Department of National Corn Engineering Laboratory, Heilongjiang Academy of Agricultural Sciences, Harbin, Heilongjiang, PR China
Peng Cheng Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Yunqi Bai Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Zhengong Yin Crop Resources Institute of Heilongjiang Academy of Agricultural Sciences, Harbin, Heilongjiang, PR China
Binshuo Zhang Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Yujing Zhao Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Ying Zhao Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Qingshan Chen Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China
Xiaoxia Wu Department of Agronomy, College of Agriculture, Northeast Agricultural University, Harbin, Heilongjiang, PR China

Carress H, Lawson DJ, Elhaik E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 2021;22:351. [PMID: 34001009 PMCID: PMC8127217 DOI: 10.1186/s12864-021-07618-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/16/2020] [Accepted: 04/14/2021] [Indexed: 12/11/2022]

Shastry V, Adams PE, Lindtke D, Mandeville EG, Parchman TL, Gompert Z, Buerkle CA. Model-based genotype and ancestry estimation for potential hybrids with mixed-ploidy. Mol Ecol Resour 2021;21:1434-1451. [PMID: 33482035 DOI: 10.1111/1755-0998.13330] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/31/2020] [Revised: 12/11/2020] [Accepted: 01/11/2021] [Indexed: 11/29/2022]

Bose A, Kalantzis V, Kontopoulou EM, Elkady M, Paschou P, Drineas P. TeraPCA: a fast and scalable software package to study genetic variation in tera-scale genotypes. Bioinformatics 2020;35:3679-3683. [PMID: 30957838 DOI: 10.1093/bioinformatics/btz157] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/08/2018] [Revised: 02/26/2019] [Accepted: 04/04/2019] [Indexed: 11/12/2022]

Hashemi M, Vattikonda AN, Sip V, Guye M, Bartolomei F, Woodman MM, Jirsa VK. The Bayesian Virtual Epileptic Patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. Neuroimage 2020;217:116839. [PMID: 32387625 DOI: 10.1016/j.neuroimage.2020.116839] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/06/2019] [Revised: 04/02/2020] [Accepted: 04/07/2020] [Indexed: 12/28/2022]

Greenbaum G, Rubin A, Templeton AR, Rosenberg NA. Network-based hierarchical population structure analysis for large genomic data sets. Genome Res 2019;29:2020-2033. [PMID: 31694865 PMCID: PMC6886512 DOI: 10.1101/gr.250092.119] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/06/2019] [Accepted: 11/01/2019] [Indexed: 01/24/2023]

Hao W, Storey JD. Extending Tests of Hardy-Weinberg Equilibrium to Structured Populations. Genetics 2019;213:759-770. [PMID: 31537622 PMCID: PMC6827367 DOI: 10.1534/genetics.119.302370] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/28/2019] [Accepted: 08/21/2019] [Indexed: 12/22/2022]

Joseph TA, Pe'er I. Inference of Population Structure from Time-Series Genotype Data. Am J Hum Genet 2019;105:317-333. [PMID: 31256878 PMCID: PMC6698887 DOI: 10.1016/j.ajhg.2019.06.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/22/2019] [Accepted: 06/04/2019] [Indexed: 10/26/2022]

Cabreros I, Storey JD. A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis. Genetics 2019;212:1009-1029. [PMID: 31028112 PMCID: PMC6707457 DOI: 10.1534/genetics.119.302159] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/07/2019] [Accepted: 04/08/2019] [Indexed: 11/18/2022]

Dang T, Kishino H. Stochastic Variational Inference for Bayesian Phylogenetics: A Case of CAT Model. Mol Biol Evol 2019;36:825-833. [PMID: 30715448 PMCID: PMC6445300 DOI: 10.1093/molbev/msz020] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]

Pan X, Wang Y, Wong EHM, Telenti A, Venter JC, Jin L. Fine population structure analysis method for genomes of many. Sci Rep 2017;7:12608. [PMID: 28974706 PMCID: PMC5626719 DOI: 10.1038/s41598-017-12319-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/02/2017] [Accepted: 09/01/2017] [Indexed: 12/22/2022]

Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016;41:98-105. [PMID: 27662060 DOI: 10.1016/j.gde.2016.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/06/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 01/17/2023]