Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Download

Total Articles

4
(from Reference Citation Analysis)

Article PDFs (2)

Cited by > 0 (3)

Searched Name

Population structure analysis

Ranked By

Results Analysis

Year Published Analysis
Article Type Analysis
Publication Title Analysis
Category Analysis

Results Analysis

Indexed Articles

Year Published

Show more Refine

Article Type

Show more Refine

Article Statistics

Refine

MESH Headings

Show more Refine

First Author

Show more Refine

First Author Affiliations

Show more Refine

Authors

Show more Refine

Publication Titles

Show more Refine

Grant Agencies

Show more Refine

Countries/Regions

Show more Refine

Affiliations

Show more Refine

Corresponding Author Affiliations

Show more Refine

Category

Show more Refine

Number

Citation Analysis

Solovieva E, Sakai H. PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data. BMC Bioinformatics 2023;24:135. [PMID: 37020193 PMCID: PMC10074814 DOI: 10.1186/s12859-023-05169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2022] [Accepted: 02/02/2023] [Indexed: 04/07/2023] Open

Abstract

BACKGROUND

Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many tools and software are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines perform such analyses in a single workflow and visualize all the various results in a single interactive web application.

RESULTS

We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP is responsible for executing all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, an R-based interactive web application. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data.

CONCLUSIONS

The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP .

Collapse

Alhusain L, Hafez AM. Nonparametric approaches for population structure analysis. Hum Genomics 2018;12:25. [PMID: 29743099 PMCID: PMC5944014 DOI: 10.1186/s40246-018-0156-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2018] [Accepted: 04/24/2018] [Indexed: 12/28/2022] Open

Alhusain L, Hafez AM. Cluster ensemble based on Random Forests for genetic data. BioData Min 2017;10:37. [PMID: 29270227 PMCID: PMC5732374 DOI: 10.1186/s13040-017-0156-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2017] [Accepted: 11/21/2017] [Indexed: 11/25/2022] Open

Abstract

Background

Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable.

Results

Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance.

Conclusions

This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.

Collapse

Du Y, Zhou H, Wang F, Liang S, Cheng L, Du X, Pang F, Tian J, Zhao J, Kan B, Xu J, Li J, Zhang F. Multilocus sequence typing-based analysis of Moraxella catarrhalis population structure reveals clonal spreading of drug-resistant strains isolated from childhood pneumonia. Infect Genet Evol 2017;56:117-124. [PMID: 29155241 DOI: 10.1016/j.meegid.2017.11.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/03/2017] [Revised: 10/05/2017] [Accepted: 11/15/2017] [Indexed: 10/18/2022]

Affiliation(s)

Yinju Du Center for Disease Control and Prevention of Liaocheng, Liaocheng, PR China
Haijian Zhou State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, PR China; Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, PR China
Fei Wang Center for Disease Control and Prevention of Liaocheng, Liaocheng, PR China
Shengnan Liang Center for Disease Control and Prevention of Liaocheng, Liaocheng, PR China
Lihong Cheng Center for Disease Control and Prevention of Liaocheng, Liaocheng, PR China
Xiaofei Du State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, PR China
Feng Pang The People's Hospital of Liaocheng, Liaocheng, PR China
Jinjing Tian The Second People's Hospital of Liaocheng, Liaocheng, PR China
Jinxing Zhao Center for Disease Control and Prevention of Liaocheng, Liaocheng, PR China
Biao Kan State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, PR China; Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, PR China
Jianguo Xu State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, PR China; Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, PR China
Juan Li State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, PR China; Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Hangzhou, PR China.
Furong Zhang Center for Disease Control and Prevention of Liaocheng, Liaocheng, PR China.

Collapse