Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cao CC, Li C, Huang Z, Ma X, Sun X. Identifying rare variants with optimal depth of coverage and cost-effective overlapping pool sequencing. Genet Epidemiol 2013;37:820-30. [PMID: 24166758 DOI: 10.1002/gepi.21769] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2013] [Revised: 09/09/2013] [Accepted: 09/27/2013] [Indexed: 01/19/2023]

Cited by Other Article(s)

Clouard C, Ausmees K, Nettelblad C. A joint use of pooling and imputation for genotyping SNPs. BMC Bioinformatics 2022;23:421. [PMID: 36229780 PMCID: PMC9563787 DOI: 10.1186/s12859-022-04974-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 09/29/2022] [Indexed: 11/13/2022]

Abstract

BACKGROUND

Despite continuing technological advances, the cost of large-scale genotyping of a large number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest using pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, for example in wildlife monitoring and in plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance on pooled data of the Beagle algorithm and of a local likelihood-aware phasing algorithm, closely modeled on MaCH, that we implemented.

RESULTS

We conduct simulations based on human data from the 1000 Genomes Project to aid comparison with other imputation studies. Based on the simulated data, we find that pooling impacts the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data and thus outperforms the Beagle imputation model, which reaches 80% concordance. We observe that the pooling design gives higher concordance for rare variants than the traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts.

CONCLUSIONS

We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find potential applications in any context where the genotyping costs form a limiting factor on the study size, such as in marker-assisted selection in plant breeding.
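
As a rough illustration of what "directly identifiable" genotypes could mean in a pooling design, the Python sketch below pools a square grid of samples by row and by column and keeps only the genotypes that a homozygous pool readout forces. The square layout, the 0/1/2 genotype coding, and the function names `pool_outcome` and `decode_grid` are assumptions made for this example; the abstract does not specify the design, and this is not the authors' estimation algorithm.

```python
from itertools import product


def pool_outcome(genotypes):
    """Simplified pool readout (an assumption of this sketch).

    Genotypes are coded as the count of alternate alleles (0, 1 or 2).
    A pool reads 0 or 2 only if every sample in it is that homozygote;
    any mixture reads as 1 (heterozygous-looking).
    """
    if all(g == 0 for g in genotypes):
        return 0
    if all(g == 2 for g in genotypes):
        return 2
    return 1


def decode_grid(grid):
    """Return the genotypes that the row and column pools determine.

    A sample that sits in a pool reading 0 (or 2) must itself be 0 (or 2).
    Samples whose row pool and column pool both read 1 stay undetermined
    (None) and would be left to a later estimation or imputation step.
    """
    n = len(grid)
    row_pools = [pool_outcome(grid[i]) for i in range(n)]
    col_pools = [pool_outcome([grid[i][j] for i in range(n)]) for j in range(n)]

    decoded = [[None] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        for outcome in (row_pools[i], col_pools[j]):
            if outcome in (0, 2):
                decoded[i][j] = outcome
    return decoded


# Hypothetical marker: 16 samples tested with 8 pools, one heterozygous carrier.
example_grid = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
example_decoded = decode_grid(example_grid)
undetermined = sum(cell is None for row in example_decoded for cell in row)
print(f"{undetermined} of 16 genotypes left undetermined")
```

In this toy marker, 15 of the 16 genotypes are fixed by the pools alone; only the carrier's genotype remains ambiguous between 1 and 2.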

seekCRIT: Detecting and characterizing differentially expressed circular RNAs using high-throughput sequencing data. PLoS Comput Biol 2020;16:e1008338. [PMID: 33079938 PMCID: PMC7598922 DOI: 10.1371/journal.pcbi.1008338] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/23/2019] [Revised: 10/30/2020] [Accepted: 09/13/2020] [Indexed: 11/19/2022]

Abstract

Over the past two decades, researchers have discovered a special form of alternative splicing that produces a circular form of RNA. Although these circular RNAs (circRNAs) have garnered considerable attention in the scientific community for their biogenesis and functions, current studies have focused on tissue-specific circRNAs that exist in one tissue but not in others, or on disease-specific circRNAs that exist in certain disease conditions, such as cancer, but not under normal conditions. This focus reflects the relative absence of methods that analyze a group of common circRNAs that exist in both conditions but are more abundant in one condition than in the other (differentially expressed). Studies of differentially expressed circRNAs (DECs) between two conditions would serve as a significant first step in filling this void. Here, we introduce a novel computational tool, seekCRIT (seek for differentially expressed CircRNAs In Transcriptome), that identifies the DECs between two conditions from high-throughput sequencing data. Using rat retina RNA-seq data from ischemic and normal conditions, we show that over 74% of identifiable circRNAs are expressed in both conditions and over 40 circRNAs are differentially expressed between the two conditions. We also obtain a high qPCR validation rate of 90% for DECs with an FDR of < 5%. Our results demonstrate that seekCRIT is a novel and efficient approach to detect DECs using rRNA-depleted RNA-seq data. seekCRIT is freely downloadable at https://github.com/UofLBioinformatics/seekCRIT. The source code is licensed under the MIT License. seekCRIT is developed and tested on Linux CentOS-7.

The focus of circRNA studies has been on condition-specific circRNAs; however, there are situations in which circRNAs exist in both conditions with different abundance. Here, we introduce a new and robust analytic software, seekCRIT (seek for differentially expressed CircRNAs In Transcriptome), that identifies the differentially expressed circRNAs (DECs) between two conditions from high-throughput sequencing data. seekCRIT provides a straightforward normalized quantification of circRNAs and statistical measures by adapting a junction-count-based estimation approach. Using publicly available ribosomal RNA-depleted RNA-seq data and our own rat retina RNA-seq data, we show that seekCRIT can efficiently detect circRNAs and identify DECs. We also obtain a high qPCR validation rate of 90% for DECs with an FDR of < 5%. Our results demonstrate that seekCRIT is a novel and efficient software to detect DECs using rRNA-depleted RNA-seq data.

Zhernakov AI, Afonin AM, Gavriliuk ND, Moiseeva OM, Zhukov VA. s-dePooler: determination of polymorphism carriers from overlapping DNA pools. BMC Bioinformatics 2019;20:45. [PMID: 30669964 PMCID: PMC6343301 DOI: 10.1186/s12859-019-2616-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/08/2018] [Accepted: 01/09/2019] [Indexed: 11/26/2022]

Zhang Q, Guldbrandtsen B, Calus MPL, Lund MS, Sahana G. Comparison of gene-based rare variant association mapping methods for quantitative traits in a bovine population with complex familial relationships. Genet Sel Evol 2016;48:60. [PMID: 27534618 PMCID: PMC4989328 DOI: 10.1186/s12711-016-0238-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/08/2016] [Accepted: 08/04/2016] [Indexed: 11/10/2022]

Abstract

BACKGROUND

There is growing interest in the role of rare variants in the variation of complex traits due to increasing evidence that rare variants are associated with quantitative traits. However, association methods that are commonly used for mapping common variants are not effective for mapping rare variants. In addition, livestock populations have large half-sib families, and the occurrence of rare variants may be confounded with family structure, which makes it difficult to disentangle their effects from family mean effects. We compared the power of methods that are commonly applied in human genetics to map rare variants in cattle using whole-genome sequence data and simulated phenotypes. We also studied the power of mapping rare variants using linear mixed models (LMM), which are the method of choice to account for both family relationships and population structure in cattle.

RESULTS

We observed that the power of the LMM approach was low for mapping a rare variant (defined as those that have frequencies lower than 0.01) with a moderate effect (5 to 8 % of phenotypic variance explained by multiple rare variants that vary from 5 to 21 in number) contributing to a QTL with a sample size of 1000. In contrast, across the scenarios studied, statistical methods that are specialized for mapping rare variants increased power regardless of whether multiple rare variants or a single rare variant underlie a QTL. Different methods for combining rare variants in the test single nucleotide polymorphism set resulted in similar power irrespective of the proportion of total genetic variance explained by the QTL. However, when the QTL variance is very small (only 0.1 % of the total genetic variance), these specialized methods for mapping rare variants and LMM generally had no power to map the variants within a gene with sample sizes of 1000 or 5000.

CONCLUSIONS

We observed that the methods that combine multiple rare variants within a gene into a meta-variant generally had greater power to map rare variants compared to LMM. Therefore, it is recommended to use rare variant association mapping methods to map rare genetic variants that affect quantitative traits in livestock, such as bovine populations.

Li C, Cao C, Tu J, Sun X. An accurate clone-based haplotyping method by overlapping pool sequencing. Nucleic Acids Res 2016;44:e112. [PMID: 27095193 PMCID: PMC4937318 DOI: 10.1093/nar/gkw284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2015] [Accepted: 04/07/2016] [Indexed: 11/25/2022]

Combinatorial pooled sequencing: experiment design and decoding. Quantitative Biology 2016. [DOI: 10.1007/s40484-016-0064-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]

Kim K, Seong MW, Chung WH, Park SS, Leem S, Park W, Kim J, Lee K, Park RW, Kim N. Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants. Genomics Inform 2015;13:31-9. [PMID: 26175660 PMCID: PMC4500796 DOI: 10.5808/gi.2015.13.2.31] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/08/2015] [Revised: 05/26/2015] [Accepted: 05/28/2015] [Indexed: 02/06/2023]

Affiliation(s)

Kyung Kim: Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon 443-749, Korea; Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea.
Moon-Woo Seong: Department of Laboratory Medicine, Seoul National University Hospital College of Medicine, Seoul 110-799, Korea.
Won-Hyong Chung: Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea.
Sung Sup Park: Department of Laboratory Medicine, Seoul National University Hospital College of Medicine, Seoul 110-799, Korea.
Sangseob Leem: Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Korea.
Won Park: Department of Functional Genomics, Korea University of Science and Technology, Daejeon 305-806, Korea; Epigenomics Research Center, Genome Institute, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea.
Jihyun Kim: Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon 443-749, Korea.
KiYoung Lee: Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon 443-749, Korea.
Rae Woong Park: Department of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon 443-749, Korea.
Namshin Kim: Department of Functional Genomics, Korea University of Science and Technology, Daejeon 305-806, Korea; Epigenomics Research Center, Genome Institute, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea.

Cao CC, Sun X. Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing. Bioinformatics 2014;31:515-22. [PMID: 25304780 DOI: 10.1093/bioinformatics/btu670] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]

Cao CC, Li C, Sun X. Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers. BMC Bioinformatics 2014;15:195. [PMID: 24934981 PMCID: PMC4229885 DOI: 10.1186/1471-2105-15-195] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/06/2013] [Accepted: 06/10/2014] [Indexed: 11/23/2022]

Abstract

Background

Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although the massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, the identification of rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and a dramatically reduced cost.

Results

Based on quantitative group testing, we propose an overlapping pool sequencing strategy that allows the efficient recovery of variant carriers among numerous individuals at much lower cost than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5–97.9% of variants, with the variant frequency ranging from 0.5 to 1.5%.

Conclusions

Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
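
To make the idea of a random k-set pool design more concrete, the Python sketch below assigns each sample to k randomly chosen pools and then applies a plain, noise-free group-testing rule: a sample stays a candidate carrier only if every pool containing it is positive. This simple rule is substituted here for illustration and is not the heuristic Bayesian decoder described in the abstract; it also ignores read counts and sequencing noise. The pool count (30), k (3), the seed, and all function names are hypothetical; only the figure of 200 samples comes from the abstract.

```python
import random


def random_k_set_design(n_samples, n_pools, k, seed=0):
    """Assign every sample to k distinct pools chosen uniformly at random."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_pools), k)) for _ in range(n_samples)]


def pool_results(design, carriers, n_pools):
    """Noise-free readout: a pool is positive if it contains any carrier."""
    positive = [False] * n_pools
    for sample in carriers:
        for pool in design[sample]:
            positive[pool] = True
    return positive


def decode_candidates(design, positive):
    """Keep the samples whose pools are all positive.

    In this noise-free setting every true carrier is retained, but
    non-carriers can also survive when all of their pools happen to be
    shared with carriers (false positives).
    """
    return {
        sample
        for sample, pools in enumerate(design)
        if all(positive[pool] for pool in pools)
    }


# Hypothetical setting: 200 samples, 30 pools, each sample in 3 pools.
design = random_k_set_design(n_samples=200, n_pools=30, k=3, seed=1)
true_carriers = {5, 120}
positive = pool_results(design, true_carriers, n_pools=30)
candidates = decode_candidates(design, positive)
print(f"{len(candidates)} candidate carriers from 30 pools")
```

A quantitative decoder would go further by using the number of variant reads in each pool to rank or eliminate the remaining candidates.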
