Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bansal V, Harismendy O, Tewhey R, Murray SS, Schork NJ, Topol EJ, Frazer KA. Accurate detection and genotyping of SNPs utilizing population sequencing data. Genome Res 2010;20:537-45. [PMID: 20150320 DOI: 10.1101/gr.100040.109] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Indexed: 12/12/2022]

Cited by Other Article(s)

Selvakumar R, Jat GS, Manjunathagowda DC. Allele mining through TILLING and EcoTILLING approaches in vegetable crops. Planta 2023;258:15. [PMID: 37311932 DOI: 10.1007/s00425-023-04176-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 06/01/2023] [Indexed: 06/15/2023]

Abstract

MAIN CONCLUSION

The present review illustrates a comprehensive overview of the allele mining for genetic improvement in vegetable crops, and allele exploration methods and their utilization in various applications related to pre-breeding of economically important traits in vegetable crops. Vegetable crops have numerous wild descendants, ancestors and terrestrial races that could be exploited to develop high-yielding and climate-resilient varieties resistant/tolerant to biotic and abiotic stresses. To further boost the genetic potential of economic traits, the available genomic tools must be targeted and re-opened for exploitation of novel alleles from genetic stocks by the discovery of beneficial alleles from wild relatives and their introgression to cultivated types. This capability would be useful for giving plant breeders direct access to critical alleles that confer higher production, improve bioactive compounds, increase water and nutrient productivity as well as biotic and abiotic stress resilience. Allele mining is a new sophisticated technique for dissecting naturally occurring allelic variants in candidate genes that influence important traits which could be used for genetic improvement of vegetable crops. Target-induced local lesions in genomes (TILLINGs) is a sensitive mutation detection avenue in functional genomics, particularly wherein genome sequence information is limited or not available. Population exposure to chemical mutagens and the absence of selectivity lead to TILLING and EcoTILLING. EcoTILLING may lead to natural induction of SNPs and InDels. It is anticipated that as TILLING is used for vegetable crops improvement in the near future, indirect benefits will become apparent. Therefore, in this review we have highlighted the up-to-date information on allele mining for genetic enhancement in vegetable crops and methods of allele exploration and their use in pre-breeding for improvement of economic traits.

Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data. Stats 2023. [DOI: 10.3390/stats6010029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2023]

Muppidi P, Wright E, Wassmer SC, Gupta H. Diagnosis of cerebral malaria: Tools to reduce Plasmodium falciparum associated mortality. Front Cell Infect Microbiol 2023;13:1090013. [PMID: 36844403 PMCID: PMC9947298 DOI: 10.3389/fcimb.2023.1090013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 01/24/2023] [Indexed: 02/11/2023]

Chen L, Yang W, Li D, Ma Y, Chen L, You S, Liu S. Poly cytosine (C)/poly adenine (A) modified probe for signal "on-off-on" assay of single-base mismatched dsDNA by a competitive mechanism. Anal Chim Acta 2023;1239:340705. [PMID: 36628713 DOI: 10.1016/j.aca.2022.340705] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/20/2022] [Revised: 12/01/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]

Abstract

Direct discrimination of single-base mismatched dsDNA by a simple method or strategy would provide enormous opportunities for applications in the fields of life sciences and disease diagnosis. Herein, the peroxidase-mimicking activity of a metal-organic framework nanoprobe (MOF) was exploited for the direct discrimination of single-base mismatched dsDNA based on a competition-induced signal on-off-on mechanism. The single-base mismatched dsDNA related to the FecB gene (usually a guanine (G)/thymine (T) mismatch) and MIL-88B-NH₂ were used as target and MOF model, respectively. First, polyA/polyC were loosely adsorbed onto the MOFs via weak interactions to block the peroxidase activity of the MOF, inducing the signal transition from on to off. Unexpectedly, the single-base mismatched (GT) dsDNA could reverse the signal response of the MOF probe from off to on. This did not occur for other nonspecific mismatches, such as CT- and TT-mismatched dsDNA. A synergistic interaction mechanism between multiple GT mismatches and polyA/polyC was proposed to explain the competitive dissociation of polyA/polyC from the MOF and the recovery of peroxidase activity. With it, a wide linear detection range of 10⁻⁹ M to 10⁻⁵ M of GT-mismatched dsDNA and a low detection limit of 0.247 nM could be achieved, even in real samples. The effect of mismatched base number or position was also studied. Such a simple, rapid, cost-effective, one-step mixing and checking method for single-base mismatched dsDNA discrimination eliminates complex sample pretreatment, special DNA probe design, and exclusive amplification or signal readout means. It thus offers a simple and effective route for direct discrimination of mismatched dsDNA and might hold great potential for applications in gene analysis, disease diagnosis, and elementary research in life sciences.

Affiliation(s)

Lihua Chen Key Laboratory of Optic-electric Sensing and Analytical Chemistry for Life Science, Shandong Key Laboratory of Biochemical Analysis, Key Laboratory of Analytical Chemistry for Life Science in Universities of Shandong, Key Laboratory of Ecochemical Engineering, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao, 266042, PR China.
Wenjie Yang Key Laboratory of Optic-electric Sensing and Analytical Chemistry for Life Science, Shandong Key Laboratory of Biochemical Analysis, Key Laboratory of Analytical Chemistry for Life Science in Universities of Shandong, Key Laboratory of Ecochemical Engineering, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao, 266042, PR China
Dong Li Key Laboratory of Optic-electric Sensing and Analytical Chemistry for Life Science, Shandong Key Laboratory of Biochemical Analysis, Key Laboratory of Analytical Chemistry for Life Science in Universities of Shandong, Key Laboratory of Ecochemical Engineering, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao, 266042, PR China
Yunkang Ma Key Laboratory of Optic-electric Sensing and Analytical Chemistry for Life Science, Shandong Key Laboratory of Biochemical Analysis, Key Laboratory of Analytical Chemistry for Life Science in Universities of Shandong, Key Laboratory of Ecochemical Engineering, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao, 266042, PR China
Lili Chen Key Laboratory of Optic-electric Sensing and Analytical Chemistry for Life Science, Shandong Key Laboratory of Biochemical Analysis, Key Laboratory of Analytical Chemistry for Life Science in Universities of Shandong, Key Laboratory of Ecochemical Engineering, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao, 266042, PR China
Shuang You Key Laboratory of Optic-electric Sensing and Analytical Chemistry for Life Science, Shandong Key Laboratory of Biochemical Analysis, Key Laboratory of Analytical Chemistry for Life Science in Universities of Shandong, Key Laboratory of Ecochemical Engineering, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao, 266042, PR China
Shufeng Liu College of Chemistry and Chemical Engineering, Yantai University, Yantai, 264005, PR China.

Manjula G, Pranavchand R, Kumuda I, Reddy BS, Reddy BM. The SNP rs7865618 of 9p21.3 locus emerges as the most promising marker of coronary artery disease in the southern Indian population. Sci Rep 2020;10:21511. [PMID: 33298998 PMCID: PMC7726101 DOI: 10.1038/s41598-020-77080-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/23/2020] [Accepted: 11/05/2020] [Indexed: 11/09/2022]

Guo J, Khan J, Pradhan S, Shahi D, Khan N, Avci M, Mcbreen J, Harrison S, Brown-Guedira G, Murphy JP, Johnson J, Mergoum M, Esten Mason R, Ibrahim AMH, Sutton R, Griffey C, Babar MA. Multi-Trait Genomic Prediction of Yield-Related Traits in US Soft Wheat under Variable Water Regimes. Genes (Basel) 2020;11:1270. [PMID: 33126620 PMCID: PMC7716228 DOI: 10.3390/genes11111270] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/08/2020] [Revised: 10/23/2020] [Accepted: 10/26/2020] [Indexed: 11/16/2022]

Affiliation(s)

Jia Guo Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Jahangir Khan Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Sumit Pradhan Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Dipendra Shahi Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Naeem Khan Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Muhsin Avci Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Jordan Mcbreen Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.)
Stephen Harrison School of Plant Environment and Soil Sciences, Louisiana State University, Baton Rouge, LA 70803, USA;
Gina Brown-Guedira USDA-ARS, North Carolina State University, Raleigh, NC 27607, USA;
Joseph Paul Murphy Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27607, USA;
Jerry Johnson Department of Crop and Soil Sciences, University of Georgia, Griffin, GA 32223, USA; (J.J.); (M.M.)
Mohamed Mergoum Department of Crop and Soil Sciences, University of Georgia, Griffin, GA 32223, USA; (J.J.); (M.M.)
Richard Esten Mason Department of Crop Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR 72701, USA;
Amir M. H. Ibrahim Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA; (A.M.H.I.); (R.S.)
Russel Sutton Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA; (A.M.H.I.); (R.S.)
Carl Griffey School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA;
Md Ali Babar Department of Agronomy, University of Florida, Gainesville, FL 32611, USA; (J.G.); (J.K.); (S.P.); (D.S.); (N.K.); (M.A.); (J.M.) Correspondence:

Guo J, Pradhan S, Shahi D, Khan J, Mcbreen J, Bai G, Murphy JP, Babar MA. Increased Prediction Accuracy Using Combined Genomic Information and Physiological Traits in A Soft Wheat Panel Evaluated in Multi-Environments. Sci Rep 2020;10:7023. [PMID: 32341406 PMCID: PMC7184575 DOI: 10.1038/s41598-020-63919-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/17/2019] [Accepted: 03/11/2020] [Indexed: 12/28/2022]

Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat Commun 2019;10:4872. [PMID: 31653862 PMCID: PMC6814777 DOI: 10.1038/s41467-019-12884-1] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 12/27/2018] [Accepted: 09/25/2019] [Indexed: 12/11/2022]

Obala J, Saxena RK, Singh VK, Kumar CVS, Saxena KB, Tongoona P, Sibiya J, Varshney RK. Development of sequence-based markers for seed protein content in pigeonpea. Mol Genet Genomics 2018;294:57-68. [PMID: 30173295 DOI: 10.1007/s00438-018-1484-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/01/2018] [Accepted: 08/22/2018] [Indexed: 12/30/2022]

Vo NS, Phan V. Leveraging known genomic variants to improve detection of variants, especially close-by Indels. Bioinformatics 2018;34:2918-2926. [PMID: 29590294 DOI: 10.1093/bioinformatics/bty183] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/15/2017] [Accepted: 03/23/2018] [Indexed: 12/30/2022]

Abstract

Motivation

The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately. Even so, Indels, especially those close to each other, are still hard to detect accurately.

Results

We introduce a novel approach that leverages known variant information, e.g. provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve sensitivity of detecting variants, especially close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm, which can take into account known variant information, is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low coverage data. We showed that compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. In particular, our method was able to call these close-by Indels at a 15-20% higher sensitivity than other methods at low coverage, and still get 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and ExAC database. Our finding suggests that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at a low cost.

Availability and implementation

Implementation can be found in our public code repository https://github.com/namsyvo/IVC.

Supplementary information

Supplementary data are available at Bioinformatics online.

Nielsen ES, Henriques R, Toonen RJ, Knapp ISS, Guo B, von der Heyden S. Complex signatures of genomic variation of two non-model marine species in a homogeneous environment. BMC Genomics 2018;19:347. [PMID: 29743012 PMCID: PMC5944137 DOI: 10.1186/s12864-018-4721-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/21/2017] [Accepted: 04/23/2018] [Indexed: 12/26/2022]

Abstract

BACKGROUND

Genomic tools are increasingly being used on non-model organisms to provide insights into population structure and variability, including signals of selection. However, most studies are carried out in regions with distinct environmental gradients or across large geographical areas, in which local adaptation is expected to occur. Therefore, the focus of this study is to characterize genomic variation and selective signals over short geographic areas within a largely homogeneous region. To assess adaptive signals between microhabitats within the rocky shore, we compared genomic variation between the Cape urchin (Parechinus angulosus), which is a low to mid-shore species, and the Granular limpet (Scutellastra granularis), a high shore specialist.

RESULTS

Using pooled restriction site associated DNA (RAD) sequencing, we described patterns of genomic variation and identified outlier loci in both species. We found relatively low numbers of outlier SNPs within each species, and identified outlier genes associated with different selective pressures than those previously identified in studies conducted over larger environmental gradients. The number of population-specific outlier loci differed between species, likely owing to differential selective pressures within the intertidal environment. Interestingly, the outlier loci were highly differentiated within the two northernmost populations for both species, suggesting that unique evolutionary forces are acting on marine invertebrates within this region.

CONCLUSIONS

Our study provides a background for comparative genomic studies focused on non-model species, as well as a baseline for the adaptive potential of marine invertebrates along the South African west coast. We also discuss the caveats associated with Pool-seq and potential biases of sequencing coverage on downstream genomic metrics. The findings provide evidence of species-specific selective pressures within a homogeneous environment, and suggest that selective forces acting on small scales are just as crucial to acknowledge as those acting on larger scales. As a whole, our findings imply that future population genomic studies should expand from focusing on model organisms and/or studying heterogeneous regions to better understand the evolutionary processes shaping current and future biodiversity patterns, particularly when used in a comparative phylogeographic context.

Kouprina N, Liskovykh M, Lee NCO, Noskov VN, Waterfall JJ, Walker RL, Meltzer PS, Topol EJ, Larionov V. Analysis of the 9p21.3 sequence associated with coronary artery disease reveals a tendency for duplication in a CAD patient. Oncotarget 2018;9:15275-15291. [PMID: 29632643 PMCID: PMC5880603 DOI: 10.18632/oncotarget.24567] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/29/2017] [Accepted: 02/10/2018] [Indexed: 11/25/2022]

Lu J, Liu Y, Xu J, Mei Z, Shi Y, Liu P, He J, Wang X, Meng Y, Feng S, Shen C, Wang H. High-Density Genetic Map Construction and Stem Total Polysaccharide Content-Related QTL Exploration for Chinese Endemic Dendrobium (Orchidaceae). Front Plant Sci 2018;9:398. [PMID: 29636767 PMCID: PMC5880926 DOI: 10.3389/fpls.2018.00398] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/30/2017] [Accepted: 03/12/2018] [Indexed: 05/19/2023]

Affiliation(s)

Jiangjie Lu College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China *Correspondence: Jiangjie Lu
Yuyang Liu College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China
Jing Xu Center of Rare Plant Medicine Research of Zhejiang Province, Wuyi, China Zhejiang ShouXianGu Pharmaceutical Co. Ltd., Wuyi, China
Ziwei Mei College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou, China
Yujun Shi School of Foreign Languages, Zhejiang Gongshang University, Hangzhou, China
Pengli Liu College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China
Jianbo He Soybean Research Institute, Nanjing Agricultural University, Nanjing, China
Xiaotong Wang Center of Rare Plant Medicine Research of Zhejiang Province, Wuyi, China Zhejiang ShouXianGu Pharmaceutical Co. Ltd., Wuyi, China
Yijun Meng College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China
Shangguo Feng College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China
Chenjia Shen College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China
Huizhong Wang College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou Normal University, Hangzhou, China Huizhong Wang

Gupta P, Reddaiah B, Salava H, Upadhyaya P, Tyagi K, Sarma S, Datta S, Malhotra B, Thomas S, Sunkum A, Devulapalli S, Till BJ, Sreelakshmi Y, Sharma R. Next-generation sequencing (NGS)-based identification of induced mutations in a doubly mutagenized tomato (Solanum lycopersicum) population. Plant J 2017;92:495-508. [PMID: 28779536 DOI: 10.1111/tpj.13654] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/23/2017] [Revised: 07/25/2017] [Accepted: 07/26/2017] [Indexed: 05/21/2023]

Xiao S, Wang P, Dong L, Zhang Y, Han Z, Wang Q, Wang Z. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea. PeerJ 2016;4:e2664. [PMID: 28028455 PMCID: PMC5180582 DOI: 10.7717/peerj.2664] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/09/2016] [Accepted: 10/07/2016] [Indexed: 12/30/2022]

Hoffberg SL, Kieran TJ, Catchen JM, Devault A, Faircloth BC, Mauricio R, Glenn TC. RADcap: sequence capture of dual-digest RADseq libraries with identifiable duplicates and reduced missing data. Mol Ecol Resour 2016;16:1264-78. [DOI: 10.1111/1755-0998.12566] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Received: 04/19/2016] [Revised: 07/06/2016] [Accepted: 07/11/2016] [Indexed: 12/21/2022]

Greenwood JM, Ezquerra AL, Behrens S, Branca A, Mallet L. Current analysis of host–parasite interactions with a focus on next generation sequencing data. Zoology 2016;119:298-306. [DOI: 10.1016/j.zool.2016.06.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/07/2015] [Revised: 06/22/2016] [Accepted: 06/22/2016] [Indexed: 01/21/2023]

Shin S, Park J. Characterization of sequence-specific errors in various next-generation sequencing systems. Mol Biosyst 2016;12:914-22. [DOI: 10.1039/c5mb00750j] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 12/12/2022]

Kim W. Transmission Disequilibrium Tests Based on Read Counts for Low-Coverage Next-Generation Sequence Data. Hum Hered 2015;80:36-49. [PMID: 26278553 DOI: 10.1159/000434645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/06/2015] [Accepted: 05/30/2015] [Indexed: 11/19/2022]

Abstract

The purpose of this paper is the introduction of new statistical methods for case-parent trio association studies based on the read counts that can be obtained from next-generation sequencing (NGS) experiments. This work focuses on the inclusion of low-coverage data into the case-parent trio design without genotype classification or imputation. Two different approaches are considered: (1) a likelihood-based approach implementing a 15-component parametric mixture model and (2) a model-free approach that applies non-parametric statistical methods to the ratios of the read counts to coverage. Simulation studies are conducted to evaluate the performances of the proposed tests. In addition, the non-centrality parameters of the mixture likelihood-based tests are derived to determine sample sizes and coverage for an NGS experimental design. As an example, the sample sizes to maintain specified powers of a published adolescent idiopathic scoliosis (AIS) study are presented. The simulation results show that the tests using the genotypes classified by the maximum Bayesian posterior probability have significantly inflated type I error rates for low-coverage data. The tests using the posterior probabilities instead of the classified genotypes show lower power than the proposed tests. Generally, power for the likelihood-based approach is higher than that for the non-parametric ratio-based approach. For the AIS example, approximately 654 trios with 4× coverage are necessary to maintain 90% power when detecting an association of odds ratio 2 at a locus with a minor allele frequency of 0.35 at the level of significance α = 5 × 10⁻⁸. By comparison, approximately 416 trios with 25× coverage are required to maintain the same power with the same settings. The R and C source codes to calculate the proposed test statistics, the sample sizes and power can be obtained by contacting the author (wkim@cau.ac.kr).

Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, Xiong M, Wu CO, Fan R. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 2015;39:259-75. [PMID: 25809955 PMCID: PMC4443751 DOI: 10.1002/gepi.21895] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 08/01/2014] [Revised: 01/28/2015] [Accepted: 01/28/2015] [Indexed: 10/23/2022]

Abstract

In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case.

Sampson J, Jacobs K, Yeager M, Chanock S, Chatterjee N. Efficient study design for next generation sequencing. Genet Epidemiol 2011;35:269-77. [PMID: 21370254 DOI: 10.1002/gepi.20575] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 09/27/2010] [Revised: 12/24/2010] [Accepted: 01/12/2011] [Indexed: 01/23/2023]

Bansal V, Libiger O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinformatics 2015;16:4. [PMID: 25592880 PMCID: PMC4301802 DOI: 10.1186/s12859-014-0418-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/08/2014] [Accepted: 12/10/2014] [Indexed: 01/18/2023]

Abstract

Background

Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads.

Results

We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes Project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling.

Conclusions

Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0418-7) contains supplementary material, which is available to authorized users.
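The maximum likelihood estimation described in the Results above can be illustrated with a small sketch for the two-population case. Several things here are assumptions rather than details taken from the paper: the likelihood assumes Hardy-Weinberg genotype proportions at the admixed allele frequency, a plain grid search stands in for the BFGS optimizer, and the names `log_likelihood` and `estimate_admixture` are hypothetical and unrelated to the iAdmix software.

```python
import math

def log_likelihood(q, freqs_a, freqs_b, genotype_likelihoods):
    """Log-likelihood of admixture proportion q (share of population A).

    freqs_a, freqs_b: alternate-allele frequencies per site in populations A and B
    genotype_likelihoods: per site, (P(reads|g=0), P(reads|g=1), P(reads|g=2))
    """
    total = 0.0
    for pa, pb, gl in zip(freqs_a, freqs_b, genotype_likelihoods):
        f = q * pa + (1.0 - q) * pb          # expected allele frequency
        # Assumed Hardy-Weinberg genotype probabilities at frequency f.
        prior = ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)
        # Sum over genotypes so that genotype uncertainty is carried through.
        site = sum(l * p for l, p in zip(gl, prior))
        total += math.log(max(site, 1e-300))  # avoid log(0)
    return total

def estimate_admixture(freqs_a, freqs_b, genotype_likelihoods, steps=1000):
    """Grid-search stand-in for a numerical optimizer over q in [0, 1]."""
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        ll = log_likelihood(q, freqs_a, freqs_b, genotype_likelihoods)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Toy data: three sites, each confidently heterozygous.
freqs_a = [0.9, 0.8, 0.7]
freqs_b = [0.1, 0.2, 0.3]
gls = [(0.0, 1.0, 0.0)] * 3
print(estimate_admixture(freqs_a, freqs_b, gls))
```

In this toy input every site is heterozygous and the two populations have mirrored frequencies, so the estimate should land at an even split. With more than two reference populations the search space is a simplex, which is where a gradient-based optimizer such as BFGS becomes the practical choice.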

Fan R, Wang Y, Mills JL, Carter TC, Lobach I, Wilson AF, Bailey-Wilson JE, Weeks DE, Xiong M. Generalized functional linear models for gene-based case-control association studies. Genet Epidemiol 2014;38:622-637. [PMID: 25203683 PMCID: PMC4189986 DOI: 10.1002/gepi.21840] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Revised: 04/29/2014] [Accepted: 05/28/2014] [Indexed: 01/23/2023]

Saad M, Wijsman EM. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet Epidemiol 2014;38:579-90. [PMID: 25132070 PMCID: PMC4190076 DOI: 10.1002/gepi.21844] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2014] [Revised: 05/24/2014] [Accepted: 06/27/2014] [Indexed: 12/27/2022]

Malde K. Estimating the information value of polymorphic sites using pooled sequences. BMC Genomics 2014;15 Suppl 6:S20. [PMID: 25571927 PMCID: PMC4239578 DOI: 10.1186/1471-2164-15-s6-s20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

Sun L, Liu S, Wang R, Jiang Y, Zhang Y, Zhang J, Bao L, Kaltenboeck L, Dunham R, Waldbieser G, Liu Z. Identification and analysis of genome-wide SNPs provide insight into signatures of selection and domestication in channel catfish (Ictalurus punctatus). PLoS One 2014;9:e109666. [PMID: 25313648 PMCID: PMC4196944 DOI: 10.1371/journal.pone.0109666] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2014] [Accepted: 09/02/2014] [Indexed: 12/28/2022] Open

Affiliation(s)

Luyang Sun The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Shikai Liu The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Ruijia Wang The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Yanliang Jiang The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Yu Zhang The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Jiaren Zhang The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Lisui Bao The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Ludmilla Kaltenboeck The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Rex Dunham The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America
Geoff Waldbieser USDA-ARS Warmwater Aquaculture Research Unit, Stoneville, Mississippi, United States of America
Zhanjiang Liu The Fish Molecular Genetics and Biotechnology Laboratory, Aquatic Genomics Unit, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, Auburn, Alabama, United States of America

Bianco L, Cestaro A, Sargent DJ, Banchi E, Derdak S, Di Guardo M, Salvi S, Jansen J, Viola R, Gut I, Laurens F, Chagné D, Velasco R, van de Weg E, Troggio M. Development and validation of a 20K single nucleotide polymorphism (SNP) whole genome genotyping array for apple (Malus × domestica Borkh). PLoS One 2014;9:e110377. [PMID: 25303088 PMCID: PMC4193858 DOI: 10.1371/journal.pone.0110377] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2014] [Accepted: 09/12/2014] [Indexed: 01/08/2023] Open

Abstract

High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs.

Salari R, Saleh SS, Kashef-Haghighi D, Khavari D, Newburger DE, West RB, Sidow A, Batzoglou S. Inference of tumor phylogenies with improved somatic mutation discovery. J Comput Biol 2014;20:933-44. [PMID: 24195709 DOI: 10.1089/cmb.2013.0106] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open

Liu D, Ma C, Hong W, Huang L, Liu M, Liu H, Zeng H, Deng D, Xin H, Song J, Xu C, Sun X, Hou X, Wang X, Zheng H. Construction and analysis of high-density linkage map using high-throughput sequencing data. PLoS One 2014;9:e98855. [PMID: 24905985 PMCID: PMC4048240 DOI: 10.1371/journal.pone.0098855] [Citation(s) in RCA: 180] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2014] [Accepted: 05/08/2014] [Indexed: 12/31/2022] Open

Fan R, Wang Y, Mills JL, Wilson AF, Bailey-Wilson JE, Xiong M. Functional linear models for association analysis of quantitative traits. Genet Epidemiol 2014;37:726-42. [PMID: 24130119 DOI: 10.1002/gepi.21757] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2013] [Revised: 07/15/2013] [Accepted: 08/14/2013] [Indexed: 12/19/2022]

Abstract

Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F-distributed tests of the proposed fixed effect functional linear models have higher power than that of sequence kernel association test (SKAT) and its optimal unified test (SKAT-O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to its optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT-O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.

Hou Y, Fan W, Yan L, Li R, Lian Y, Huang J, Li J, Xu L, Tang F, Xie XS, Qiao J. Genome analyses of single human oocytes. Cell 2014;155:1492-506. [PMID: 24360273 DOI: 10.1016/j.cell.2013.11.040] [Citation(s) in RCA: 224] [Impact Index Per Article: 22.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2013] [Revised: 10/31/2013] [Accepted: 11/25/2013] [Indexed: 11/16/2022]

Affiliation(s)

Yu Hou Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Wei Fan Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Science, Beijing 100084, China
Liying Yan Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Rong Li Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Ying Lian Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Jin Huang Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Jinsen Li Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Liya Xu Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China
Fuchou Tang Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China; Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China.
X Sunney Xie Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA.
Jie Qiao Biodynamic Optical Imaging Center, College of Life Sciences and Center for Reproductive Medicine, Third Hospital, Peking University, Beijing 100871, China; Key Laboratory of Assisted Reproduction, Ministry of Education and Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing 100191, China.

Challenges in the Next-Generation Sequencing Field. NEXT GENERATION SEQUENCING TECHNOLOGIES AND CHALLENGES IN SEQUENCE ASSEMBLY 2014. [DOI: 10.1007/978-1-4939-0715-1_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Durtschi J, Margraf RL, Coonrod EM, Mallempati KC, Voelkerding KV. VarBin, a novel method for classifying true and false positive variants in NGS data. BMC Bioinformatics 2013;14 Suppl 13:S2. [PMID: 24266885 PMCID: PMC3849648 DOI: 10.1186/1471-2105-14-s13-s2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open

Abstract

Background

Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction.

Methods

VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident from variant quality scores and biases, and visible in the IGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4).

Results

To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4.

Conclusions

These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2.
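The PLRD score described in the Methods above can be sketched as follows. This is an illustration under assumptions, not the VarBin implementation: it assumes the score is the Phred-scaled variant-to-wild-type likelihood ratio divided by read depth, computed either from raw genotype likelihoods or from Phred-scaled likelihood values of the kind reported in a VCF `PL` field. The function names are hypothetical, and the bin boundaries are not given in the abstract, so no binning is attempted.

```python
import math

def plrd_from_likelihoods(lik_variant, lik_wild_type, depth):
    """Phred-scaled variant-to-wild-type likelihood ratio, divided by depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return 10.0 * math.log10(lik_variant / lik_wild_type) / depth

def plrd_from_pl(pl_wild_type, pl_variant, depth):
    """Same score from Phred-scaled likelihoods (PL = -10 * log10 likelihood).

    Because PL values are negative log likelihoods, the log ratio of variant
    to wild type is the wild-type PL minus the variant PL.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return (pl_wild_type - pl_variant) / depth

# A confident wild-type call at depth 10 and a confident variant call at depth 20.
print(plrd_from_pl(pl_wild_type=0, pl_variant=30, depth=10))   # -3.0
print(plrd_from_pl(pl_wild_type=300, pl_variant=0, depth=20))  # 15.0
```

The two example values were chosen to fall in the ranges the abstract reports for clean sites: near -3 for wild type calls and above 10 for variant calls.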

Zeng F, Jiang R, Chen T. PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data. Nucleic Acids Res 2013;41:e136. [PMID: 23700313 PMCID: PMC3711422 DOI: 10.1093/nar/gkt372] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

Kang J, Huang KC, Xu Z, Wang Y, Abecasis GR, Li Y. AbCD: arbitrary coverage design for sequencing-based genetic studies. Bioinformatics 2013;29:799-801. [PMID: 23357921 DOI: 10.1093/bioinformatics/btt041] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

Wilson MR, Allard MW, Brown EW. The forensic analysis of foodborne bacterial pathogens in the age of whole-genome sequencing. Cladistics 2013;29:449-461. [DOI: 10.1111/cla.12012] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/26/2012] [Indexed: 01/07/2023] Open

Feder AF, Petrov DA, Bergland AO. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS One 2012;7:e48588. [PMID: 23152785 PMCID: PMC3494690 DOI: 10.1371/journal.pone.0048588] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2012] [Accepted: 10/03/2012] [Indexed: 12/14/2022] Open

Faita F, Vecoli C, Foffa I, Andreassi MG. Next generation sequencing in cardiovascular diseases. World J Cardiol 2012;4:288-95. [PMID: 23110245 PMCID: PMC3482622 DOI: 10.4330/wjc.v4.i10.288] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/13/2012] [Revised: 09/08/2012] [Accepted: 09/15/2012] [Indexed: 02/06/2023] Open

Zhou B. An empirical Bayes mixture model for SNP detection in pooled sequencing data. Bioinformatics 2012;28:2569-75. [PMID: 22914221 DOI: 10.1093/bioinformatics/bts501] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Hasmats J, Green H, Solnestam BW, Zajac P, Huss M, Orear C, Validire P, Bjursell M, Lundeberg J. Validation of whole genome amplification for analysis of the p53 tumor suppressor gene in limited amounts of tumor samples. Biochem Biophys Res Commun 2012;425:379-83. [DOI: 10.1016/j.bbrc.2012.07.101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2012] [Accepted: 07/19/2012] [Indexed: 10/28/2022]

Zhu Y, Bergland AO, González J, Petrov DA. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS One 2012;7:e41901. [PMID: 22848651 PMCID: PMC3406057 DOI: 10.1371/journal.pone.0041901] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2012] [Accepted: 06/28/2012] [Indexed: 11/26/2022] Open

Flannick J, Korn JM, Fontanillas P, Grant GB, Banks E, Depristo MA, Altshuler D. Efficiency and power as a function of sequence coverage, SNP array density, and imputation. PLoS Comput Biol 2012;8:e1002604. [PMID: 22807667 PMCID: PMC3395607 DOI: 10.1371/journal.pcbi.1002604] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2012] [Accepted: 05/24/2012] [Indexed: 01/19/2023] Open

Abstract

High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms, when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome-wide association studies and supports development of improved methods for genotype calling.

In this work we address a series of questions prompted by the rise of next-generation sequencing as a data collection strategy for genetic studies. How does low coverage sequencing compare to traditional microarray-based genotyping? Do studies increase sensitivity by collecting both sequencing and array data? What can we learn about technology error modes based on analysis of SNPs for which sequence and array data disagree? To answer these questions, we developed a statistical framework to estimate genotypes from sequence reads, array intensities, and imputation. Through experiments with intensity and read data from the HapMap and 1000 Genomes (1000 G) Projects, we show that 1 M SNP arrays used for genome-wide association studies perform similarly to 1× sequencing. We find that adding low coverage sequence reads to dense array data significantly increases rare variant sensitivity, but adding dense array data to low coverage sequencing has only a small impact. Finally, we describe an improved SNP calling algorithm used in the 1000 G project, inspired by a novel next-generation sequencing error mode identified through analysis of disputed SNPs. These results inform the use of next-generation sequencing in genetic studies and model an approach to further improve genotype calling methods.

Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data. STATISTICS IN BIOSCIENCES 2012;5:3-25. [PMID: 24489615 DOI: 10.1007/s12561-012-9067-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Zhou B, Whittemore AS. Improving sequence-based genotype calls with linkage disequilibrium and pedigree information. Ann Appl Stat 2012. [DOI: 10.1214/11-aoas527] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

Li M, Stoneking M. A new approach for detecting low-level mutations in next-generation sequence data. Genome Biol 2012;13:R34. [PMID: 22621726 PMCID: PMC3446287 DOI: 10.1186/gb-2012-13-5-r34] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2011] [Revised: 05/14/2012] [Accepted: 05/23/2012] [Indexed: 01/01/2023] Open

Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nat Genet 2012;44:631-5. [PMID: 22610117 DOI: 10.1038/ng.2283] [Citation(s) in RCA: 177] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2011] [Accepted: 04/16/2012] [Indexed: 12/14/2022]

Gerstung M, Beisel C, Rechsteiner M, Wild P, Schraml P, Moch H, Beerenwinkel N. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat Commun 2012;3:811. [PMID: 22549840 DOI: 10.1038/ncomms1814] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2011] [Accepted: 03/30/2012] [Indexed: 01/06/2023] Open

Determination of RET Sequence Variation in an MEN2 Unaffected Cohort Using Multiple-Sample Pooling and Next-Generation Sequencing. J Thyroid Res 2012;2012:318232. [PMID: 22545224 PMCID: PMC3321559 DOI: 10.1155/2012/318232] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/07/2011] [Accepted: 01/23/2012] [Indexed: 11/30/2022] Open

Crawford JE, Lazzaro BP. Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data. Front Genet 2012;3:66. [PMID: 22536207 PMCID: PMC3334522 DOI: 10.3389/fgene.2012.00066] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2012] [Accepted: 04/05/2012] [Indexed: 01/17/2023] Open

Abstract

Next-generation sequencing (NGS) technologies have made it possible to address population genetic questions in almost any system, but high error rates associated with such data can introduce significant biases into downstream analyses, necessitating careful experimental design and interpretation in studies based on short-read sequencing. Exploration of population genetic analyses based on NGS has revealed some of the potential biases, but previous work has emphasized parameters relevant to human population genetics and further examination of parameters relevant to other systems is necessary, including situations where sample sizes are small and genetic variation is high. To assess experimental power to address several principal objectives of population genetic studies under these conditions, we simulated population samples under selective sweep, population growth, and population subdivision models and tested the power to accurately infer population genetic parameters from sequence polymorphism data obtained through simulated 4×, 8×, and 15× read depth sequence data. We found that estimates of population genetic differentiation and population growth parameters were systematically biased when inference was based on 4× sequencing, but biases were markedly reduced at even 8× read depth. We also found that the power to identify footprints of positive selection depends on an interaction between read depth and the strength of selection, with strong selection being recovered consistently at all read depths, but weak selection requiring deeper read depths for reliable detection. Although we have explored only a small subset of the many possible experimental designs and population genetic models, using only one SNP-calling approach, our results reveal some general patterns and provide some assessment of what biases could be expected under similar experimental structures.

Rossetti S, Hopp K, Sikkink RA, Sundsbak JL, Lee YK, Kubly V, Eckloff BW, Ward CJ, Winearls CG, Torres VE, Harris PC. Identification of gene mutations in autosomal dominant polycystic kidney disease through targeted resequencing. J Am Soc Nephrol 2012;23:915-33. [PMID: 22383692 DOI: 10.1681/asn.2011101032] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open