1
Obry L, Dalmasso C. Weighted multiple testing procedures in genome-wide association studies. PeerJ 2023; 11:e15369. [PMID: 37337586 PMCID: PMC10276986 DOI: 10.7717/peerj.15369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 04/17/2023] [Indexed: 06/21/2023]
Abstract
Multiple testing procedures controlling the false discovery rate (FDR) are increasingly used in genome-wide association studies (GWAS), and weighted multiple testing procedures that incorporate covariate information are effective at improving the power to detect associations. In this work, we evaluate several recent weighted multiple testing procedures in the specific context of GWAS through a simulation study. We also present a new efficient procedure, called wBHa, that prioritizes the detection of genetic variants with low minor allele frequencies while maximizing overall detection power. The results indicate that our procedure performs well compared with other weighted multiple testing procedures. In particular, in all simulated settings, wBHa tends to outperform the other procedures in detecting rare variants while maintaining good overall power. The use of the different procedures is illustrated with a real dataset.
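The abstract does not spell out how wBHa chooses its weights, so the following is only a sketch of the generic p-value-weighted Benjamini-Hochberg step-up rule that weighted procedures of this kind build on; it is not wBHa itself. It assumes non-negative weights rescaled to average 1 (the usual condition for FDR control with independent p-values). The function name `weighted_bh` and the example weights are hypothetical; in a GWAS the weights might be derived from minor allele frequency, but that mapping is not reproduced here.

```python
import numpy as np


def weighted_bh(pvalues, weights, alpha=0.05):
    """Generic weighted Benjamini-Hochberg step-up rule (hypothetical helper).

    Each p-value is divided by its weight, then the ordinary BH step-up rule
    is applied to the weighted p-values. Weights are rescaled to average 1.
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape or p.ndim != 1:
        raise ValueError("pvalues and weights must be 1-D arrays of equal length")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    m = p.size
    w = w * (m / w.sum())  # rescale so that mean(w) == 1

    # Weighted p-values; a zero weight means the hypothesis is never rejected.
    q = np.full(m, np.inf)
    pos = w > 0
    q[pos] = p[pos] / w[pos]

    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(q[order] <= thresholds)[0]

    reject = np.zeros(m, dtype=bool)
    if passed.size:
        k = passed[-1]                 # largest rank meeting its BH threshold
        reject[order[: k + 1]] = True  # step-up: reject all smaller weighted p-values
    return reject


p = [0.04, 0.5, 0.6, 0.7]
print(weighted_bh(p, [1, 1, 1, 1]))  # equal weights (plain BH): [False False False False]
print(weighted_bh(p, [4, 0, 0, 0]))  # all weight on the first test: [ True False False False]
```

With equal weights the rule reduces to ordinary BH; shifting weight toward a hypothesis relaxes its effective threshold at the expense of the others, which is the lever a covariate-informed procedure pulls.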
Affiliation(s)
- Ludivine Obry
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
- Cyril Dalmasso
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
2
Luo X, Cai G, Mclain AC, Amos CI, Cai B, Xiao F. BMI-CNV: a Bayesian framework for multiple genotyping platforms detection of copy number variants. Genetics 2022; 222:iyac147. [PMID: 36171678 PMCID: PMC9713397 DOI: 10.1093/genetics/iyac147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/08/2022] [Indexed: 12/13/2022]
Abstract
Whole-exome sequencing (WES) enables the detection of copy number variants (CNVs) with high resolution in protein-coding regions. However, variants in intergenic or intragenic regions are excluded from such studies. Fortunately, many of these samples have previously been genotyped on other platforms, such as SNP arrays, which are sparse but cover a wide range of genomic regions. Moreover, conventional single-sample methods suffer from a high false discovery rate because of prominent data noise. Methods that integrate multiple genotyping platforms and multiple samples are therefore in high demand for improved CNV detection. We developed BMI-CNV, a Bayesian Multisample and Integrative CNV profiling method for data sequenced by both whole-exome sequencing and microarray. For the multisample integration, we identify CNV regions shared across samples using a Bayesian probit stick-breaking process model coupled with Gaussian mixture model estimation. In extensive simulations, BMI-CNV outperformed existing methods with improved accuracy. In matched data from the 1000 Genomes Project and the HapMap project, BMI-CNV also accurately detected common variants and significantly enlarged the detection spectrum of whole-exome sequencing. Further application to data from The Research of International Cancer of Lung consortium (TRICL) identified lung cancer risk variant candidates in the 17q11.2, 1p36.12, 8q23.1, and 5q22.2 regions.
Affiliation(s)
- Xizhi Luo
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Guoshuai Cai
- Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Alexander C Mclain
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Christopher I Amos
- Department of Quantitative Sciences, Baylor College of Medicine, Houston, TX 77030, USA
- Bo Cai
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Feifei Xiao
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
3
Fisch ATM, Eckley IA, Fearnhead P. A linear time method for the detection of collective and point anomalies. Stat Anal Data Min 2022. [DOI: 10.1002/sam.11586] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/07/2022]
Affiliation(s)
- Idris A. Eckley
- Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Paul Fearnhead
- Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
4
Hahn G. Online multivariate changepoint detection with type I error control and constant time/memory updates per series. Stat Probab Lett 2022. [DOI: 10.1016/j.spl.2021.109258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
5
Affiliation(s)
- Vladimir Vovk
- Vladimir Vovk is Professor of Computer Science, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, United Kingdom
6
Eckley I, Kirch C, Weber S. A novel change-point approach for the detection of gas emission sources using remotely contained concentration data. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
7
Affiliation(s)
- Yanhong Wu
- Department of Mathematics, California State University Stanislaus, Turlock, California, USA
8
Multiple testing with the structure-adaptive Benjamini–Hochberg algorithm. J R Stat Soc Series B Stat Methodol 2018. [DOI: 10.1111/rssb.12298] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 01/23/2023]
9
Mardia KV, Sriram K, Deane CM. A statistical model for helices with applications. Biometrics 2018; 74:845-854. [PMID: 29569225 DOI: 10.1111/biom.12870] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/01/2018] [Revised: 01/01/2018] [Accepted: 01/01/2018] [Indexed: 11/28/2022]
Abstract
Motivated by a cutting-edge problem related to the shape of α-helices in proteins, we formulate a parametric statistical model that incorporates the cylindrical nature of the helix. Our focus is to detect a "kink," which is a drastic change in the axial direction of the helix. We propose a statistical model for the straight α-helix and derive the maximum likelihood estimation procedure. The cylinder is an accepted geometric model for α-helices, but our statistical formulation, for the first time, quantifies the uncertainty in atom positions around the cylinder. We propose a change-point technique, "Kink-Detector," to detect a kink location along the helix. Unlike classical change-point problems, the change in direction of a helix depends on a simultaneous shift of multiple data points rather than a single data point, and is therefore less straightforward to detect. Our biological building block is crowdsourced data on straight and kinked helices, which has set a gold standard. We use these data to identify salient features with which to construct Kink-Detector, test its performance, and gain some insights. We find the performance of Kink-Detector comparable to that of its computational competitor, "Kink-Finder." We highlight that identification of kinks by visual assessment can have limitations, and Kink-Detector may help in such cases. Further, an analysis of crowdsourced curved α-helices finds that Kink-Detector is also effective in detecting moderate changes in axial direction.
Affiliation(s)
- Kanti V. Mardia
- Department of Statistics, University of Oxford, Oxford, UK
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, UK
- Karthik Sriram
- Quantitative Methods Area, Indian Institute of Management, Ahmedabad, Gujarat, India
11
Cao Y, Xie L, Xie Y, Xu H. Sequential Change-Point Detection via Online Convex Optimization. Entropy 2018; 20:e20020108. [PMID: 33265199 PMCID: PMC7512601 DOI: 10.3390/e20020108] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/01/2017] [Revised: 12/02/2017] [Accepted: 02/05/2018] [Indexed: 11/16/2022]
Abstract
Sequential change-point detection when the distribution parameters are unknown is a fundamental problem in statistics and machine learning. When the post-change parameters are unknown, we consider a set of detection procedures based on sequential likelihood ratios with non-anticipating estimators constructed using online convex optimization algorithms such as online mirror descent, which provides a more versatile approach to tackling complex situations where recursive maximum likelihood estimators cannot be found. When the underlying distributions belong to an exponential family and the estimators satisfy the logarithmic regret property, we show that this approach is nearly second-order asymptotically optimal. This means that the upper bound for the false alarm rate of the algorithm (measured by the average run length) meets the lower bound asymptotically up to a log-log factor as the threshold tends to infinity. Our proof connects sequential change-point detection with online convex optimization and leverages the logarithmic regret bound of the online mirror descent algorithm. Numerical and real-data examples validate our theory.
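As a rough illustration of plugging a non-anticipating estimate into a sequential likelihood ratio, here is a minimal sketch for the simplest case: unit-variance Gaussian data with pre-change mean 0 and unknown post-change mean θ. The estimate used at time t depends only on earlier observations and is updated by projected online gradient descent (mirror descent with a Euclidean mirror map) on the loss (x − θ)²/2. With step size 1/(n+1) this update is just a running mean with the starting guess counted as one pseudo-observation, i.e. a case where a recursive MLE exists anyway; the abstract's point is that the same recipe extends to settings where it does not. The restart-at-zero CUSUM form, the starting guess `theta0`, the bound `theta_max` and the name `adaptive_cusum` are illustrative assumptions, not the authors' exact procedure.

```python
from typing import Optional, Sequence


def adaptive_cusum(x: Sequence[float], threshold: float,
                   theta0: float = 1.0, theta_max: float = 5.0) -> Optional[int]:
    """Return the index of the first alarm, or None if no alarm is raised.

    Pre-change model: N(0, 1). Post-change model: N(theta, 1), theta unknown.
    """
    stat = 0.0      # CUSUM-type detection statistic
    theta = theta0  # non-anticipating estimate of the post-change mean
    n = 0           # observations absorbed into theta since the last restart
    for t, xt in enumerate(x):
        # One-step log-likelihood ratio log[N(xt; theta, 1) / N(xt; 0, 1)],
        # evaluated with the estimate formed *before* xt was observed.
        stat = max(0.0, stat + theta * xt - 0.5 * theta * theta)
        if stat == 0.0:
            theta, n = theta0, 0  # restart the estimator along with the statistic
            continue
        if stat >= threshold:
            return t
        # Projected online gradient step on the loss (xt - theta)^2 / 2,
        # with step size 1 / (n + 1), then projection onto [-theta_max, theta_max].
        n += 1
        theta += (xt - theta) / (n + 1)
        theta = min(max(theta, -theta_max), theta_max)
    return None


data = [0.0] * 50 + [2.0] * 10
print(adaptive_cusum(data, threshold=5.0))  # -> 52
```

In the noiseless example the mean jumps from 0 to 2 at index 50; the statistic grows by roughly 1.5 to 1.9 per step while the estimate moves from 1 toward 2, crossing the threshold 5 at index 52.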
13
Fan Z, Mackey L. Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences. Ann Appl Stat 2017. [DOI: 10.1214/17-aoas1075] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
14
Song C, Min X, Zhang H. The screening and ranking algorithm for change-points detection in multiple samples. Ann Appl Stat 2017; 10:2102-2129. [PMID: 28090239 DOI: 10.1214/16-aoas966] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
Chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may be associated with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNVs and diseases. CNVs can be called by detecting change-points in the mean of sequences of array-based intensity measurements. Although multiple samples are of interest, most available CNV calling methods are single-sample based. Only a few multiple-sample methods have been proposed, using scan statistics that are computationally intensive and designed for either common or rare change-point detection. In this paper, we propose a novel multiple-sample method that adaptively combines the scan statistic of the screening and ranking algorithm (SaRa); it is computationally efficient and able to detect both common and rare change-points. We prove that, asymptotically, this method finds the true change-points with probability approaching one, and show theoretically that multiple-sample methods are superior to single-sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies examining the performance of our proposed method. Finally, using our proposed method and two competing approaches, we attempt to detect CNVs in data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information, while its ability to detect CNVs is comparable or better.
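The abstract does not give the rule for combining scan statistics across samples, but the single-sample screening and ranking idea it builds on can be sketched. The sketch assumes a local diagnostic D(t) equal to the mean of the h observations starting at t minus the mean of the h observations before t; screening keeps positions where |D| is maximal within a window of half-width h and exceeds a threshold, and ranking orders the survivors by |D|. The bandwidth `h`, threshold `lam` and both function names are hypothetical choices for illustration, not the paper's tuning or its multiple-sample statistic.

```python
import numpy as np


def local_diagnostic(x, h):
    """D(t) = mean(x[t:t+h]) - mean(x[t-h:t]) for t = h, ..., n - h."""
    x = np.asarray(x, dtype=float)
    n = x.size
    c = np.concatenate(([0.0], np.cumsum(x)))  # prefix sums: c[i] = x[0] + ... + x[i-1]
    t = np.arange(h, n - h + 1)
    right = (c[t + h] - c[t]) / h
    left = (c[t] - c[t - h]) / h
    return t, right - left


def screen_and_rank(x, h, lam):
    """Return candidate change-points (index of the first point of each new
    segment), ordered by decreasing |D|."""
    t, d = local_diagnostic(x, h)
    a = np.abs(d)
    candidates = []
    for i in range(t.size):
        lo, hi = max(0, i - h + 1), min(t.size, i + h)
        # Screening: keep h-local maximisers of |D| that clear the threshold.
        if a[i] >= lam and a[i] == a[lo:hi].max():
            candidates.append((a[i], int(t[i])))
    # Ranking: strongest local signals first.
    candidates.sort(key=lambda pair: -pair[0])
    return [pos for _, pos in candidates]


signal = [0.0] * 20 + [3.0] * 20 + [1.0] * 20
print(screen_and_rank(signal, h=5, lam=1.0))  # -> [20, 40]
```

Prefix sums make each diagnostic O(1), so the whole scan is linear in the sequence length, which is the source of the computational efficiency the abstract emphasises for SaRa-type methods.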
15
Zhang NR, Yakir B, Xia LC, Siegmund D. Scan statistics on Poisson random fields with applications in genomics. Ann Appl Stat 2016. [DOI: 10.1214/15-aoas892] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
16
Hu J, Zhang L, Wang HJ. Sequential model selection-based segmentation to detect DNA copy number variation. Biometrics 2016; 72:815-26. [PMID: 26954760 DOI: 10.1111/biom.12478] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/01/2014] [Revised: 08/01/2015] [Accepted: 09/01/2015] [Indexed: 12/16/2022]
Abstract
Array-based CGH experiments are designed to detect genomic aberrations, or regions of DNA copy-number variation, that are associated with an outcome, typically a disease state. Most existing statistical methods focus on detecting DNA copy-number variations in a single sample or array. We focus on the detection of group-effect variation through the simultaneous study of multiple samples from multiple groups. Rather than using direct segmentation or smoothing techniques, as is common in existing detection methods, we develop a sequential model selection procedure guided by a modified Bayesian information criterion. This approach improves detection accuracy by cumulatively using information across contiguous clones, and it has a computational advantage over existing popular detection methods. Our empirical investigation suggests that the proposed method outperforms existing detection methods, in particular in detecting small segments or separating neighboring segments with differing degrees of copy-number variation.
Affiliation(s)
- Jianhua Hu
- Department of Biostatistics, UT M. D. Anderson Cancer Center, Houston, Texas 77030, U.S.A.
- Liwen Zhang
- School of Economics, Shanghai University, Shanghai 200444, China.
- Huixia Judy Wang
- Department of Statistics, George Washington University, Washington, D.C. 20052, U.S.A.
17
Walter V, Wright FA, Nobel AB. Consistent testing for recurrent genomic aberrations. Biometrika 2015; 102:783-796. [PMID: 30799871 DOI: 10.1093/biomet/asv046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
We consider the detection and identification of recurrent departures from stationary behaviour in genomic or similarly arranged data containing measurements at an ordered set of variables. Our primary focus is on departures that occur only at a single variable, or within a small window of contiguous variables, but involve more than one sample. This encompasses the identification of aberrant markers in genome-wide measurements of DNA copy number and DNA methylation, as well as meta-analyses of genome-wide association studies. We propose and analyse a cyclic shift-based procedure for testing recurrent departures from stationarity. Our analysis establishes the consistency of cyclic shift p-values for datasets with a fixed set of samples as the number of observed variables tends to infinity, under the assumption that each sample is an independent realization of a stationary Markov chain. Our results apply to any test statistic satisfying a simple invariance condition.
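To make the cyclic shift idea concrete, here is a minimal sketch under assumptions of our own: the data form a samples × variables array, the test statistic is the maximum over variables of the per-variable sum across samples (one simple choice for illustration; the abstract says the theory covers any statistic meeting an invariance condition, which is not checked here), and the p-value uses the common (1 + exceedances)/(1 + B) Monte Carlo form. Rotating each sample by an independent random offset keeps its within-sample dependence intact while breaking any alignment of departures across samples. The function name and defaults are hypothetical.

```python
import numpy as np


def cyclic_shift_pvalue(data, n_shifts=1000, seed=None):
    """Monte Carlo p-value for the maximum column sum of a samples x variables
    array, using independent random cyclic shifts of each sample as the null."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples, n_vars = data.shape
    observed = data.sum(axis=0).max()

    exceed = 0
    for _ in range(n_shifts):
        offsets = rng.integers(0, n_vars, size=n_samples)
        # Rotate each row by its own offset: the row's internal order survives
        # up to rotation, but alignment across rows is destroyed.
        shifted = np.stack([np.roll(row, k) for row, k in zip(data, offsets)])
        if shifted.sum(axis=0).max() >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_shifts)


rng = np.random.default_rng(0)
noise = rng.normal(size=(10, 200))
noise[:, 50] += 3.0  # a departure shared by all 10 samples at variable 50
print(cyclic_shift_pvalue(noise, n_shifts=200, seed=1))
```

Because each row is only rotated, the null keeps the marginal distribution and autocorrelation of every sample, which is why stationarity of each sample (rather than independence across variables) is the relevant assumption.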
Affiliation(s)
- V Walter
- Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Milton S. Hershey Medical Center, 500 University Drive, P.O. Box 850, Hershey, Pennsylvania 17033 U.S.A.
- F A Wright
- Department of Statistics, North Carolina State University Bioinformatics Research Center, Campus Box 7566, 2601 Stinson Drive, Raleigh, North Carolina 27695 U.S.A.
- A B Nobel
- Department of Statistics and Operations Research, CB 3260, University of North Carolina, Chapel Hill, North Carolina, 27599 U.S.A.
18
Hua X, Goedert JJ, Pu A, Yu G, Shi J. Allergy associations with the adult fecal microbiota: Analysis of the American Gut Project. EBioMedicine 2015; 3:172-179. [PMID: 26870828 PMCID: PMC4739432 DOI: 10.1016/j.ebiom.2015.11.038] [Citation(s) in RCA: 134] [Impact Index Per Article: 14.9] [Received: 10/13/2015] [Revised: 11/18/2015] [Accepted: 11/23/2015] [Indexed: 12/24/2022]
Abstract
BACKGROUND Alteration of the gut microbial population (dysbiosis) may increase the risk for allergies and other conditions. This study sought to clarify the relationship of dysbiosis with allergies in adults. METHODS Publicly available American Gut Project questionnaire and fecal 16S rRNA sequence data were analyzed. Fecal microbiota richness (number of observed species) and composition (UniFrac) were used to compare adults with versus without allergy to foods (peanuts, tree nuts, shellfish, other) and non-foods (drug, bee sting, dander, asthma, seasonal, eczema). Logistic and Poisson regression models adjusted for potential confounders. Odds ratios and 95% confidence intervals (CI) were calculated for lowest vs highest richness tertile. Taxonomy associations considered 122 non-redundant taxa (of 2379 total taxa) with ≥ 0.1% mean abundance. RESULTS Self-reported allergy prevalence among the 1879 participants (mean age, 45.5 years; 46.9% male) was 81.5%, ranging from 2.5% for peanuts to 40.5% for seasonal. Fecal microbiota richness was markedly lower with total allergies (P = 10⁻⁹) and five particular allergies (P ≤ 10⁻⁴). Richness odds ratios were 1.7 (CI 1.3-2.2) with seasonal, 1.8 (CI 1.3-2.5) with drug, and 7.8 (CI 2.3-26.5) with peanut allergy. These allergic participants also had markedly altered microbial community composition (unweighted UniFrac, P = 10⁻⁴ to 10⁻⁷). Total food and non-food allergies were significantly associated with 7 and 9 altered taxa, respectively. The dysbiosis was most marked with nut and seasonal allergies, driven by higher Bacteroidales and reduced Clostridiales taxa. INTERPRETATION American adults with allergies, especially to nuts and seasonal pollen, have low diversity, reduced Clostridiales, and increased Bacteroidales in their gut microbiota. This dysbiosis might be targeted to improve treatment or prevention of allergy.
Affiliation(s)
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- James J Goedert
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Angela Pu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Guoqin Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
20
Zhao B, Glaz J. Scan Statistics for Detecting a Local Change in Variance for Normal Data with Known Variance. Methodol Comput Appl Probab 2015. [DOI: 10.1007/s11009-015-9465-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
21
Jeng J, Wu Q, Li H. A Statistical Method for Identifying Trait-Associated Copy Number Variants. Hum Hered 2015. [PMID: 26201700 DOI: 10.1159/000381585] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/19/2022]
Abstract
Copy number variants (CNVs), ranging in size from about one kilobase to several megabases, are DNA alterations of a genome that result in the cell having fewer or more than two copies of segments of the DNA. Such CNVs have been shown to be associated with many complex phenotypes, ranging from diseases to gene expression. Novel methods have been developed for identifying CNVs at both the individual and the population level. However, methods for testing CNV association are limited. Most available methods employ a two-step approach, in which the CNVs carried by the samples are first identified and then tested for association. However, the results of such tests depend on the threshold used for CNV identification and on the number of CNVs to be tested. We developed a method, CNVtest, to directly identify trait-associated CNVs without the need to identify sample-specific CNVs. We show that CNVtest asymptotically controls the type I error rate and identifies true trait-associated CNVs with high probability. We demonstrate the method using simulations and an application to identify CNVs associated with population differentiation.
Affiliation(s)
- Jessie Jeng
- Department of Statistics, North Carolina State University, Raleigh, N.C., USA
24
Identifying localized changes in large systems: Change-point detection for biomolecular simulations. Proc Natl Acad Sci U S A 2015; 112:7454-9. [PMID: 26025225 DOI: 10.1073/pnas.1415846112] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 01/02/2023]
Abstract
Research on change-point detection, the classical problem of detecting abrupt changes in sequential data, has focused predominantly on datasets with a single observable. A growing number of time series datasets, however, involve many observables, often with the property that a given change typically affects only a few of the observables. We introduce a general statistical method that, given many noisy observables, detects points in time at which various subsets of the observables exhibit simultaneous changes in data distribution and explicitly identifies those subsets. Our work is motivated by the problem of identifying the nature and timing of biologically interesting conformational changes that occur during atomic-level simulations of biomolecules such as proteins. This problem has proved challenging both because each such conformational change might involve only a small region of the molecule and because these changes are often subtle relative to the ever-present background of faster structural fluctuations. We show that our method is effective in detecting biologically interesting conformational changes in molecular dynamics simulations of both folded and unfolded proteins, even in cases where these changes are difficult to detect using alternative techniques. This method may also facilitate the detection of change points in other types of sequential data involving large numbers of observables, a problem likely to become increasingly important as such data continue to proliferate in a variety of application domains.
26
Shi J, Yang XR, Caporaso NE, Landi MT, Li P. VTET: a variable threshold exact test for identifying disease-associated copy number variations enriched in short genomic regions. Front Genet 2014; 5:53. [PMID: 24672538 PMCID: PMC3957064 DOI: 10.3389/fgene.2014.00053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/09/2014] [Accepted: 02/27/2014] [Indexed: 11/13/2022]
Abstract
Copy number variations (CNVs) constitute a major source of genetic variation in human populations and have been reported to be associated with complex diseases. Methods have been developed for detecting CNVs and testing CNV associations in genome-wide association studies (GWAS) based on SNP arrays. Commonly used two-step testing procedures work well only for long CNVs, whereas direct CNV association testing methods work only for recurrent CNVs. Assuming that short CNVs disrupting any part of a given genomic region increase disease risk, we developed a variable threshold exact test (VTET) for testing disease associations of CNVs randomly distributed in the genome, using intensity data from SNP arrays. In extensive simulations, we found that VTET outperformed two-step testing procedures based on existing CNV calling algorithms for short CNVs and that the performance of VTET was robust to the length of the genomic region. In addition, VTET performed comparably to CNVtools for testing the association of recurrent CNVs. Thus, we expect VTET to be useful for testing disease associations of both recurrent and randomly distributed CNVs using existing GWAS data. We applied VTET to a lung cancer GWAS and identified a genome-wide significant region on chromosome 18q22.3 for lung squamous cell carcinoma.
Affiliation(s)
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Xiaohong R Yang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Neil E Caporaso
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Maria T Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Peng Li
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
27
Shi J, Marconett CN, Duan J, Hyland PL, Li P, Wang Z, Wheeler W, Zhou B, Campan M, Lee DS, Huang J, Zhou W, Triche T, Amundadottir L, Warner A, Hutchinson A, Chen PH, Chung BSI, Pesatori AC, Consonni D, Bertazzi PA, Bergen AW, Freedman M, Siegmund KD, Berman BP, Borok Z, Chatterjee N, Tucker MA, Caporaso NE, Chanock SJ, Laird-Offringa IA, Landi MT. Characterizing the genetic basis of methylome diversity in histologically normal human lung tissue. Nat Commun 2014; 5:3365. [PMID: 24572595 PMCID: PMC3982882 DOI: 10.1038/ncomms4365] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Received: 10/04/2013] [Accepted: 01/31/2014] [Indexed: 12/17/2022]
Abstract
The genetic regulation of the human epigenome is not fully appreciated. Here we describe the effects of genetic variants on the DNA methylome in human lung based on methylation-quantitative trait loci (meQTL) analyses. We report 34,304 cis- and 585 trans-meQTLs, a genetic-epigenetic interaction of surprising magnitude, including a regulatory hotspot. These findings are replicated in both breast and kidney tissues and show distinct patterns: cis-meQTLs mostly localize to CpG sites outside of genes, promoters and CpG islands (CGIs), while trans-meQTLs are over-represented in promoter CGIs. meQTL SNPs are enriched in CTCF-binding sites, DNaseI hypersensitivity regions and histone marks. Importantly, four of the five established lung cancer risk loci in European ancestry are cis-meQTLs and, in aggregate, cis-meQTLs are enriched for lung cancer risk in a genome-wide analysis of 11,587 subjects. Thus, inherited genetic variation may affect lung carcinogenesis by regulating the human methylome.
Affiliation(s)
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Crystal N Marconett
- [1] Department of Surgery, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA
- Jubao Duan
- Center for Psychiatric Genetics, Department of Psychiatry and Behavioral Sciences, North Shore University Health System Research Institute, University of Chicago Pritzker School of Medicine, Evanston, Illinois 60201, USA
- Paula L Hyland
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Peng Li
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Zhaoming Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- William Wheeler
- Information Management Services Inc., Rockville, Maryland 20852, USA
- Beiyun Zhou
- Will Rogers Institute Pulmonary Research Center, Division of Pulmonary, Critical Care and Sleep Medicine, USC Keck School of Medicine, Los Angeles, California 90089, USA
- Mihaela Campan
- [1] Department of Surgery, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA
- Diane S Lee
- [1] Department of Surgery, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA
- Jing Huang
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Weiyin Zhou
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Tim Triche
- Bioinformatics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90089, USA
- Laufey Amundadottir
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Andrew Warner
- Pathology/Histotechnology Laboratory, Laboratory Animal Sciences Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, USA
- Amy Hutchinson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Po-Han Chen
- [1] Department of Surgery, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA
- Brian S I Chung
- [1] Department of Surgery, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA
- Angela C Pesatori
- Unit of Epidemiology, IRCCS Fondazione Ca' Granda Ospedale Maggiore Policlinico, Department of Clinical Sciences and Community Health, University of Milan, Milan 20122, Italy
- Dario Consonni
- Unit of Epidemiology, IRCCS Fondazione Ca' Granda Ospedale Maggiore Policlinico, Department of Clinical Sciences and Community Health, University of Milan, Milan 20122, Italy
- Pier Alberto Bertazzi
- Unit of Epidemiology, IRCCS Fondazione Ca' Granda Ospedale Maggiore Policlinico, Department of Clinical Sciences and Community Health, University of Milan, Milan 20122, Italy
- Andrew W Bergen
- Molecular Genetics Program, Center for Health Sciences, SRI, Menlo Park, California 94025, USA
- Mathew Freedman
- [1] Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts 02142, USA [2] Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Kimberly D Siegmund
- Bioinformatics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90089, USA
- Benjamin P Berman
- [1] Bioinformatics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California 90089, USA [2] USC Epigenome Center and USC/Norris Comprehensive Cancer Center, Los Angeles, California 90089, USA
- Zea Borok
- [1] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Will Rogers Institute Pulmonary Research Center, Division of Pulmonary, Critical Care and Sleep Medicine, USC Keck School of Medicine, Los Angeles, California 90089, USA
- Nilanjan Chatterjee
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
- Margaret A Tucker
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
| | - Neil E Caporaso
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
| | - Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
| | - Ite A Laird-Offringa
- 1] Department of Surgery, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA [2] Department of Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, Keck School of Medicine, Los Angeles, California 90089, USA
| | - Maria Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA
30
Chan HP, Lai TL. Discussion on “Change-Points: From Sequential Detection to Biology and Back” by David Siegmund. Seq Anal 2013. [DOI: 10.1080/07474946.2013.751840]
32
Jiang H, Salzman J. Statistical properties of an early stopping rule for resampling-based multiple testing. Biometrika 2012; 99:973-980. [PMID: 23843675 DOI: 10.1093/biomet/ass051]
Abstract
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures.
Affiliation(s)
- Hui Jiang
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109, U.S.A.
33
Shi J, Li P. An integrative segmentation method for detecting germline copy number variations in SNP arrays. Genet Epidemiol 2012; 36:373-383. [PMID: 22539397 DOI: 10.1002/gepi.21631]
Abstract
Germline copy number variations (CNVs) are a major source of genetic variation in humans. In large-scale studies of complex diseases, CNVs are usually detected from data generated by single nucleotide polymorphism (SNP) genotyping arrays. In this paper, we develop an integrative segmentation method, SegCNV, for detecting CNVs that integrates both log R ratio (LRR) and B allele frequency (BAF). Based on simulation studies, SegCNV had modestly better power to detect deletions and substantially better power to detect duplications compared with circular binary segmentation (CBS), which relies purely on LRRs; and it had better power to detect deletions and comparable performance in detecting duplications compared with PennCNV and QuantiSNP. In two HapMap subjects with deep sequence data available as a gold standard, SegCNV detected more true short deletions than PennCNV and QuantiSNP. For 21 short duplications validated experimentally in the AGRE dataset, SegCNV, QuantiSNP, and PennCNV detected all of them, while CBS detected only three. SegCNV is much faster than hidden Markov model (HMM)-based methods, taking only several seconds to analyze genome-wide data for one subject.
Affiliation(s)
- Jianxin Shi
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland 20854, USA.
34
Zhang Z, Lange K, Sabatti C. Reconstructing DNA copy number by joint segmentation of multiple sequences. BMC Bioinformatics 2012; 13:205. [PMID: 22897923 PMCID: PMC3534631 DOI: 10.1186/1471-2105-13-205] [Received: 03/20/2012] [Accepted: 07/27/2012]
Abstract
BACKGROUND: Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual.
RESULTS: We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions. GFL is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets.
CONCLUSIONS: The flexibility of our framework makes it applicable to data obtained with a wide range of technologies. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
Affiliation(s)
- Zhongyang Zhang
- Department of Statistics, University of California, Los Angeles, CA, USA
- Kenneth Lange
- Department of Human Genetics, Biomathematics and Statistics, University of California, Los Angeles, CA, USA
- Chiara Sabatti
- Department of Health Research and Policy and Statistics, Stanford University, Stanford, CA, USA
35
Shen JJ, Zhang NR. Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing. Ann Appl Stat 2012. [DOI: 10.1214/11-aoas517]
36