26
|
Li Q, Brown JB, Huang H, Bickel PJ. Measuring reproducibility of high-throughput experiments. Ann Appl Stat 2011. [DOI: 10.1214/11-aoas466] [Citation(s) in RCA: 627] [Impact Index Per Article: 48.2] [Indexed: 11/19/2022]
|
27
|
Bickel PJ, Gel YR. Banded regularization of autocovariance matrices in application to parameter estimation and forecasting of time series. J R Stat Soc Series B Stat Methodol 2011. [DOI: 10.1111/j.1467-9868.2011.00779.x] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 12/01/2022]
|
28
|
Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, Lassmann T, Yu C, Booth BW, Zhang D, Wan KH, Yang L, Boley N, Andrews J, Kaufman TC, Graveley BR, Bickel PJ, Carninci P, Carlson JW, Celniker SE. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res 2010; 21:182-92. [PMID: 21177961 DOI: 10.1101/gr.112466.110] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.9] [Indexed: 11/24/2022]
Abstract
Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.
|
29
|
Bickel PJ. Leo Breiman: An important intellectual and personal force in statistics, my life and that of many others. Ann Appl Stat 2010. [DOI: 10.1214/10-aoas404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
30
|
Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR. Subsampling methods for genomic inference. Ann Appl Stat 2010. [DOI: 10.1214/10-aoas363] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/19/2022]
|
31
|
Abstract
Examination of aggregate data on graduate admissions to the University of California, Berkeley, for fall 1973 shows a clear but misleading pattern of bias against female applicants. Examination of the disaggregated data reveals few decision-making units that show statistically significant departures from expected frequencies of female admissions, and about as many units appear to favor women as to favor men. If the data are properly pooled, taking into account the autonomy of departmental decision making, thus correcting for the tendency of women to apply to graduate departments that are more difficult for applicants of either sex to enter, there is a small but statistically significant bias in favor of women. The graduate departments that are easier to enter tend to be those that require more mathematics in the undergraduate preparatory curriculum. The bias in the aggregated data stems not from any pattern of discrimination on the part of admissions committees, which seem quite fair on the whole, but apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.
|
32
|
|
33
|
Bickel PJ, Brown JB, Huang H, Li Q. An overview of recent developments in genomics and associated statistical methods. Philos Trans A Math Phys Eng Sci 2009; 367:4313-37. [PMID: 19805447 DOI: 10.1098/rsta.2009.0164] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 05/21/2023]
Abstract
The landscape of genomics has changed drastically in the last two decades. Increasingly inexpensive sequencing has shifted the primary focus from the acquisition of biological sequences to the study of biological function. Assays have been developed to study many intricacies of biological systems, and publicly available databases have given rise to integrative analyses that combine information from many sources to draw complex conclusions. Such research was the focus of the recent workshop at the Isaac Newton Institute, 'High dimensional statistics in biology'. Many computational methods from modern genomics and related disciplines were presented and discussed. Using, as much as possible, the material from these talks, we give an overview of modern genomics: from the essential assays that make data-generation possible, to the statistical methods that yield meaningful inference. We point to current analytical challenges, where novel methods, or novel applications of extant methods, are presently needed.
|
34
|
|
35
|
|
36
|
Bickel PJ, Ritov Y. Discussion of: Treelets: An adaptive multi-scale basis for sparse unordered data. Ann Appl Stat 2008. [DOI: 10.1214/08-aoas137b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
37
|
|
38
|
Rothman AJ, Bickel PJ, Levina E, Zhu J. Sparse permutation invariant covariance estimation. Electron J Stat 2008. [DOI: 10.1214/08-ejs176] [Citation(s) in RCA: 448] [Impact Index Per Article: 28.0] [Indexed: 11/19/2022]
|
39
|
Bickel PJ. Discussion: The Dantzig selector: Statistical estimation when p is much larger than n. Ann Stat 2007. [DOI: 10.1214/009053607000000424] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
|
40
|
|
41
|
|
42
|
|
43
|
Kechris KJ, Lin JC, Bickel PJ, Glazer AN. Quantitative exploration of the occurrence of lateral gene transfer by using nitrogen fixation genes as a case study. Proc Natl Acad Sci U S A 2006; 103:9584-9. [PMID: 16769896 PMCID: PMC1480450 DOI: 10.1073/pnas.0603534103] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
Abstract
Lateral gene transfer (LGT) is now accepted as an important factor in the evolution of prokaryotes. Establishment of the occurrence of LGT is typically attempted by a variety of methods that includes the comparison of reconstructed phylogenetic trees, the search for unusual GC composition or codon usage within a genome, and identification of similarities between distant species as determined by best BLAST hits. We explore quantitative assessments of these strategies to study the prokaryotic trait of nitrogen fixation, the enzyme-catalyzed reduction of N₂ to ammonia. Phylogenies constructed on nitrogen fixation genes are not in agreement with the tree-of-life based on 16S rRNA but do not conclusively distinguish between gene loss and LGT hypotheses. Using a series of analyses on a set of complete genomes, our results distinguish two structurally distinct classes of MoFe nitrogenases whose distribution cuts across lines of vertical inheritance and makes us believe that a conclusive case for LGT has been made.
|
44
|
van Zwet EW, Kechris KJ, Bickel PJ, Eisen MB. Estimating motifs under order restrictions. Stat Appl Genet Mol Biol 2006; 4:Article1. [PMID: 16646826 DOI: 10.2202/1544-6115.1100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Transcription factors and many other DNA-binding proteins recognize more than one specific sequence. Among sequences recognized by a given DNA-binding protein, different positions exhibit varying degrees of conservation. The reason is that base pairs that are more extensively contacted by the protein tend to be more conserved. This observation can be used in the discovery of transcription factor binding sites. Here we present a rigorous means to accomplish this. In particular, we constrain the order of the information (entropy) in the columns of the position specific weight matrix (PWM) which characterizes the motif being sought. We then show how to compute the maximum likelihood estimate of a PWM under such order restrictions. This computation is easily integrated with the EM algorithm or the Gibbs sampler to enhance performance in the search for motifs in unaligned sequences. We demonstrate our method on a well-known data set of binding sites of the transcription factor Crp in E. coli.
|
45
|
Bickel PJ, Ritov Y, Stoker TM. Tailor-made tests for goodness of fit to semiparametric hypotheses. Ann Stat 2006. [DOI: 10.1214/009053606000000137] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/19/2022]
|
46
|
Bickel PJ, Levina E. Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli 2004. [DOI: 10.3150/bj/1106314847] [Citation(s) in RCA: 332] [Impact Index Per Article: 16.6] [Indexed: 11/19/2022]
|
47
|
Ge Z, Bickel PJ, Rice JA. An approximate likelihood approach to nonlinear mixed effects models via spline approximation. Comput Stat Data Anal 2004. [DOI: 10.1016/j.csda.2003.10.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
|
48
|
Kechris KJ, van Zwet E, Bickel PJ, Eisen MB. Detecting DNA regulatory motifs by incorporating positional trends in information content. Genome Biol 2004; 5:R50. [PMID: 15239835 PMCID: PMC463320 DOI: 10.1186/gb-2004-5-7-r50] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/23/2004] [Revised: 05/04/2004] [Accepted: 05/04/2004] [Indexed: 11/10/2022]
Abstract
On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.
|
49
|
Bartlett PL, Bickel PJ, Bühlmann P, Freund Y, Friedman J, Hastie T, Jiang W, Jordan MJ, Koltchinskii V, Lugosi G, McAuliffe JD, Ritov Y, Rosset S, Schapire RE, Tibshirani R, Vayatis N, Yu B, Zhang T, Zhu J. Discussions of boosting papers, and rejoinders. Ann Stat 2004. [DOI: 10.1214/aos/1105988581] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
50
|
|