1
|
Durand G, Blanchard G, Neuvial P, Roquain E. Post hoc false positive control for structured hypotheses. Scand Stat Theory Appl 2020. [DOI: 10.1111/sjos.12453] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Guillermo Durand
- Laboratoire de probabilités Statistique et Modélisation, LPSM Sorbonne Université France
- Gilles Blanchard
- Laboratoire de Mathématiques d'Orsay Université Paris‐Sud, CNRS, Université Paris‐Saclay France
- Pierre Neuvial
- Institut de Mathématiques de Toulouse UMR 5219, Université de Toulouse, CNRS, UPS IMT France
- Etienne Roquain
- Laboratoire de probabilités Statistique et Modélisation, LPSM Sorbonne Université France
|
2
|
Lahouel K, Geman D, Younes L. Coarse-to-fine multiple testing strategies. Electron J Stat 2019. [DOI: 10.1214/19-ejs1536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
3
|
Chang LB, Borenstein E, Zhang W, Geman S. Maximum likelihood features for generative image models. Ann Appl Stat 2017. [DOI: 10.1214/17-aoas1025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
4
|
Cai TT, Sun W. Optimal screening and discovery of sparse signals with applications to multistage high throughput studies. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12171] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Affiliation(s)
- Wenguang Sun
- University of Southern California Los Angeles USA
|
5
|
Sun W, Wei Z. Hierarchical recognition of sparse patterns in large-scale simultaneous inference. Biometrika 2015. [DOI: 10.1093/biomet/asv012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/30/2022]
|
6
|
Wu T, Zhu SC. Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection. IEEE Trans Pattern Anal Mach Intell 2015; 37:1013-1027. [PMID: 26353325 DOI: 10.1109/tpami.2014.2359653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Many popular object detectors, such as AdaBoost, SVM and deformable part-based models (DPM), compute additive scoring functions at a large number of windows in an image pyramid; thus, in real-time applications, computational efficiency is an important consideration alongside accuracy. In this paper, a decision policy refers to a sequence of two-sided thresholds that execute early rejection and early acceptance based on the cumulative score at each step. We formulate an empirical risk function as the weighted sum of the computational cost and the losses from false alarms and missed detections. A policy is then said to be cost-sensitive and optimal if it minimizes this risk function. Although the risk function is complex due to high-order correlations among the two-sided thresholds, we find that its upper bound can be optimized efficiently by dynamic programming. We show empirically that this upper bound is very tight, so the resulting policy is said to be near-optimal. In experiments on several popular detection tasks and benchmarks, the decision policy significantly outperforms state-of-the-art cascade methods in computational efficiency while achieving similar detection accuracy.
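A minimal Python sketch of how a two-sided threshold policy of this kind could be applied to one window's per-step additive scores, together with the weighted empirical risk described above. The names `run_policy` and `empirical_risk`, the fallback rule when no early decision is made, and the unit cost per evaluated step are illustrative assumptions; the dynamic-programming optimization of the thresholds is not reproduced here.

```python
from typing import Sequence, Tuple

# Hypothetical representation: (reject_below, accept_above) on the cumulative score.
Threshold = Tuple[float, float]


def run_policy(
    stage_scores: Sequence[float],
    thresholds: Sequence[Threshold],
    final_threshold: float = 0.0,
) -> Tuple[bool, int]:
    """Return (is_detection, steps_evaluated) for one window."""
    total = 0.0
    for t, score in enumerate(stage_scores):
        total += score  # additive scoring: accumulate one more term
        if t < len(thresholds):
            reject_below, accept_above = thresholds[t]
            if total < reject_below:
                return False, t + 1  # early reject
            if total > accept_above:
                return True, t + 1  # early accept
    # Assumed fallback: no early decision, so classify on the full score.
    return total > final_threshold, len(stage_scores)


def empirical_risk(
    windows: Sequence[Sequence[float]],
    labels: Sequence[bool],
    thresholds: Sequence[Threshold],
    cost_per_step: float,
    false_alarm_cost: float,
    miss_cost: float,
) -> float:
    """Average computation cost plus false-alarm and missed-detection losses."""
    total = 0.0
    for scores, is_object in zip(windows, labels):
        decision, steps = run_policy(scores, thresholds)
        total += cost_per_step * steps
        if decision and not is_object:
            total += false_alarm_cost
        elif not decision and is_object:
            total += miss_cost
    return total / len(windows)


if __name__ == "__main__":
    policy = [(-1.0, 2.0)] * 3
    print(run_policy([0.5, -2.0, 3.0], policy))  # (False, 2): rejected after two steps
```

Raising `cost_per_step` relative to the two loss weights favors tighter thresholds that stop earlier, which is the trade-off the risk function is meant to encode.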
|
7
|
Abstract
In plant and animal breeding studies a distinction is made between the genetic value (additive plus epistatic genetic effects) and the breeding value (additive genetic effects) of an individual since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this article, we argue that the breeder can take advantage of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using genetic map information and combining local additive and epistatic effects. To this end, we have used semiparametric mixed models with multiple local genomic relationship matrices with hierarchical designs. Elastic-net postprocessing was used to introduce sparsity. Our models produce good predictive performance along with useful explanatory information.
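The abstract mentions elastic-net postprocessing to introduce sparsity. As a rough illustration only, the following Python sketch fits a generic elastic-net linear model by cyclic coordinate descent (the well-known glmnet-style update); it is not the authors' semiparametric mixed-model pipeline. The function names, the objective scaling, and the absence of an intercept (data assumed centred) are assumptions.

```python
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    alpha: float = 0.5,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Minimise (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).

    No intercept: X and y are assumed to be centred beforehand.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual (b_j left out).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

The L1 part of the penalty sets small coefficients exactly to zero (for example, `elastic_net([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0], lam=100.0)` returns `[0.0]`), while the L2 part shrinks and stabilises correlated coefficients.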
Affiliation(s)
- Deniz Akdemir
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853
- Jean-Luc Jannink
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853
|
8
|
Castro RM. Adaptive sensing performance lower bounds for sparse signal detection and support estimation. Bernoulli 2014. [DOI: 10.3150/13-bej555] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
|
9
|
Sung J, Kim PJ, Ma S, Funk CC, Magis AT, Wang Y, Hood L, Geman D, Price ND. Multi-study integration of brain cancer transcriptomes reveals organ-level molecular signatures. PLoS Comput Biol 2013; 9:e1003148. [PMID: 23935471 PMCID: PMC3723500 DOI: 10.1371/journal.pcbi.1003148] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 11/29/2012] [Accepted: 06/05/2013] [Indexed: 12/23/2022]
Abstract
We used abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. The signatures were based on a new method reported herein, Identification of Structured Signatures and Classifiers (ISSAC), which yielded a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, and others have known roles in cancer biology. Analyses of large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between published studies: as is typical, we observed that variation among individual studies often had a larger effect on the transcriptome than phenotype differences did. For this reason, we restricted ourselves to cases where at least two independent studies had been performed for each phenotype, and we reprocessed all the raw data from these studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and predictive accuracy on truly independent validation sets, even when the size of the training set was kept the same. This was most likely because the meta-signature encompassed more of the heterogeneity across different sources and conditions while amplifying the signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, a phenotype prediction accuracy of 90% was obtained, that is, 90% accuracy in identifying a particular brain cancer against the background of all phenotypes. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.
From a multi-study, integrated transcriptomic dataset, we identified a marker panel for differentiating major human brain cancers at the gene-expression level. The ISSAC molecular signatures for brain cancers, composed of 44 unique genes, are based on comparing the expression levels of pairs of genes, and phenotype prediction follows a diagnostic hierarchy. We found that sufficient dataset integration across multiple studies greatly enhanced diagnostic performance on truly independent validation sets, whereas signatures learned from only one dataset typically led to high error rates. Molecular signatures of brain cancers obtained using all currently available gene-expression data achieved 90% phenotype prediction accuracy. Thus, our integrative approach holds significant promise for developing organ-level, comprehensive molecular signatures of disease.
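The idea of classifying from pairwise gene-expression comparisons can be illustrated with a generic two-class "top-scoring pair" rule, sketched below in Python. This is a swapped-in, simplified technique, not ISSAC itself: the diagnostic hierarchy, the selection of 44 genes, and multi-class handling are not modeled, and all names are hypothetical. One appeal of such rules is that they depend only on the ordering of two genes within a sample, so they are unaffected by any monotone rescaling of that sample's expression values.

```python
from itertools import combinations
from typing import Sequence, Tuple

Sample = Sequence[float]  # expression values, one per gene


def pair_score(class_a: Sequence[Sample], class_b: Sequence[Sample], i: int, j: int) -> float:
    """Estimate P(x_i < x_j | A) - P(x_i < x_j | B) by frequencies."""
    p_a = sum(x[i] < x[j] for x in class_a) / len(class_a)
    p_b = sum(x[i] < x[j] for x in class_b) / len(class_b)
    return p_a - p_b


def top_scoring_pair(
    class_a: Sequence[Sample], class_b: Sequence[Sample]
) -> Tuple[int, int, float]:
    """Return the gene pair whose ordering best separates the two classes."""
    n_genes = len(class_a[0])
    i, j = max(
        combinations(range(n_genes), 2),
        key=lambda ij: abs(pair_score(class_a, class_b, ij[0], ij[1])),
    )
    return i, j, pair_score(class_a, class_b, i, j)


def predict(x: Sample, i: int, j: int, score: float) -> str:
    """Use only whether gene i is below gene j, not absolute levels."""
    return "A" if (x[i] < x[j]) == (score > 0) else "B"
```

In a hierarchical scheme, one could imagine chaining several such pairwise decisions down a tree of phenotypes, but how ISSAC does this is not described here.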
Affiliation(s)
- Jaeyun Sung
- Institute for Systems Biology, Seattle, Washington, United States of America
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, Illinois, United States of America
- Pan-Jun Kim
- Asia Pacific Center for Theoretical Physics, Pohang, Gyeongbuk, Republic of Korea
- Department of Physics, POSTECH, Pohang, Gyeongbuk, Republic of Korea
- Shuyi Ma
- Institute for Systems Biology, Seattle, Washington, United States of America
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, Illinois, United States of America
- Cory C. Funk
- Institute for Systems Biology, Seattle, Washington, United States of America
- Andrew T. Magis
- Institute for Systems Biology, Seattle, Washington, United States of America
- Center for Biophysics and Computational Biology, University of Illinois, Urbana, Illinois, United States of America
- Yuliang Wang
- Institute for Systems Biology, Seattle, Washington, United States of America
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, Illinois, United States of America
- Leroy Hood
- Institute for Systems Biology, Seattle, Washington, United States of America
- Donald Geman
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America
- Nathan D. Price
- Institute for Systems Biology, Seattle, Washington, United States of America
|
10
|
Chang LB, Jin Y, Zhang W, Borenstein E, Geman S. Context, Computation, and Optimal ROC Performance in Hierarchical Models. Int J Comput Vis 2010. [DOI: 10.1007/s11263-010-0391-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
|
11
|
|
12
|
Meinshausen N, Bickel P, Rice J. Efficient blind search: Optimal power of detection under computational cost constraints. Ann Appl Stat 2009. [DOI: 10.1214/08-aoas180] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
|
13
|
Wu J, Brubaker SC, Mullin MD, Rehg JM. Fast asymmetric learning for cascade face detection. IEEE Trans Pattern Anal Mach Intell 2008; 30:369-382. [PMID: 18195433 DOI: 10.1109/tpami.2007.1181] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 05/25/2023]
Abstract
A cascade face detector uses a sequence of node classifiers to distinguish faces from non-faces. This paper presents a new approach to designing node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. There are three contributions in this paper. The first is a categorization of asymmetries in the learning goal and an explanation of why they make face detection hard. The second is the Forward Feature Selection (FFS) algorithm and a fast pre-computing strategy for AdaBoost. FFS and the fast AdaBoost can reduce the training time by factors of approximately 100 and 50, respectively, compared with a naive implementation of the AdaBoost feature selection method. The last contribution is the Linear Asymmetric Classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrated experimentally that LAC results in improved ensemble classifier performance.
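To illustrate why decoupling feature selection from ensemble formation can be cheap, here is a hedged Python sketch of greedy forward selection over a table of weak-classifier outputs that is computed once and then reused. It is a generic simplification and not necessarily the paper's FFS: the ±1 output table, selection without repetition, unweighted voting, and the tie rule that favors the positive (face) class are all assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence

# Assumed layout: outputs[s][f] is the precomputed +1/-1 prediction of
# weak classifier f on training sample s; labels[s] is +1 (face) or -1.


def ensemble_accuracy(
    outputs: Sequence[Sequence[int]], labels: Sequence[int], selected: Sequence[int]
) -> float:
    """Training accuracy of an unweighted vote over the selected features."""
    correct = 0
    for row, label in zip(outputs, labels):
        vote = sum(row[f] for f in selected)
        prediction = 1 if vote >= 0 else -1  # ties go to the positive class
        correct += prediction == label
    return correct / len(labels)


def forward_feature_selection(
    outputs: Sequence[Sequence[int]], labels: Sequence[int], k: int
) -> List[int]:
    """Greedily add the feature that most improves the vote's accuracy."""
    n_features = len(outputs[0])
    selected: List[int] = []
    for _ in range(k):
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda f: ensemble_accuracy(outputs, labels, selected + [f]),
        )
        selected.append(best)
    return selected
```

Because the weak classifiers are never retrained inside the loop, each greedy step only re-sums table entries, which is where the savings over retraining-based selection would come from.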
Affiliation(s)
- Jianxin Wu
- School of Interactive Computing, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0760, USA.
|
14
|
|
15
|
Westover MB, O'Sullivan JA. Achievable Rates for Pattern Recognition. IEEE Trans Inf Theory 2008; 54:299-320. [PMID: 32153303 PMCID: PMC7062371 DOI: 10.1109/tit.2007.911296] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
Biological and machine pattern recognition systems face a common challenge: Given sensory data about an unknown pattern, classify the pattern by searching for the best match within a library of representations stored in memory. In many cases, the number of patterns to be discriminated and the richness of the raw data force recognition systems to internally represent memory and sensory information in a compressed format. However, these representations must preserve enough information to accommodate the variability and complexity of the environment, otherwise recognition will be unreliable. Thus, there is an intrinsic tradeoff between the amount of resources devoted to data representation and the complexity of the environment in which a recognition system may reliably operate. In this paper, we describe a mathematical model for pattern recognition systems subject to resource constraints, and show how the aforementioned resource-complexity tradeoff can be characterized in terms of three rates related to the number of bits available for representing memory and sensory data, and the number of patterns populating a given statistical environment. We prove single-letter information-theoretic bounds governing the achievable rates, and investigate in detail two illustrative cases where the pattern data is either binary or Gaussian.
Affiliation(s)
- M Brandon Westover
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114-2622 USA
- Joseph A O'Sullivan
- Department of Electrical Engineering, Washington University, St. Louis, MO 63130 USA
|
16
|
|
17
|
|
18
|
Amit Y, Geman D, Fan X. A coarse-to-fine strategy for multiclass shape detection. IEEE Trans Pattern Anal Mach Intell 2004; 26:1606-1621. [PMID: 15573821 DOI: 10.1109/tpami.2004.111] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 05/24/2023]
Abstract
Multiclass shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing, a list of instantiations (shape identities and poses) is compiled, subject only to the constraint of no missed detections, at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In addition, indexing itself is organized as a coarse-to-fine search, simultaneously in class and pose. This search can be interpreted as successive approximations to likelihood ratio tests arising from a simple ("naive Bayes") statistical model for the edge maps extracted from the original images. The key to constructing efficient "hypothesis tests" for multiple classes and poses is local ORing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the pattern of spreading. These are analyzed mathematically within the model-based framework, and the whole procedure is illustrated by experiments in reading license plates.
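Local ORing of a binary edge map ("spreading") can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: a single edge type and a square neighborhood of half-width `radius`, whereas a real system would typically spread each edge type separately and may use other neighborhood shapes. The function name is hypothetical.

```python
from typing import List

EdgeMap = List[List[bool]]


def spread_edges(edge_map: EdgeMap, radius: int) -> EdgeMap:
    """out[r][c] is True iff some edge lies within `radius` pixels
    (Chebyshev distance) of (r, c): a local OR over a square window."""
    height = len(edge_map)
    width = len(edge_map[0]) if height else 0
    out = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not edge_map[r][c]:
                continue
            # Stamp a (2*radius+1) x (2*radius+1) square, clipped at the border.
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    out[rr][cc] = True
    return out
```

An edge displaced by up to `radius` pixels still switches on the same spread feature, which gives the local invariance described above; a larger radius makes the feature more common across poses but less discriminative, which is the tradeoff between discrimination and the pattern of spreading.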
Affiliation(s)
- Yali Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA.
|