1
Cowen AS, Brooks JA, Prasad G, Tanaka M, Kamitani Y, Kirilyuk V, Somandepalli K, Jou B, Schroff F, Adam H, Sauter D, Fang X, Manokara K, Tzirakis P, Oh M, Keltner D. How emotion is experienced and expressed in multiple cultures: a large-scale experiment across North America, Europe, and Japan. Front Psychol 2024; 15:1350631. [PMID: 38966733 PMCID: PMC11223574 DOI: 10.3389/fpsyg.2024.1350631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 03/04/2024] [Indexed: 07/06/2024] Open
Abstract
Core to understanding emotion are subjective experiences and their expression in facial behavior. Past studies have largely focused on six emotions and prototypical facial poses, reflecting limitations in scale and narrow assumptions about the variety of emotions and their patterns of expression. We examine 45,231 facial reactions to 2,185 evocative videos, largely in North America, Europe, and Japan, collecting participants' self-reported experiences in English or Japanese and manual and automated annotations of facial movement. Guided by Semantic Space Theory, we uncover 21 dimensions of emotion in the self-reported experiences of participants in Japan, the United States, and Western Europe, and considerable cross-cultural similarities in experience. Facial expressions predict at least 12 dimensions of experience, despite massive individual differences in experience. We find considerable cross-cultural convergence in the facial actions involved in the expression of emotion, and culture-specific display tendencies: many facial movements differ in intensity in Japan compared to the U.S./Canada and Europe but represent similar experiences. These results quantitatively detail that people in dramatically different cultures experience and express emotion in a high-dimensional, categorical, and similar but complex fashion.
Affiliation(s)
- Alan S. Cowen
  - Hume AI, New York, NY, United States
  - Department of Psychology, University of California, Berkeley, Berkeley, CA, United States
- Jeffrey A. Brooks
  - Hume AI, New York, NY, United States
  - Department of Psychology, University of California, Berkeley, Berkeley, CA, United States
- Misato Tanaka
  - Advanced Telecommunications Research Institute, Kyoto, Japan
  - Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Yukiyasu Kamitani
  - Advanced Telecommunications Research Institute, Kyoto, Japan
  - Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Krishna Somandepalli
  - Google Research, Mountain View, CA, United States
  - Department of Electrical Engineering, University of Southern California, Los Angeles, CA, United States
- Brendan Jou
  - Google Research, Mountain View, CA, United States
- Hartwig Adam
  - Google Research, Mountain View, CA, United States
- Disa Sauter
  - Faculty of Social and Behavioural Sciences, University of Amsterdam, Amsterdam, Netherlands
- Xia Fang
  - Zhejiang University, Zhejiang, China
- Kunalan Manokara
  - Faculty of Social and Behavioural Sciences, University of Amsterdam, Amsterdam, Netherlands
- Moses Oh
  - Hume AI, New York, NY, United States
- Dacher Keltner
  - Hume AI, New York, NY, United States
  - Department of Psychology, University of California, Berkeley, Berkeley, CA, United States
2
Liang W, Zhang Q, Ma S. Hierarchical False Discovery Rate Control for High-dimensional Survival Analysis with Interactions. Comput Stat Data Anal 2024; 192:107906. [PMID: 38098875 PMCID: PMC10718515 DOI: 10.1016/j.csda.2023.107906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2023]
Abstract
With the development of data collection techniques, analysis with a survival response and high-dimensional covariates has become routine. Here we consider an interaction model, which includes a set of low-dimensional covariates, a set of high-dimensional covariates, and their interactions. This model has been motivated by gene-environment (G-E) interaction analysis, where the E variables have a low dimension, and the G variables have a high dimension. For such a model, there has been extensive research on estimation and variable selection. Comparatively, inference studies with a valid false discovery rate (FDR) control have been very limited. The existing high-dimensional inference tools cannot be directly applied to interaction models, as interactions and main effects are not "equal". In this article, for high-dimensional survival analysis with interactions, we model survival using the Accelerated Failure Time (AFT) model and adopt a "weighted least squares + debiased Lasso" approach for estimation and selection. A hierarchical FDR control approach is developed for inference and respect of the "main effects, interactions" hierarchy. The asymptotic distribution properties of the debiased Lasso estimators are rigorously established. Simulation demonstrates the satisfactory performance of the proposed approach, and the analysis of a breast cancer dataset further establishes its practical utility.
Affiliation(s)
- Weijuan Liang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Qingzhao Zhang
  - Department of Statistics and Data Science, School of Economics, The Wang Yanan Institute for Studies in Economics, and Fujian Key Lab of Statistics, Xiamen University, Xiamen, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
3
Brooks JA, Kim L, Opara M, Keltner D, Fang X, Monroy M, Corona R, Tzirakis P, Baird A, Metrick J, Taddesse N, Zegeye K, Cowen AS. Deep learning reveals what facial expressions mean to people in different cultures. iScience 2024; 27:109175. [PMID: 38433918 PMCID: PMC10906517 DOI: 10.1016/j.isci.2024.109175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 09/05/2023] [Accepted: 02/06/2024] [Indexed: 03/05/2024] Open
Abstract
Cross-cultural studies of the meaning of facial expressions have largely focused on judgments of small sets of stereotypical images by small numbers of people. Here, we used large-scale data collection and machine learning to map what facial expressions convey in six countries. Using a mimicry paradigm, 5,833 participants formed facial expressions found in 4,659 naturalistic images, resulting in 423,193 participant-generated facial expressions. In their own language, participants also rated each expression in terms of 48 emotions and mental states. A deep neural network tasked with predicting the culture-specific meanings people attributed to facial movements while ignoring physical appearance and context discovered 28 distinct dimensions of facial expression, with 21 dimensions showing strong evidence of universality and the remainder showing varying degrees of cultural specificity. These results capture the underlying dimensions of the meanings of facial expressions within and across cultures in unprecedented detail.
Affiliation(s)
- Jeffrey A. Brooks
  - Research Division, Hume AI, New York, NY 10010, USA
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
- Lauren Kim
  - Research Division, Hume AI, New York, NY 10010, USA
- Dacher Keltner
  - Research Division, Hume AI, New York, NY 10010, USA
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
- Xia Fang
  - Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Maria Monroy
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
- Rebecca Corona
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
- Alice Baird
  - Research Division, Hume AI, New York, NY 10010, USA
- Alan S. Cowen
  - Research Division, Hume AI, New York, NY 10010, USA
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
4
Brooks JA, Tzirakis P, Baird A, Kim L, Opara M, Fang X, Keltner D, Monroy M, Corona R, Metrick J, Cowen AS. Deep learning reveals what vocal bursts express in different cultures. Nat Hum Behav 2023; 7:240-250. [PMID: 36577898 DOI: 10.1038/s41562-022-01489-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 10/26/2022] [Indexed: 12/29/2022]
Abstract
Human social life is rich with sighs, chuckles, shrieks and other emotional vocalizations, called 'vocal bursts'. Nevertheless, the meaning of vocal bursts across cultures is only beginning to be understood. Here, we combined large-scale experimental data collection with deep learning to reveal the shared and culture-specific meanings of vocal bursts. A total of n = 4,031 participants in China, India, South Africa, the USA and Venezuela mimicked vocal bursts drawn from 2,756 seed recordings. Participants also judged the emotional meaning of each vocal burst. A deep neural network tasked with predicting the culture-specific meanings people attributed to vocal bursts while disregarding context and speaker identity discovered 24 acoustic dimensions, or kinds, of vocal expression with distinct emotion-related meanings. The meanings attributed to these complex vocal modulations were 79% preserved across the five countries and three languages. These results reveal the underlying dimensions of human emotional vocalization in remarkable detail.
Affiliation(s)
- Jeffrey A Brooks
  - Research Division, Hume AI, New York, NY, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Alice Baird
  - Research Division, Hume AI, New York, NY, USA
- Lauren Kim
  - Research Division, Hume AI, New York, NY, USA
- Xia Fang
  - Zhejiang University, Hangzhou, China
- Dacher Keltner
  - Research Division, Hume AI, New York, NY, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Maria Monroy
  - University of California, Berkeley, Berkeley, CA, USA
- Alan S Cowen
  - Research Division, Hume AI, New York, NY, USA
  - University of California, Berkeley, Berkeley, CA, USA
5
Xue X, Zong W, Huo Z, Ketchesin KD, Scott MR, Petersen KA, Logan RW, Seney ML, McClung C, Tseng G. DiffCircaPipeline: a framework for multifaceted characterization of differential rhythmicity. Bioinformatics 2023; 39:btad039. [PMID: 36655766 PMCID: PMC9889843 DOI: 10.1093/bioinformatics/btad039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 01/05/2023] [Accepted: 01/17/2023] [Indexed: 01/20/2023] Open
Abstract
Summary: Circadian oscillations of gene expression regulate daily physiological processes, and their disruption is linked to many diseases. Circadian rhythms can be disrupted in a variety of ways, including differential phase, amplitude and rhythm fitness. Although many differential circadian biomarker detection methods have been proposed, a workflow for systematic detection of multifaceted differential circadian characteristics with accurate false positive control is not currently available. We propose a comprehensive and interactive pipeline to capture the multifaceted characteristics of differentially rhythmic biomarkers. Analysis outputs are accompanied by informative visualization and interactive exploration. The workflow is demonstrated in multiple case studies and is extensible to general omics applications.
Availability and implementation: The R package, Shiny app and source code are available on GitHub (https://github.com/DiffCircaPipeline) and Zenodo (https://doi.org/10.5281/zenodo.7507989).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiangning Xue
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Wei Zong
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Zhiguang Huo
  - Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
- Kyle D Ketchesin
  - Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Madeline R Scott
  - Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Kaitlyn A Petersen
  - Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Ryan W Logan
  - Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, Boston, MA 02118, USA
- Marianne L Seney
  - Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Colleen McClung
  - Translational Neuroscience Program, Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- George Tseng
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15213, USA
6
Wang P, Zhu W. Large-Scale Covariate Assisted Two-Sample Inference under Dependence. Scand Stat Theory Appl 2022. [DOI: 10.1111/sjos.12608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Pengfei Wang
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University
  - School of Statistics, Dongbei University of Finance and Economics
- Wensheng Zhu
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University
7
Hoff P. Smaller p-Values via Indirect Information. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2020.1844720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Peter Hoff
  - Department of Statistical Science, Duke University, Durham, NC
8
Time series graphical lasso and sparse VAR estimation. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
9
Pessoa Colombo V, Chenal J, Koné B, Bosch M, Utzinger J. Using Open-Access Data to Explore Relations between Urban Landscapes and Diarrhoeal Diseases in Côte d’Ivoire. Int J Environ Res Public Health 2022; 19:ijerph19137677. [PMID: 35805337 PMCID: PMC9265306 DOI: 10.3390/ijerph19137677] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/29/2022] [Revised: 06/17/2022] [Accepted: 06/21/2022] [Indexed: 02/01/2023]
Abstract
Unlike water and sanitation infrastructures or socio-economic indicators, landscape features are seldom considered as predictors of diarrhoea. In contexts of rapid urbanisation and changes in the physical environment, urban planners and public health managers could benefit from a deeper understanding of the relationship between landscape patterns and health outcomes. We conducted an ecological analysis based on a large ensemble of open-access data to identify specific landscape features associated with diarrhoea. Designed as a proof-of-concept study, our research focused on Côte d’Ivoire. This analysis aimed to (i) build a framework strictly based on open-access data and open-source software to investigate diarrhoea risk factors originating from the physical environment and (ii) understand whether different types and forms of urban settlements are associated with different prevalence rates of diarrhoea. We advanced landscape patterns as variables of exposure and tested their association with the prevalence of diarrhoea among children under the age of five years through multiple regression models. A specific urban landscape pattern was significantly associated with diarrhoea. We conclude that, while the improvement of water, sanitation, and hygiene infrastructures is crucial to prevent diarrhoeal diseases, the health benefits of such improvements may be hampered if the overall physical environment remains precarious.
Affiliation(s)
- Vitor Pessoa Colombo
  - School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jérôme Chenal
  - School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Brama Koné
  - Centre Suisse de Recherches Scientifiques en Côte d’Ivoire, Abidjan 01 BP 1303, Côte d’Ivoire
- Martí Bosch
  - School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jürg Utzinger
  - Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
  - University of Basel, 4001 Basel, Switzerland
10
Best-Arm Identification Using Extreme Value Theory Estimates of the CVaR. J Risk Financial Manag 2022. [DOI: 10.3390/jrfm15040172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
We consider a risk-aware multi-armed bandit framework with the goal of avoiding catastrophic risk. Such a framework has multiple applications in financial risk management. We introduce a new conditional value-at-risk (CVaR) estimation procedure combining extreme value theory with automated threshold selection by ordered goodness-of-fit tests, and we apply this procedure to a pure exploration best-arm identification problem under a fixed budget. We empirically compare our results with the commonly used sample average estimator of the CVaR, and we show a significant performance improvement when the underlying arm distributions are heavy-tailed.
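For orientation, the baseline that this abstract compares against, the sample average estimator of the CVaR, can be sketched in a few lines. This is a generic illustration and not the paper's extreme-value procedure. The function name `sample_cvar`, the convention that larger values are worse losses, and the rounding guard are assumptions made for the sketch.

```python
import math


def sample_cvar(losses, alpha=0.95):
    """Sample-average CVaR at level alpha.

    Assumption for this sketch: `losses` are losses (larger is worse), so the
    CVaR is the mean of the worst (1 - alpha) fraction of the observations.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses, reverse=True)  # worst losses first
    # Round before ceil so floating-point noise (e.g. 5.000000000000004)
    # does not add an extra observation to the tail.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


# The worst 5% of the losses 1..100 are 96..100, whose mean is 98.
print(sample_cvar(list(range(1, 101)), alpha=0.95))  # -> 98.0
```

Because only the few most extreme observations enter the average, this estimator is noisy for heavy-tailed data, which is the setting where the abstract reports an improvement from the extreme value theory approach.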
11
Cao H, Chen J, Zhang X. Optimal false discovery rate control for large scale multiple testing with auxiliary information. Ann Stat 2022; 50:807-857. [PMID: 37138896 PMCID: PMC10153594 DOI: 10.1214/21-aos2128] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
Affiliation(s)
- Hongyuan Cao
  - Department of Statistics, Florida State University
- Jun Chen
  - Department of Quantitative Health Sciences, Mayo Clinic
12
OUP accepted manuscript. Biostatistics 2022; 23:1039-1055. [DOI: 10.1093/biostatistics/kxac001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2021] [Revised: 11/12/2021] [Accepted: 12/04/2021] [Indexed: 11/13/2022] Open
13
Silva RSD, Nascimento FFD, Bourguignon M. Dynamic linear seasonal models applied to extreme temperature data: a Bayesian approach using the r-larger order statistics distribution. J Stat Comput Simul 2021. [DOI: 10.1080/00949655.2021.1971668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
14
Tredennick AT, Hooker G, Ellner SP, Adler PB. A practical guide to selecting models for exploration, inference, and prediction in ecology. Ecology 2021; 102:e03336. [PMID: 33710619 PMCID: PMC8187274 DOI: 10.1002/ecy.3336] [Citation(s) in RCA: 80] [Impact Index Per Article: 26.7] [Received: 04/09/2020] [Revised: 10/08/2020] [Accepted: 12/06/2020] [Indexed: 11/12/2022]
Abstract
Selecting among competing statistical models is a core challenge in science. However, the many possible approaches and techniques for model selection, and the conflicting recommendations for their use, can be confusing. We contend that much confusion surrounding statistical model selection results from failing to first clearly specify the purpose of the analysis. We argue that there are three distinct goals for statistical modeling in ecology: data exploration, inference, and prediction. Once the modeling goal is clearly articulated, an appropriate model selection procedure is easier to identify. We review model selection approaches and highlight their strengths and weaknesses relative to each of the three modeling goals. We then present examples of modeling for exploration, inference, and prediction using a time series of butterfly population counts. These show how a model selection approach flows naturally from the modeling goal, leading to different models selected for different purposes, even with exactly the same data set. This review illustrates best practices for ecologists and should serve as a reminder that statistical recipes cannot substitute for critical thinking or for the use of independent data to test hypotheses and validate predictions.
Affiliation(s)
- Andrew T Tredennick
  - Western EcoSystems Technology, Inc., 1610 East Reynolds Street, Laramie, Wyoming, 82072, USA
- Giles Hooker
  - Department of Statistics and Data Science, Cornell University, Ithaca, New York, 14853, USA
- Stephen P Ellner
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, 14853, USA
- Peter B Adler
  - Department of Wildland Resources and the Ecology Center, Utah State University, 5230 Old Main Hill, Logan, Utah, 84322, USA
15
Kormaksson M, Kelly LJ, Zhu X, Haemmerle S, Pricop L, Ohlssen D. Sequential knockoffs for continuous and categorical predictors: With application to a large psoriatic arthritis clinical trial pool. Stat Med 2021; 40:3313-3328. [PMID: 33899260 DOI: 10.1002/sim.8955] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/20/2020] [Revised: 02/22/2021] [Accepted: 03/01/2021] [Indexed: 01/10/2023]
Abstract
Knockoffs provide a general framework for controlling the false discovery rate when performing variable selection. Much of the Knockoffs literature focuses on theoretical challenges and we recognize a need for bringing some of the current ideas into practice. In this paper we propose a sequential algorithm for generating knockoffs when underlying data consists of both continuous and categorical (factor) variables. Further, we present a heuristic multiple knockoffs approach that offers a practical assessment of how robust the knockoff selection process is for a given dataset. We conduct extensive simulations to validate performance of the proposed methodology. Finally, we demonstrate the utility of the methods on a large clinical data pool of more than 2000 patients with psoriatic arthritis evaluated in four clinical trials with an IL-17A inhibitor, secukinumab (Cosentyx), where we determine prognostic factors of a well established clinical outcome. The analyses presented in this paper could provide a wide range of applications to commonly encountered datasets in medical practice and other fields where variable selection is of particular interest.
Affiliation(s)
- Xuan Zhu
  - Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Luminita Pricop
  - Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- David Ohlssen
  - Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
16
Zhuang J, Dai S, Zhang L, Gao P, Han Y, Tian G, Yan N, Tang M, Kui L. Identifying Breast Cancer-induced Gene Perturbations and its Application in Guiding Drug Repurposing. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200203104214] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/22/2022]
Abstract
Background: Breast cancer is a complex disease with high prevalence in women, the molecular mechanisms of which are still unclear. Most transcriptomic studies on breast cancer focus on the differential expression of each gene between tumor and adjacent normal tissues, while other perturbations induced by breast cancer, including variations in gene regulation and changes in gene modules and pathways, which might be critical to the diagnosis, treatment and prognosis of breast cancer, are more or less ignored.
Objective: We presented a complete process to study breast cancer from multiple perspectives, including differential expression analysis, construction of gene co-expression networks, modular differential connectivity analysis, differential gene connectivity analysis, gene function enrichment analysis and key driver analysis. In addition, we prioritized the related anti-cancer drugs based on enrichment analysis between differentially expressed genes and drug perturbation signatures.
Methods: The RNA expression profiles of 1109 breast cancer tissues and 113 non-tumor tissues were downloaded from The Cancer Genome Atlas (TCGA) database. Differential expression of RNAs was identified using the “DESeq2” Bioconductor package in R, and gene co-expression networks were constructed using weighted gene co-expression network analysis (WGCNA). To compare the module changes and gene co-expression variations between tumor and adjacent normal tissues, modular differential connectivity (MDC) analysis and differential gene connectivity analysis (DGCA) were performed.
Results: Top differential genes such as MMP11 and COL10A1 were known to be associated with breast cancer. We found that 23 modules in the tumor network had significantly different co-expression patterns. The top differential modules were enriched in GO terms related to breast cancer, such as MHC protein complex, leukocyte activation and regulation of defense response. In addition, key genes such as UBE2T driving the top differential modules were significantly correlated with the patients’ survival. Finally, we predicted some potential breast cancer drugs, such as Eribulin, Taxane, Cisplatin and Oxaliplatin.
Conclusion: This framework might be useful in understanding the molecular pathogenesis of diseases like breast cancer and inferring useful drugs for personalized medication.
Affiliation(s)
- Jujuan Zhuang
  - School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, China
- Shuang Dai
  - School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, China
- Lijun Zhang
  - School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, China
- Pan Gao
  - School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, China
- Yingmin Han
  - Geneis Beijing Co., Ltd., Beijing, 100102, China
- Geng Tian
  - Geneis Beijing Co., Ltd., Beijing, 100102, China
- Na Yan
  - Geneis Beijing Co., Ltd., Beijing, 100102, China
- Min Tang
  - Institute of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Ling Kui
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States
17
Katsevich E, Ramdas A. Simultaneous high-probability bounds on the false discovery proportion in structured, regression and online settings. Ann Stat 2020. [DOI: 10.1214/19-aos1938] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
18
Smirnov V, Ma Z, Volchenkov D. Extreme events and emergency scales. Commun Nonlinear Sci Numer Simul 2020; 90:105350. [PMID: 32501383 PMCID: PMC7243033 DOI: 10.1016/j.cnsns.2020.105350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2019] [Revised: 04/09/2020] [Accepted: 05/19/2020] [Indexed: 06/11/2023]
Abstract
An event is extreme if its magnitude exceeds a threshold. The choice of a threshold is subject to uncertainty caused by the method, the size of the available data, the hypothesis on statistics, etc. We assess the degree of uncertainty by the Shannon entropy calculated on the probability that the threshold changes at any given time. If the amount of data is not sufficient, an observer is in the state of Lewis Carroll's Red Queen who said "When you say hill, I could show you hills, in comparison with which you'd call that a valley". If we have enough data, the uncertainty curve peaks at two values clearly separating the magnitudes of events into three emergency scales: subcritical, critical, and extreme. Our approach to defining the emergency scale is validated by 39 years of Standard and Poor's 500 (S&P500) historical data.
Affiliation(s)
- Veniamin Smirnov
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, United States
- Zhuanzhuan Ma
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, United States
- Dimitri Volchenkov
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, United States
19
Abstract
This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and sophisticated within-subject correlation of longitudinal data. To deal with the challenge, we propose a new quadratic decorrelated inference function approach, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator under the setting with the dimension of the parameter of interest growing with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying Storey's (2002) procedure to the proposed test statistics for each regression parameter controls the FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results imply that the newly proposed procedure can control both the Type I error for testing a low dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedure to a real data example.
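As a rough illustration of the kind of FDR step this abstract refers to, the sketch below applies a Storey-type adaptive thresholding rule to a plain list of p-values. It is a generic textbook version, not the paper's procedure: the paper applies the rule to its own decorrelated test statistics, and the function name `storey_reject`, the tuning value `lam=0.5`, and the example p-values are all assumptions made here.

```python
def storey_reject(pvalues, alpha=0.05, lam=0.5):
    """Return sorted indices of hypotheses rejected by a Storey-type rule.

    Sketch under assumptions: pi0 (the proportion of true nulls) is estimated
    from the p-values above `lam`, and the rejection cutoff is the largest
    p-value whose estimated FDR, pi0 * m * p / rank, is at most `alpha`.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Null p-values are roughly uniform, so those above lam are mostly nulls.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if pi0 * m * pvalues[i] / rank <= alpha:
            n_reject = rank  # keep the largest rank that passes (step-up)
    return sorted(order[:n_reject])


# Hypothetical p-values: two exceed lam, so pi0 is estimated as 2 / (8 * 0.5) = 0.5.
example = [0.001, 0.01, 0.02, 0.03, 0.04, 0.045, 0.6, 0.9]
print(storey_reject(example))  # -> [0, 1, 2, 3, 4, 5]
```

With the estimate of pi0 fixed at 1, the same loop reduces to the Benjamini-Hochberg step-up rule, which would reject only the first two hypotheses in this example; the extra rejections come from estimating the null proportion.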
Affiliation(s)
- Ethan X Fang
- Department of Statistics, the Pennsylvania State University, University Park, PA 16802-2111, USA
- Yang Ning
- Department of Statistics and Data Science, Cornell University, Ithaca, NY 14850, USA
- Runze Li
- Department of Statistics, the Pennsylvania State University, University Park, PA 16802-2111, USA
20
Lei L, Ramdas A, Fithian W. A general interactive framework for false discovery rate control under structural constraints. Biometrika 2020. [DOI: 10.1093/biomet/asaa064] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Summary
We propose a general framework based on selectively traversed accumulation rules for interactive multiple testing with generic structural constraints on the rejection set. It combines accumulation tests from ordered multiple testing with data-carving ideas from post-selection inference, allowing highly flexible adaptation to generic structural information. Our procedure defines an interactive protocol for gradually pruning a candidate rejection set, beginning with the set of all hypotheses and shrinking the set with each step. By restricting the information at each step via a technique we call masking, our protocol enables interaction while controlling the false discovery rate in finite samples for any data-adaptive update rule that the analyst may choose. We suggest update rules for a variety of applications with complex structural constraints, demonstrate that selectively traversed accumulation rules perform well in problems ranging from convex region detection to false discovery rate control on directed acyclic graphs, and show how to extend the framework to regression problems where knockoff statistics are available in lieu of $p$-values.
Affiliation(s)
- Lihua Lei
- Department of Statistics, Stanford University, 202 Sequoia Hall, 390 Serra Mall, Stanford, California 94305, U.S.A
- Aaditya Ramdas
- Department of Statistics and Data Science, Carnegie Mellon University, 132H Baker Hall, Pittsburgh, Pennsylvania 15213, U.S.A
- William Fithian
- Department of Statistics, University of California, Berkeley, 301 Evans Hall, Berkeley, California 94720, U.S.A
21
Asadi N, Wang Y, Olson I, Obradovic Z. A heuristic information cluster search approach for precise functional brain mapping. Hum Brain Mapp 2020; 41:2263-2280. [PMID: 32034846 PMCID: PMC7267912 DOI: 10.1002/hbm.24944] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/31/2019] [Revised: 01/05/2020] [Accepted: 01/08/2020] [Indexed: 12/18/2022] Open
Abstract
Detection of the relevant brain regions for characterizing the distinction between cognitive conditions is one of the most sought-after objectives in neuroimaging research. A popular approach for achieving this goal is multivariate pattern analysis, which is currently conducted through a number of approaches such as the popular searchlight procedure. Its popularity is due to several advantages, such as being automatic and flexible with regard to the size of the search region. However, these approaches suffer from a number of limitations that can lead to misidentification of truly informative regions, which in turn results in imprecise information maps. These limitations mainly stem from several factors: the information value of each search sphere is assigned to the voxel at its center (in the case of searchlight); parameters such as searchlight radius and shape require manual tuning; and commonly used machine learning-based approaches have high complexity and low interpretability. Other drawbacks include overlooking the structure and interactions within the regions, and the disadvantages of using certain regularization techniques in the analysis of datasets with characteristics of common functional magnetic resonance imaging data. In this article, we propose a fully data-driven maximum relevance minimum redundancy search algorithm for detecting the precise information value of the clusters within brain regions while alleviating the above-mentioned limitations. Moreover, in order to make the proposed method faster, we propose an efficient algorithmic implementation. We evaluate and compare the proposed algorithm with the searchlight procedure as well as a least absolute shrinkage and selection operator regularization-based mapping approach using both real and synthetic datasets. The analysis results of the proposed approach demonstrate higher information detection precision and map specificity compared to the benchmark approaches.
Affiliation(s)
- Nima Asadi
- Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, Pennsylvania
- Yin Wang
- Department of Psychology, College of Liberal Arts, Temple University, Philadelphia, Pennsylvania
- Ingrid Olson
- Department of Psychology, College of Liberal Arts, Temple University, Philadelphia, Pennsylvania; Decision Neuroscience, College of Liberal Arts, Temple University, Philadelphia, Pennsylvania
- Zoran Obradovic
- Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, Pennsylvania
22
Chen S, Arias-Castro E. On the power of some sequential multiple testing procedures. ANN I STAT MATH 2020. [DOI: 10.1007/s10463-020-00752-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
23
Tardivel PJC, Servien R, Concordet D. Simple expressions of the LASSO and SLOPE estimators in low-dimension. STATISTICS-ABINGDON 2020. [DOI: 10.1080/02331888.2020.1720019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Rémi Servien
- INTHERES, Université de Toulouse, INRA, ENVT, Toulouse, France
24
Cowen AS, Fang X, Sauter D, Keltner D. What music makes us feel: At least 13 dimensions organize subjective experiences associated with music across different cultures. Proc Natl Acad Sci U S A 2020; 117:1924-1934. [PMID: 31907316 PMCID: PMC6995018 DOI: 10.1073/pnas.1910704117] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Indexed: 12/21/2022] Open
Abstract
What is the nature of the feelings evoked by music? We investigated how people represent the subjective experiences associated with Western and Chinese music and the form in which these representational processes are preserved across different cultural groups. US (n = 1,591) and Chinese (n = 1,258) participants listened to 2,168 music samples and reported on the specific feelings (e.g., "angry," "dreamy") or broad affective features (e.g., valence, arousal) that they made individuals feel. Using large-scale statistical tools, we uncovered 13 distinct types of subjective experience associated with music in both cultures. Specific feelings such as "triumphant" were better preserved across the 2 cultures than levels of valence and arousal, contrasting with theoretical claims that valence and arousal are building blocks of subjective experience. This held true even for music selected on the basis of its valence and arousal levels and for traditional Chinese music. Furthermore, the feelings associated with music were found to occupy continuous gradients, contradicting discrete emotion theories. Our findings, visualized within an interactive map (https://www.ocf.berkeley.edu/∼acowen/music.html) reveal a complex, high-dimensional space of subjective experience associated with music in multiple cultures. These findings can inform inquiries ranging from the etiology of affective disorders to the neurological basis of emotion.
Affiliation(s)
- Alan S Cowen
- Department of Psychology, University of California, Berkeley, CA 94720
- Xia Fang
- Department of Psychology, University of Amsterdam, 1001 NK Amsterdam, The Netherlands
- Department of Psychology, York University, Toronto, ON M3J 1P3, Canada
- Disa Sauter
- Department of Psychology, University of Amsterdam, 1001 NK Amsterdam, The Netherlands
- Dacher Keltner
- Department of Psychology, University of California, Berkeley, CA 94720
25
26
Breheny PJ. Marginal false discovery rates for penalized regression models. Biostatistics 2019; 20:299-314. [PMID: 29420686 DOI: 10.1093/biostatistics/kxy004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/25/2017] [Accepted: 01/14/2018] [Indexed: 11/14/2022] Open
Abstract
Penalized regression methods are an attractive tool for high-dimensional data analysis, but their widespread adoption has been hampered by the difficulty of applying inferential tools. In particular, the question "How reliable is the selection of those features?" has proved difficult to address. In part, this difficulty arises from defining false discoveries in the classical, fully conditional sense, which is possible in low dimensions but does not scale well to high-dimensional settings. Here, we consider the analysis of marginal false discovery rates (mFDRs) for penalized regression methods. Restricting attention to the mFDR permits straightforward estimation of the number of selections that would likely have occurred by chance alone, and therefore provides a useful summary of selection reliability. Theoretical analysis and simulation studies demonstrate that this approach is quite accurate when the correlation among predictors is mild, and only slightly conservative when the correlation is stronger. Finally, the practical utility of the proposed method and its considerable advantages over other approaches are illustrated using gene expression data from The Cancer Genome Atlas and genome-wide association study data from the Myocardial Applied Genomics Network.
27
Silva RS, Nascimento FF. Extreme Value Theory Applied to r Largest Order Statistics Under the Bayesian Approach. REVISTA COLOMBIANA DE ESTADÍSTICA 2019. [DOI: 10.15446/rce.v42n2.70271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022] Open
Abstract
Extreme Value Theory (EVT) is an important tool to predict efficient gains and losses. Its main areas of application are economics and the environment. Initially, such events were modeled with standard parametric distributions such as the Normal and Gamma. However, economic and environmental data are, in most cases, heavy-tailed, in contrast to those distributions. Thus, modeling extreme events proved very difficult. Furthermore, it was almost impossible to use conventional models to make predictions about unobserved events that exceed the maximum of the observations. In some situations EVT is used to analyse only the maxima of a dataset, which provides few observations; in those cases it is more effective to use the r largest order statistics. This paper proposes Bayesian estimators for the parameters of the r largest order statistics. Monte Carlo simulation was used to analyze the data, and properties of those estimators, such as mean, variance, bias and Root Mean Square Error (RMSE), were examined. The estimation provided inference for the parameters and return levels. This paper also presents a procedure for choosing the optimal r for the r largest order statistics, based on the Bayesian approach using Markov chain Monte Carlo (MCMC). Simulation results reveal that the Bayesian approach performs similarly to Maximum Likelihood Estimation; the applications were developed using the Bayesian approach and showed a gain in accuracy compared with other estimators.
28
A New Parameter Estimator for the Generalized Pareto Distribution under the Peaks over Threshold Framework. MATHEMATICS 2019. [DOI: 10.3390/math7050406] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Techniques used to analyze exceedances over a high threshold are in great demand for research in economics, environmental science, and other fields. The generalized Pareto distribution (GPD) has been widely used to fit observations exceeding the tail threshold in the peaks over threshold (POT) framework. Parameter estimation and threshold selection are two critical issues for threshold-based GPD inference. In this work, we propose a new GPD-based estimation approach by combining the method of moments and likelihood moment techniques based on the least squares concept, in which the shape and scale parameters of the GPD can be simultaneously estimated. To analyze extreme data, the proposed approach estimates the parameters by minimizing the sum of squared deviations between the theoretical GPD function and its expectation. Additionally, we introduce a recently developed stopping rule to choose the suitable threshold above which the GPD asymptotically fits the exceedances. Simulation studies show that the proposed approach performs better than or similarly to existing approaches, in terms of bias and the mean square error, in estimating the shape parameter. In addition, the performance of three threshold selection procedures is assessed by estimating the value-at-risk (VaR) of the GPD. Finally, we illustrate the utilization of the proposed method by analyzing air pollution data. In this analysis, we also provide a detailed guide regarding threshold selection.
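As a point of reference for the estimation problem described in the abstract above, the classical method-of-moments estimator for the GPD can be sketched in a few lines. This is not the combined estimator the paper proposes; it is the plain textbook baseline, written under the assumption that the GPD is parameterized by a shape ξ and scale σ with mean σ/(1 − ξ) and variance σ²/((1 − ξ)²(1 − 2ξ)), which requires ξ < 1/2. The function names and the fixed threshold are hypothetical.

```python
import statistics


def exceedances_over(data, threshold):
    """Return the amounts by which observations exceed the threshold."""
    return [x - threshold for x in data if x > threshold]


def gpd_method_of_moments(exceedances):
    """Classical method-of-moments estimates (shape, scale) for a GPD.

    Solves mean = scale / (1 - shape) and
    variance = mean**2 / (1 - 2 * shape) for shape and scale.
    Only meaningful when the true shape is below 1/2 (finite variance).
    """
    if len(exceedances) < 2:
        raise ValueError("need at least two exceedances")
    mean = statistics.fmean(exceedances)
    var = statistics.variance(exceedances)  # sample variance (n - 1)
    if var <= 0:
        raise ValueError("exceedances must not all be equal")
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale


# Hypothetical toy data; the threshold of 4 is chosen by hand here,
# whereas the paper discusses data-driven threshold selection.
excess = exceedances_over([1, 5, 7, 10, 4.5, 12], threshold=4)
print(gpd_method_of_moments(excess))
```

A ratio of squared mean to variance equal to one gives a shape of zero, which corresponds to the exponential special case of the GPD.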
29
Jeng XJ, Chen X. Predictor ranking and false discovery proportion control in high-dimensional regression. J MULTIVARIATE ANAL 2019. [DOI: 10.1016/j.jmva.2018.12.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
30
Zhuang J, Zhang L, Dai S, Cui L, Guo C, Sloofman L, Yang J. Comparison of multi-tissue aging between human and mouse. Sci Rep 2019; 9:6220. [PMID: 30996271 PMCID: PMC6470208 DOI: 10.1038/s41598-019-42485-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/21/2019] [Accepted: 03/20/2019] [Indexed: 01/01/2023] Open
Abstract
With the rapid growth of the aging population, exploring the biological basis of aging and related molecular mechanisms has become an important topic in modern scientific research. Aging can attenuate the function of multiple organs, leading to the occurrence and development of various age-related metabolic, nervous system, and cardiovascular diseases. In addition, aging is closely related to the occurrence and development of tumors. Although a number of studies have used various mouse models to study aging, further research is needed to associate mouse and human aging at the molecular level. In this paper, we systematically assessed the relationship between human and mouse aging by comparing multi-tissue age-related gene expression sets. We compared 18 human and mouse tissues, and found 9 significantly correlated tissue pairs. Functional analysis also revealed some terms related to aging in human and mouse. We also performed a crosswise comparison of homologous age-related genes across the 18 tissues in human and mouse, respectively, and found that human Brain_Cortex was significantly correlated with Brain_Hippocampus, which was also found in mouse. In addition, we focused on comparing four brain-related tissues in human and mouse, and found a gene, GFAP, related to aging in both human and mouse.
Affiliation(s)
- Jujuan Zhuang
- School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, P. R. China
- Lijun Zhang
- School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, P. R. China
- Shuang Dai
- School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, P. R. China
- Lingyu Cui
- School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, P. R. China
- Cheng Guo
- Center for Infection and Immunity, Columbia University, New York City, New York, USA
- Laura Sloofman
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Jialiang Yang
- Geneis (Beijing) Co. Ltd, Beijing, 100102, P. R. China
31
Cowen AS, Laukka P, Elfenbein HA, Liu R, Keltner D. The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures. Nat Hum Behav 2019; 3:369-382. [PMID: 30971794 PMCID: PMC6687085 DOI: 10.1038/s41562-019-0533-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 10/19/2017] [Accepted: 01/15/2019] [Indexed: 12/30/2022]
Abstract
Central to emotion science is the degree to which categories, such as Awe, or broader affective features, such as Valence, underlie the recognition of emotional expression. To explore the processes by which people recognize emotion from prosody, US and Indian participants were asked to judge the emotion categories or affective features communicated by 2,519 speech samples produced by 100 actors from 5 cultures. With large-scale statistical inference methods, we find that prosody can communicate at least 12 distinct kinds of emotion that are preserved across the 2 cultures. Analyses of the semantic and acoustic structure of the recognition of emotions reveal that emotion categories drive the recognition of emotions more so than affective features, including Valence. In contrast to discrete emotion theories, however, emotion categories are bridged by gradients representing blends of emotions. Our findings, visualized within an interactive map, reveal a complex, high-dimensional space of emotional states recognized cross-culturally in speech prosody.
Affiliation(s)
- Alan S Cowen
- Department of Psychology, University of California, Berkeley, Berkeley, CA, USA
- Petri Laukka
- Department of Psychology, Stockholm University, Stockholm, Sweden
- Runjing Liu
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- Dacher Keltner
- Department of Psychology, University of California, Berkeley, Berkeley, CA, USA
32
Ramdas A, Chen J, Wainwright MJ, Jordan MI. A sequential algorithm for false discovery rate control on directed acyclic graphs. Biometrika 2019. [DOI: 10.1093/biomet/asy066] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022] Open
Affiliation(s)
- Aaditya Ramdas
- Department of Statistics and Data Science, Carnegie Mellon University, 132H Baker Hall, Pittsburgh, Pennsylvania, USA
- Jianbo Chen
- Department of Statistics, University of California, 367 Evans Hall, Berkeley, California, USA
- Martin J Wainwright
- Department of Statistics, University of California, 367 Evans Hall, Berkeley, California, USA
- Michael I Jordan
- Department of Statistics, University of California, 367 Evans Hall, Berkeley, California, USA
33
Miller RE, Breheny P. Marginal false discovery rate control for likelihood-based penalized regression models. Biom J 2019; 61:889-901. [PMID: 30742712 DOI: 10.1002/bimj.201800138] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/19/2018] [Revised: 11/21/2018] [Accepted: 12/23/2018] [Indexed: 11/06/2022]
Abstract
The popularity of penalized regression in high-dimensional data analysis has led to a demand for new inferential tools for these models. False discovery rate control is widely used in high-dimensional hypothesis testing, but has only recently been considered in the context of penalized regression. Almost all of this work, however, has focused on lasso-penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that can be applied to any penalized likelihood-based model, such as logistic regression and Cox regression. Our approach is fast, flexible and can be used with a variety of penalty functions including lasso, elastic net, MCP, and MNet. We derive theoretical results under which the proposed method is valid, and use simulation studies to demonstrate that the approach is reasonably robust, albeit slightly conservative, when these assumptions are violated. Despite being conservative, we show that our method often offers more power to select causally important features than existing approaches. Finally, the practical utility of the method is demonstrated on gene expression datasets with binary and time-to-event outcomes.
Affiliation(s)
- Ryan E Miller
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Patrick Breheny
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
34
Gong S, Zhang K, Liu Y. Efficient test-based variable selection for high-dimensional linear models. J MULTIVARIATE ANAL 2019; 166:17-31. [PMID: 30613114 DOI: 10.1016/j.jmva.2018.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression (LARS), among others. These methods typically add variables into the model one by one. For such selection procedures, it is crucial to find a stopping criterion that controls model complexity. One of the most commonly used techniques to this end is cross-validation (CV) which, in spite of its popularity, has two major drawbacks: expensive computational cost and lack of statistical interpretation. To overcome these drawbacks, we introduce a flexible and efficient test-based variable selection approach that can be incorporated into any sequential selection procedure. The test, which is on the overall signal in the remaining inactive variables, is based on the maximal absolute partial correlation between the inactive variables and the response given active variables. We develop the asymptotic null distribution of the proposed test statistic as the dimension tends to infinity uniformly in the sample size. We also show that the test is consistent. With this test, at each step of the selection, a new variable is included if and only if the p-value is below some pre-defined level. Numerical studies show that the proposed method delivers very competitive performance in terms of variable selection accuracy and computational complexity compared to CV.
Affiliation(s)
- Siliang Gong
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Kai Zhang
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Yufeng Liu
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States; Depts of Genetics and Biostatistics, Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
35
Jeng XJ, Chen X. Variable selection via adaptive false negative control in linear regression. Electron J Stat 2019. [DOI: 10.1214/19-ejs1649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
36
Zhao X, Cheng W, Zhang P. Extreme tail risk estimation with the generalized Pareto distribution under the peaks-over-threshold framework. COMMUN STAT-THEOR M 2018. [DOI: 10.1080/03610926.2018.1549253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Affiliation(s)
- Xu Zhao
- College of Applied Sciences, Beijing University of Technology, Beijing, China
- Weihu Cheng
- College of Applied Sciences, Beijing University of Technology, Beijing, China
- Pengyue Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
37
Multiple testing with the structure‐adaptive Benjamini–Hochberg algorithm. J R Stat Soc Series B Stat Methodol 2018. [DOI: 10.1111/rssb.12298] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 01/23/2023]
38
Su WJ. When is the first spurious variable selected by sequential regression procedures? Biometrika 2018. [DOI: 10.1093/biomet/asy032] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Weijie J Su
- Department of Statistics, University of Pennsylvania, 472 John M. Huntsman Hall, 3730 Walnut Street, Philadelphia, Pennsylvania 19104, U.S.A
39
Lei L, Fithian W. AdaPT: an interactive procedure for multiple testing with side information. J R Stat Soc Series B Stat Methodol 2018. [DOI: 10.1111/rssb.12274] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 12/19/2022]
Affiliation(s)
- Lihua Lei
- University of California, Berkeley, USA
40
Javanmard A, Montanari A. Online rules for control of false discovery rate and false discovery exceedance. Ann Stat 2018. [DOI: 10.1214/17-aos1559] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022]
41
Khan MHR. On the performance of adaptive preprocessing technique in analyzing high-dimensional censored data. Biom J 2018; 60:687-702. [PMID: 29603360 DOI: 10.1002/bimj.201600256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/15/2016] [Revised: 09/05/2017] [Accepted: 10/20/2017] [Indexed: 11/09/2022]
Abstract
Preprocessing for high-dimensional censored datasets, such as microarray data, is generally considered an important technique for gaining further stability by reducing potential noise in the data. When variable selection including inference is carried out with high-dimensional censored data, the objective is to obtain a smaller subset of variables and then perform the inferential analysis using model estimates based on the selected subset of variables. This two-stage inferential analysis is prone to circularity bias because of the noise that might still remain in the dataset. In this work, I propose an adaptive preprocessing technique that uses the sure independence screening (SIS) idea to accomplish variable selection and reduces the circularity bias through well-known refined high-dimensional methods such as the elastic net, adaptive elastic net, weighted elastic net, elastic net-AFT, and two greedy variable selection methods known as TCS and PC-simple, all implemented with accelerated lifetime models. The proposed technique addresses several features, including the issue of collinearity between important and some unimportant covariates, which is often the case in the high-dimensional setting under a variable selection framework, and different levels of censoring. Simulation studies, along with an empirical analysis of a real microarray dataset on mantle cell lymphoma, are carried out to demonstrate the performance of the adaptive preprocessing technique.
Affiliation(s)
- Md Hasinur Rahaman Khan
- Applied Statistics, Institute of Statistical Research and Training, University of Dhaka, Dhaka, 1000, Bangladesh
42
Bader B, Yan J, Zhang X. Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1092] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022]
43
Deng L, Zi X, Li Z. False discovery rates for large-scale model checking under certain dependence. COMMUN STAT-THEOR M 2018. [DOI: 10.1080/03610926.2017.1300279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Lu Deng
- Institute of Statistics and LPMC, Nankai University, Tianjin, P. R. China
- Xuemin Zi
- School of Science, Tianjin University of Technology and Education, Tianjin, P. R. China
- Zhonghua Li
- Institute of Statistics and LPMC, Nankai University, Tianjin, P. R. China
44
Hyun S, G’Sell M, Tibshirani RJ. Exact post-selection inference for the generalized lasso path. Electron J Stat 2018. [DOI: 10.1214/17-ejs1363] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022]
45
Lipkovich I, Dmitrienko A, Muysers C, Ratitch B. Multiplicity issues in exploratory subgroup analysis. J Biopharm Stat 2017; 28:63-81. [DOI: 10.1080/10543406.2017.1397009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 10/18/2022]
46
Wang HJ, McKeague IW, Qian M. Testing for Marginal Linear Effects in Quantile Regression. J R Stat Soc Series B Stat Methodol 2017; 80:433-452. [PMID: 29576736 DOI: 10.1111/rssb.12258] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
This paper develops a new marginal testing procedure to detect the presence of significant predictors associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then base the test on the t-statistics associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behavior due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the proposed test is applicable and computationally feasible for large-dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression, and has the added advantage of being robust against outliers in the response. The approach is illustrated using an application to an HIV drug resistance dataset.
Affiliation(s)
- Huixia Judy Wang: Associate Professor, Department of Statistics, George Washington University, Washington, District of Columbia 20052, USA
- Ian W McKeague: Professor, Department of Biostatistics, Columbia University, New York, NY 20032, USA
- Min Qian: Assistant Professor, Department of Biostatistics, Columbia University, New York, NY 20032, USA
|
47
|
Lenters V, Vermeulen R, Portengen L. Performance of variable selection methods for assessing the health effects of correlated exposures in case-control studies. Occup Environ Med 2017; 75:522-529. [PMID: 28947495 DOI: 10.1136/oemed-2016-104231] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 11/29/2016] [Revised: 08/16/2017] [Accepted: 08/22/2017] [Indexed: 11/04/2022]
Abstract
OBJECTIVES: There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure-outcome associations across various data structures typical of environmental and occupational epidemiology analyses.
METHODS: We simulated a case-control study, generating 100 data sets for each of 270 different simulation scenarios, varying the number of exposure variables, the correlation between exposures, sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches.
RESULTS: The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when statistical inference based on a balanced weighting of high sensitivity and a low proportion of false discoveries is sought.
CONCLUSIONS: In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.
Affiliation(s)
- Virissa Lenters: Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
- Roel Vermeulen: Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands; Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Lützen Portengen: Division of Environmental Epidemiology, Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
|
48
|
Affiliation(s)
- Ang Li: Department of Statistics, University of Chicago, Chicago, IL
|
49
|
Huang H. Controlling the false discoveries in LASSO. Biometrics 2017; 73:1102-1110. [DOI: 10.1111/biom.12665] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/01/2016] [Revised: 12/01/2016] [Accepted: 01/01/2017] [Indexed: 10/20/2022]
Affiliation(s)
- Hanwen Huang: Department of Epidemiology and Biostatistics, University of Georgia, Athens, Georgia 30602
|
50
|
Lu J, Deng A. Demystifying the bias from selective inference: A revisit to Dawid’s treatment selection problem. Stat Probab Lett 2016. [DOI: 10.1016/j.spl.2016.06.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
|