1
Agrawal DK, Smith BJ, Sottile PD, Hripcsak G, Albers DJ. Quantifiable identification of flow-limited ventilator dyssynchrony with the deformed lung ventilator model. Comput Biol Med 2024; 173:108349. PMID: 38547660. DOI: 10.1016/j.compbiomed.2024.108349.
Abstract
BACKGROUND Ventilator dyssynchrony (VD) can worsen lung injury and is challenging to detect and quantify because of the complex variability of dyssynchronous breaths. While machine learning (ML) approaches are useful for automating VD detection from ventilator waveform data, scalable severity quantification and its association with pathogenesis and ventilator mechanics remain challenging. OBJECTIVE We develop a systematic framework to quantify pathophysiological features observed in ventilator waveform signals such that they can be used to create a feature-based severity stratification of VD breaths. METHODS A mathematical model was developed to represent the pressure and volume waveforms of individual breaths in a feature-based parametric form. Model estimates of respiratory effort strength were used to assess the severity of flow-limited (FL)-VD breaths relative to normal breaths. A total of 93,007 breath waveforms from 13 patients were analyzed. RESULTS A novel model-defined continuous severity marker was developed and used to estimate breath phenotypes of FL-VD breaths. The phenotypes had a predictive accuracy of over 97% with respect to the previously developed ML-based VD identification algorithm. To understand the incidence of FL-VD breaths and their association with patient state, these phenotypes were further correlated with ventilator-measured parameters and electronic health records. CONCLUSION This work provides a computational pipeline to identify and quantify the severity of FL-VD breaths and paves the way for large-scale study of the causes and effects of VD. The approach has direct application to clinical practice and to meaningful knowledge extraction from ventilator waveform data.
Affiliation(s)
- Deepak K Agrawal
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, 400076, India; Department of Bioengineering, University of Colorado Denver | Anschutz Medical Campus, Aurora, CO, 80045, USA
- Bradford J Smith
- Department of Bioengineering, University of Colorado Denver | Anschutz Medical Campus, Aurora, CO, 80045, USA; Section of Pulmonary and Sleep Medicine, Department of Pediatrics, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, 80045, USA
- Peter D Sottile
- Division of Pulmonary Sciences and Critical Care Medicine, Department of Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, 10027, USA
- David J Albers
- Department of Bioengineering, University of Colorado Denver | Anschutz Medical Campus, Aurora, CO, 80045, USA; Department of Biomedical Informatics, Columbia University, New York, NY, 10027, USA; Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
2
Arahmane H, Dumazert J, Barat E, Dautremer T, Carrel F, Dufour N, Michel M. Statistical approach for radioactivity detection: A brief review. J Environ Radioact 2024; 272:107358. PMID: 38142518. DOI: 10.1016/j.jenvrad.2023.107358.
Abstract
Radioactivity detection is a major research and development priority for many practical applications. Among the various technical challenges in this field is the need to carry out accurate low-level radioactivity measurements in the presence of large fluctuations in the natural radiation background while keeping false alarm rates low. The task becomes even harder when detection limits are high and signal-to-background ratios are low. Detection methods based on statistical inference, following either a frequentist or a Bayesian paradigm, have been adopted to overcome these challenges and to ensure a reliable, accurate diagnosis with a competitive tradeoff between sensitivity, specificity, and response time. Several research studies, addressing a range of applications from decommissioning and dismantling to homeland security, have been proposed in this respect. Our main goal in this paper is to present a succinct survey of the frequentist and Bayesian approaches used for decision-making and for uncertainty and risk evaluation in the context of radioactivity detection. We first present the theoretical background of frequentist and Bayesian statistical inference. We then compare the two approaches to determine which is optimal with regard to accuracy, along with their respective pros and cons. A case study of low-level radioactivity detection in nuclear decommissioning operations is provided to validate the optimal approach. The results demonstrate the efficiency and usefulness of the Bayesian approach over the frequentist one in the most challenging radiation detection scenarios.
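The frequentist-versus-Bayesian tradeoff this review surveys can be illustrated with a toy low-count detection problem. This is a minimal sketch under stated assumptions, not a method from the review: the Currie-style threshold, the two-hypothesis Poisson comparison, and all function names and count values are illustrative.

```python
import math

def frequentist_detect(gross_counts, background_counts, k=1.645):
    """Currie-style decision: declare a source present when the net count
    exceeds k * sqrt(2 * B), controlling the false-alarm rate at ~5%."""
    decision_threshold = k * math.sqrt(2 * background_counts)
    return (gross_counts - background_counts) > decision_threshold

def bayesian_source_probability(gross_counts, background_rate,
                                signal_rate, prior=0.5):
    """Posterior probability that a source is present, comparing two
    Poisson hypotheses: background only vs. background plus signal."""
    def poisson_pmf(count, lam):
        return math.exp(-lam) * lam ** count / math.factorial(count)
    like_h0 = poisson_pmf(gross_counts, background_rate)
    like_h1 = poisson_pmf(gross_counts, background_rate + signal_rate)
    return prior * like_h1 / (prior * like_h1 + (1 - prior) * like_h0)
```

With a background of 100 counts, a gross count of 130 clears the frequentist threshold (net 30 > ~23.3) and yields a high posterior source probability under a 50/50 prior, while a gross count of 105 fails both criteria. The Bayesian route returns a graded probability rather than a binary alarm, which is the kind of tradeoff the review weighs.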
Affiliation(s)
- Hanan Arahmane
- Université Paris-Saclay, CEA, List, F-91120 Palaiseau, France
- Eric Barat
- Université Paris-Saclay, CEA, List, F-91120 Palaiseau, France
- Maugan Michel
- Université Paris-Saclay, CEA, List, F-91120 Palaiseau, France
3
Saarela S, Varvia P, Korhonen L, Yang Z, Patterson PL, Gobakken T, Næsset E, Healey SP, Ståhl G. Three-phase hierarchical model-based and hybrid inference. MethodsX 2023; 11:102321. PMID: 37637291. PMCID: PMC10448159. DOI: 10.1016/j.mex.2023.102321.
Abstract
Global commitments to mitigating climate change and halting biodiversity loss require reliable information about Earth's ecosystems. Increasingly, such information is obtained from multiple sources of remotely sensed data combined with data acquired in the field. This new wealth of data poses challenges regarding the combination of different data sources to derive the required information and assess uncertainties. In this article, we show how predictors and their variances can be derived when hierarchically nested models are applied. Previous studies have developed methods for cases involving two modeling steps, such as biomass prediction relying on tree-level allometric models and models linking plot-level field data with remotely sensed data. This study extends the analysis to cases involving three modeling steps to cover new important applications. The additional step might involve an intermediate model, linking field and remotely sensed data available from a small sample, for making predictions that are subsequently used for training a final prediction model based on remotely sensed data:
- In cases where the data in the final step are available wall-to-wall, we denote the approach three-phase hierarchical model-based inference (3pHMB).
- In cases where the data in the final step are available as a probability sample, we denote the approach three-phase hierarchical hybrid inference (3pHHY).
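The uncertainty bookkeeping behind such nested modeling chains can be illustrated by Monte Carlo error propagation. This is a deliberately simplified sketch, not the paper's analytical estimators: each of the three "models" is reduced to a single slope coefficient, and every number (slopes, standard errors, the predictor value) is invented for illustration.

```python
import random

def propagate_three_phase(x=10.0, n_draws=4000, seed=1):
    """Propagate parameter uncertainty through three nested prediction
    steps by redrawing each step's estimated slope from its sampling
    distribution (a stand-in for analytical variance expressions)."""
    random.seed(seed)
    preds = []
    for _ in range(n_draws):
        b1 = random.gauss(2.0, 0.10)   # phase 1: tree-level allometric slope
        b2 = random.gauss(1.5, 0.20)   # phase 2: intermediate model slope
        b3 = random.gauss(0.8, 0.05)   # phase 3: final RS-based model slope
        preds.append(b3 * b2 * b1 * x)
    mean = sum(preds) / n_draws
    var = sum((p - mean) ** 2 for p in preds) / (n_draws - 1)
    return mean, var
```

Treating any one layer's slope as fixed (e.g., ignoring the allometric model's error) visibly shrinks the reported variance, which is precisely the omission that hierarchical model-based and hybrid inference guard against.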
Affiliation(s)
- Svetlana Saarela
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432, Ås, Norway
- Petri Varvia
- School of Forest Sciences, University of Eastern Finland, P.O. Box 111, Joensuu FI-80101, Finland
- Lauri Korhonen
- School of Forest Sciences, University of Eastern Finland, P.O. Box 111, Joensuu FI-80101, Finland
- Zhiqiang Yang
- USDA Forest Service, Rocky Mountain Research Station, 507 25th St, Ogden, UT, USA
- Paul L. Patterson
- USDA Forest Service, Rocky Mountain Research Station, 240 W Prospect, Fort Collins, CO 80526, USA
- Terje Gobakken
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432, Ås, Norway
- Erik Næsset
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432, Ås, Norway
- Sean P. Healey
- USDA Forest Service, Rocky Mountain Research Station, 507 25th St, Ogden, UT, USA
- Göran Ståhl
- Faculty of Forest Sciences, Swedish University of Agricultural Sciences, SLU Skogsmarksgränd 17, SE-90183, Umeå, Sweden
4
Peng XR, Bundil I, Schulreich S, Li SC. Neural correlates of valence-dependent belief and value updating during uncertainty reduction: An fNIRS study. Neuroimage 2023; 279:120327. PMID: 37582418. DOI: 10.1016/j.neuroimage.2023.120327.
Abstract
Selective use of new information is crucial for adaptive decision-making. Combining a gamble-bidding task with functional near-infrared spectroscopy (fNIRS) measurements of cortical responses, we investigated potential effects of information valence on the behavioral and neural processes of belief and value updating during uncertainty reduction in young adults. By modeling changes in the participants' expressed subjective values with a Bayesian model, we dissociated the processes of (i) updating beliefs about statistical properties of the gamble, (ii) updating the values of a gamble based on new information about its winning probabilities, and (iii) expectancy violation. The results showed that participants used new information to update their beliefs and values about the gambles in a quasi-optimal manner, as reflected in selective updating only in situations with reducible uncertainty. Furthermore, their updating was valence-dependent: information indicating an increase in winning probability was underweighted, whereas information indicating a decrease in winning probability was updated in good agreement with the predictions of Bayesian decision theory. Model-based and moderation analyses showed that this valence-dependent asymmetry was associated with a distinct contribution of expectancy violation, besides belief updating, to value updating after new positive information about winning probabilities. In line with the behavioral results, we replicated previous findings of frontoparietal involvement in the different components of updating. This study also provides novel results suggesting valence-dependent recruitment of brain regions: individuals with stronger oxyhemoglobin responses during value updating updated more in line with the predictions of the Bayesian model when integrating new information indicating an increase in winning probability. Taken together, this study provides the first results identifying expectancy violation as a contributing factor to sub-optimal valence-dependent updating during uncertainty reduction and suggests limitations of normative Bayesian decision theory.
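The dissociation between normative Bayesian updating and valence-dependent updating can be sketched with a toy gamble. This is not the authors' model: the Beta-Bernoulli belief, the linear value rule, and the asymmetric learning rates are illustrative assumptions chosen only to mimic the reported underweighting of positive information.

```python
def beta_update(alpha, beta, wins, losses):
    """Exact Bayesian (Beta-Bernoulli) update of the belief about a
    gamble's winning probability after observing new outcomes."""
    return alpha + wins, beta + losses

def gamble_value(alpha, beta, win_payoff, loss_payoff):
    """Subjective value of the gamble under the current belief."""
    p_win = alpha / (alpha + beta)
    return p_win * win_payoff + (1 - p_win) * loss_payoff

def asymmetric_update(p_win, outcome, lr_good=0.3, lr_bad=0.7):
    """Valence-dependent (non-Bayesian) update: news that raises the
    winning probability is underweighted relative to news that lowers it."""
    lr = lr_good if outcome > p_win else lr_bad
    return p_win + lr * (outcome - p_win)
```

Starting from a flat Beta(1, 1) belief, one observed win moves the Bayesian posterior mean to 2/3 while the asymmetric learner reaches only 0.65; one observed loss moves the Bayesian mean to 1/3 but drags the asymmetric learner down to 0.15, reproducing the good-news/bad-news asymmetry in miniature.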
Affiliation(s)
- Xue-Rui Peng
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop, Technische Universität Dresden, Dresden, Germany
- Indra Bundil
- Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University, Cardiff, United Kingdom
- Stefan Schulreich
- Department of Nutritional Sciences, Faculty of Life Sciences, University of Vienna, Vienna, Austria; Department of Cognitive Psychology, Faculty of Psychology and Human Movement Science, Universität Hamburg, Hamburg, Germany
- Shu-Chen Li
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop, Technische Universität Dresden, Dresden, Germany
5
Zhu J, Xie H, Yang Z, Chen J, Yin J, Tian P, Wang H, Zhao J, Zhang H, Lu W, Chen W. Statistical modeling of gut microbiota for personalized health status monitoring. Microbiome 2023; 11:184. PMID: 37596617. PMCID: PMC10436630. DOI: 10.1186/s40168-023-01614-x.
Abstract
BACKGROUND The gut microbiome is closely associated with health status, and microbiota dysbiosis can considerably affect the host's health. In addition, many active consortium projects have generated reference datasets available for large-scale retrospective research. However, a comprehensive monitoring framework that analyzes health status and quantitatively presents each bacterium's contribution to health has not been thoroughly investigated. METHODS We systematically developed a statistical monitoring diagram for personalized health status prediction and analysis. Our framework comprises three elements: (1) a statistical monitoring model was established, a health index was constructed, and a health boundary was defined; (2) healthy patterns were identified among healthy people and analyzed using contrastive learning; (3) the contribution of each bacterium to the health index of the diseased population was analyzed. Furthermore, we investigated disease proximity using the contribution spectrum and discovered multiple multi-disease-related targets. RESULTS We demonstrated and evaluated the effectiveness of the proposed monitoring framework for tracking personalized health status through comprehensive real-data analysis of a multi-study cohort and a separate validation cohort. A statistical monitoring model was developed based on 92 microbial taxa. In the discovery and validation sets, our approach achieved balanced accuracies of 0.7132 and 0.7026 and AUCs of 0.80 and 0.76, respectively. Four health patterns were identified in healthy populations, highlighting variations in species composition and metabolic function across these patterns. Furthermore, a reasonable correlation was found between the proposed health index and host physiological indicators, diversity, and functional redundancy. The health index significantly correlated with Shannon diversity ([Formula: see text]) and species richness ([Formula: see text]) in the healthy samples. However, in samples from individuals with diseases, the health index significantly correlated with age ([Formula: see text]), species richness ([Formula: see text]), and functional redundancy ([Formula: see text]). Personalized diagnosis is achieved by analyzing the contribution of each bacterium to the health index, and we identified high-contribution species shared across multiple diseases by analyzing their contribution spectra. CONCLUSIONS The proposed monitoring framework promotes a deep understanding of healthy microbiomes and unhealthy variations and serves as a bridge toward individualized therapy target discovery and precise modulation.
Affiliation(s)
- Jinlin Zhu
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Heqiang Xie
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Zixin Yang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Jing Chen
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Jialin Yin
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Peijun Tian
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Hongchao Wang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Jianxin Zhao
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, Jiangsu, 225004, China
- Hao Zhang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, Jiangsu, 225004, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Wuxi Translational Medicine Research Center, Jiangsu Translational Medicine Research Institute Wuxi Branch, Wuxi, Jiangsu, China
- Wenwei Lu
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, Jiangsu, 225004, China
- International Joint Research Laboratory for Pharmabiotics & Antibiotic Resistance, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Wei Chen
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu, 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi, Jiangsu, 214122, China
6
Panteghini M, Mansournia MA. The role of statistical significance in health risk assessment and in the decision-making process. J Prev Med Hyg 2023; 64:E121-E122. PMID: 37654861. PMCID: PMC10468190. DOI: 10.15167/2421-4248/jpmh2023.64.2.2682.
Affiliation(s)
- Marco Panteghini
- Department of Obstetrics and Pediatrics, Azienda Unità Sanitaria Locale, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Reggio Emilia, Italy
- Mohammad Alì Mansournia
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
7
Kohler D, Staniak M, Tsai TH, Huang T, Shulman N, Bernhardt OM, MacLean BX, Nesvizhskii AI, Reiter L, Sabido E, Choi M, Vitek O. MSstats Version 4.0: Statistical Analyses of Quantitative Mass Spectrometry-Based Proteomic Experiments with Chromatography-Based Quantification at Scale. J Proteome Res 2023; 22:1466-1482. PMID: 37018319. PMCID: PMC10629259. DOI: 10.1021/acs.jproteome.2c00834.
Abstract
The MSstats family of R/Bioconductor packages is widely used for statistical analysis of quantitative bottom-up mass spectrometry-based proteomic experiments to detect differentially abundant proteins. It is applicable to a variety of experimental designs and data acquisition strategies and is compatible with many data processing tools used to identify and quantify spectral features. In the face of ever-increasing complexity of experiments and data processing strategies, the family's core package, also named MSstats, has undergone a series of substantial updates. The new version, MSstats v4.0, improves the usability, versatility, and accuracy of the statistical methodology and makes better use of computational resources. New converters integrate the output of upstream processing tools directly with MSstats, requiring less manual work from the user. The package's statistical models have been updated to a more robust workflow, and its code has been substantially refactored to improve memory use and computation speed. Here we detail these updates, highlighting methodological differences between the new and old versions. An empirical comparison of MSstats v4.0 with its previous implementations, as well as with the packages MSqRob and DEqMS, on controlled mixtures and biological experiments demonstrated stronger performance and better usability of MSstats v4.0 compared with existing methods.
Affiliation(s)
- Devon Kohler
- Khoury College of Computer Science, Northeastern University, Boston, Massachusetts 02115, United States
- Tsung-Heng Tsai
- Khoury College of Computer Science, Northeastern University, Boston, Massachusetts 02115, United States
- Ting Huang
- Khoury College of Computer Science, Northeastern University, Boston, Massachusetts 02115, United States
- Nicholas Shulman
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Brendan X. MacLean
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Alexey I. Nesvizhskii
- Department of Pathology and Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Eduard Sabido
- Center for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Universitat Pompeu Fabra, Barcelona 08002, Spain
- Meena Choi
- Khoury College of Computer Science, Northeastern University, Boston, Massachusetts 02115, United States
- Olga Vitek
- Khoury College of Computer Science, Northeastern University, Boston, Massachusetts 02115, United States
8
Frndak S, Yu G, Oulhote Y, Queirolo EI, Barg G, Vahter M, Mañay N, Peregalli F, Olson JR, Ahmed Z, Kordas K. Reducing the complexity of high-dimensional environmental data: An analytical framework using LASSO with considerations of confounding for statistical inference. Int J Hyg Environ Health 2023; 249:114116. PMID: 36805184. PMCID: PMC10977870. DOI: 10.1016/j.ijheh.2023.114116.
Abstract
PURPOSE Frameworks for selecting exposures in high-dimensional environmental datasets while accounting for confounding are lacking. We present a two-step approach: exposure selection followed by confounder adjustment for statistical inference. METHODS We measured cognitive ability in 338 children using the Woodcock-Muñoz General Intellectual Ability (GIA) score, together with potential associated features across several environmental domains. Initially, 111 variables theoretically associated with the GIA score were entered into a Least Absolute Shrinkage and Selection Operator (LASSO) model in a 50% feature-selection subsample. Effect estimates for the selected features were then modeled with linear regressions in a 50% inference (hold-out) subsample, adjusting first for sex and age and subsequently for covariates selected via directed acyclic graphs (DAGs). All models were adjusted for clustering by school. RESULTS Of the 15 LASSO-selected variables, eleven were not associated with the GIA score under our inference modeling approach. Four variables were associated with GIA scores: serum ferritin adjusted for inflammation (inversely), mother's IQ (positively), father's education (positively), and hours per day the child spends on homework (positively). The serum ferritin association was not in the expected direction. CONCLUSIONS Our two-step approach advances high-dimensional feature selection by incorporating DAG-based confounder adjustment for statistical inference.
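The split-sample logic (select with LASSO on one half, do unpenalized inference on the other) can be sketched end to end. This is a schematic of the two-step idea only, with synthetic data and a hand-rolled coordinate-descent LASSO; it omits the paper's DAG-based covariate selection and school-level clustering, and every variable and tuning value here is an invented stand-in.

```python
import random

def soft_threshold(x, t):
    return x - t if x > t else x + t if x < -t else 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO (predictors ~standardized, y ~centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals with feature j excluded
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta

def ols(X, y):
    """Unpenalized least squares via the normal equations (tiny helper
    for the inference step; fine for a handful of selected features)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for j in range(p):                      # forward elimination
        for k in range(j + 1, p):
            f = A[k][j] / A[j][j]
            for m in range(j, p):
                A[k][m] -= f * A[j][m]
            b[k] -= f * b[j]
    beta = [0.0] * p
    for j in range(p - 1, -1, -1):          # back substitution
        beta[j] = (b[j] - sum(A[j][k] * beta[k]
                              for k in range(j + 1, p))) / A[j][j]
    return beta

# synthetic data: two informative features, two pure noise
random.seed(7)
n, p = 120, 4
true_beta = [2.0, -1.5, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 0.5)
     for row in X]

# step 1: LASSO feature selection on a 50% selection subsample
selected = [j for j, b in enumerate(lasso_cd(X[:60], y[:60], lam=0.3))
            if b != 0.0]
# step 2: unpenalized regression on the held-out 50% for inference
estimates = ols([[row[j] for j in selected] for row in X[60:]], y[60:])
```

With high probability the selection step keeps only the two informative features, and the held-out fit then recovers coefficients near 2 and -1.5 without the shrinkage bias the LASSO fit itself carries; the split is what keeps selection and inference from using the same data.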
Affiliation(s)
- Seth Frndak
- Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, USA
- Guan Yu
- Department of Biostatistics, University of Pittsburgh, USA
- Youssef Oulhote
- Department of Epidemiology, University of Massachusetts Amherst, USA
- Elena I Queirolo
- Department of Neuroscience and Learning, Catholic University of Uruguay, Montevideo, Uruguay
- Gabriel Barg
- Department of Neuroscience and Learning, Catholic University of Uruguay, Montevideo, Uruguay
- Marie Vahter
- Department of Environmental Medicine, Karolinska Institute, Sweden
- Nelly Mañay
- Faculty of Chemistry, University of the Republic of Uruguay (UDELAR), Montevideo, Uruguay
- Fabiana Peregalli
- Department of Neuroscience and Learning, Catholic University of Uruguay, Montevideo, Uruguay
- James R Olson
- Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, USA
- Zia Ahmed
- Research and Education in eNergy, Environment and Water (RENEW) Institute, University at Buffalo, The State University of New York, USA
- Katarzyna Kordas
- Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, USA
9
Constable PA, Loh L, Prem-Senthil M, Marmolejo-Ramos F. Visual search and childhood vision impairment: A GAMLSS-oriented multiverse analysis approach. Atten Percept Psychophys 2023. PMID: 36823260. DOI: 10.3758/s13414-023-02670-z.
Abstract
The aim of this report was to analyze reaction times and accuracy in children with a vision impairment performing a feature-based visual search task, using a multiverse statistical approach. The search task used set sizes of 4, 16, and 24 items, consisting of distractors (circles) and a target (an ellipse) presented randomly to school-aged individuals with or without a vision impairment. Interactions and main effects of key variables relating to reaction times and accuracy were analyzed via a novel statistical method blending GAMLSS (generalized additive models for location, scale, and shape) with distributional regression trees. Reaction times in the target-present and target-absent conditions were significantly slower in the vision impairment group with increasing set sizes (p < .001). Female participants were significantly slower than males for set sizes 16 and 24 in the target-absent condition (p < .001), whereas male participants were significantly slower than females in the target-present condition (p < .001). Accuracy was significantly worse (p = .03) only for participants younger than 14 years in the target-absent condition with set sizes 16 and 24. There was a positive association between binocular visual acuity and search time (p < .001). Applying GAMLSS with distributional regression trees to visual search data may provide further insight into the factors affecting search performance in case-control studies where psychological or physical differences may influence visual search outcomes.
10
Miyamoto H, Kikuchi J. An evaluation of homeostatic plasticity for ecosystems using an analytical data science approach. Comput Struct Biotechnol J 2023; 21:869-878. PMID: 36698969. PMCID: PMC9860287. DOI: 10.1016/j.csbj.2023.01.001.
Abstract
The natural world is constantly changing, and the planetary-boundaries framework issues severe warnings about biodiversity and the cycles of carbon, nitrogen, and phosphorus. At the same time, social problems such as global warming and food shortages are spreading across many fields. These seemingly unrelated issues are closely connected, yet an integrated understanding of them remains out of reach. Analytical technologies, however, have advanced across disciplines. From a microscopic perspective, instruments such as next-generation sequencers (NGS), nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC/MS), and liquid chromatography-mass spectrometry (LC/MS) now yield molecular information in many forms, including genome data, microbiota structure, and the metabolome, proteome, and lipidome. From a macroscopic perspective, environmental analytical instruments and measurement platforms such as satellites, drones, observation ships, and semiconductor sensors have increased the availability of data on a wide range of environmental factors. Against this background, the role of computational science is to provide a mechanism for integrating and understanding these seemingly disparate data sets. This review describes machine learning, structural equation modeling, and statistical causal inference on such data as tools for addressing these problems. In addition to presenting examples of how these technologies can be utilized, we discuss how they can support the deployment of environmentally friendly technologies in society.
Affiliation(s)
- Hirokuni Miyamoto
- Graduate School of Horticulture, Chiba University, Matsudo, Chiba 271-8501, Japan
- RIKEN Center for Integrative Medical Science, Yokohama, Kanagawa 230-0045, Japan
- Sermas Co., Ltd., Ichikawa, Chiba 272-0033, Japan
- Japan Eco-science (Nikkan Kagaku) Co. Ltd., Chiba, Chiba 260-0034, Japan
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan
- Jun Kikuchi
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, Chikusa, Nagoya 464-8601, Japan
11
Wanduku D. The multilevel hierarchical data EM-algorithm. Applications to discrete-time Markov chain epidemic models. Heliyon 2022; 8:e12622. [PMID: 36643325 PMCID: PMC9834773 DOI: 10.1016/j.heliyon.2022.e12622]
Abstract
The theory of the multilevel hierarchical data Expectation Maximization (EM) algorithm is introduced via discrete-time Markov chain (DTMC) epidemic models. A general model for multilevel hierarchical discrete data is derived. The observed sample Y in the system is stochastic incomplete data, and the missing data Z exhibit a multilevel hierarchical structure. The EM algorithm for finding maximum likelihood (ML) estimates of the parameters in the stochastic system is derived. Applications of the EM algorithm are exhibited in two DTMC models to find ML estimates of the system parameters. Numerical results are given for influenza epidemics in the state of Georgia (GA), USA.
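The EM iteration the abstract relies on (alternating expected missing-data responsibilities with ML re-estimation) can be illustrated on a deliberately simple incomplete-data problem. This is a generic sketch of the EM principle, not the paper's multilevel hierarchical DTMC model; the two-coin mixture, the parameter values, and all function names are invented for illustration:

```python
import random

random.seed(0)
# Simulate: each of 50 sessions uses coin A (p=0.8) or coin B (p=0.3).
# The coin label is the missing data Z; we observe only head counts Y.
n_flips = 20
sessions = []
for _ in range(50):
    p = 0.8 if random.random() < 0.5 else 0.3
    sessions.append(sum(random.random() < p for _ in range(n_flips)))

def em(data, n, iters=50):
    pA, pB, w = 0.6, 0.5, 0.5   # initial guesses for biases and mixture weight
    for _ in range(iters):
        # E-step: responsibility of coin A for each session's head count
        resp = []
        for h in data:
            la = w * pA**h * (1 - pA)**(n - h)
            lb = (1 - w) * pB**h * (1 - pB)**(n - h)
            resp.append(la / (la + lb))
        # M-step: ML re-estimation from the expected complete-data counts
        w = sum(resp) / len(data)
        pA = sum(r * h for r, h in zip(resp, data)) / (n * sum(resp))
        pB = sum((1 - r) * h for r, h in zip(resp, data)) / (n * sum(1 - r for r in resp))
    return pA, pB, w

pA, pB, w = em(sessions, n_flips)
print(pA, pB, w)  # estimates should land near the true biases 0.8 and 0.3
```

The same E/M alternation carries over to the paper's setting, where the expectations are taken over the hierarchy of missing transition counts rather than a single latent coin label.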
12
Greenland S, Mansournia MA, Joffe M. To curb research misreporting, replace significance and confidence by compatibility: A Preventive Medicine Golden Jubilee article. Prev Med 2022; 164:107127. [PMID: 35787846 DOI: 10.1016/j.ypmed.2022.107127]
Abstract
It is well known that the statistical analyses in health-science and medical journals are frequently misleading or even wrong. Despite many decades of reform efforts by hundreds of scientists and statisticians, attempts to fix the problem by avoiding obvious error and encouraging good practice have not altered this basic situation. Statistical teaching and reporting remain mired in damaging yet editorially enforced jargon of "significance", "confidence", and imbalanced focus on null (no-effect or "nil") hypotheses, leading to flawed attempts to simplify descriptions of results in ordinary terms. A positive development amidst all this has been the introduction of interval estimates alongside or in place of significance tests and P-values, but intervals have been beset by similar misinterpretations. Attempts to remedy this situation by calling for replacement of traditional statistics with competitors (such as pure-likelihood or Bayesian methods) have had little impact. Thus, rather than ban or replace P-values or confidence intervals, we propose to replace traditional jargon with more accurate and modest ordinary-language labels that describe these statistics as measures of compatibility between data and hypotheses or models, which have long been in use in the statistical modeling literature. Such descriptions emphasize the full range of possibilities compatible with observations. Additionally, a simple transform of the P-value called the surprisal or S-value provides a sense of how much or how little information the data supply against those possibilities. We illustrate these reforms using some examples from a highly charged topic: trials of ivermectin treatment for Covid-19.
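The surprisal transform mentioned at the end of the abstract has a simple closed form: the S-value is the negative base-2 logarithm of the P-value, S = -log2(p), read as bits of information the data supply against the test model. A minimal sketch (the function name is ours):

```python
import math

def s_value(p: float) -> float:
    """Surprisal (S-value): bits of information against the test model,
    computed as the negative base-2 log of the P-value."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log2(p)

# p = 0.05 carries about 4.3 bits against the model -- roughly the surprise
# of seeing four heads in a row from a coin assumed fair.
print(round(s_value(0.05), 2))  # → 4.32
print(round(s_value(1.0), 2))   # → 0.0 (no information against the model)
```

The coin-toss reading makes the scale concrete: an S-value of s corresponds to the surprise of s consecutive heads under a fair-coin model.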
Affiliation(s)
- Sander Greenland
- Department of Epidemiology, Department of Statistics, University of California, Los Angeles, USA
- Mohammad Ali Mansournia
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran.
- Michael Joffe
- Department of Epidemiology & Biostatistics, Imperial College London, United Kingdom
13
Conradt T. Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.2 applied to the districts of Germany. Int J Biometeorol 2022; 66:2287-2300. [PMID: 36056956 PMCID: PMC9440329 DOI: 10.1007/s00484-022-02356-5]
Abstract
ABSOLUT v1.2 is an adaptive algorithm that uses correlations between time-aggregated weather variables and crop yields for yield prediction. In contrast to conventional regression-based yield prediction methods, a very broad range of possible input features and their combinations is exhaustively tested for maximum explanatory power. Weather variables such as temperature, precipitation, and sunshine duration are aggregated over different seasonal time periods preceding the harvest, yielding 45 potential input features per original variable. In a first step, this large set of features is reduced to those aggregates most likely to hold explanatory power for observed yields. The second, computationally demanding step evaluates predictions for all districts with all possible combinations of these features. Step three selects the combinations of weather features that showed the highest predictive power across districts. Finally, the district-specific best-performing regressions among these are used for the actual prediction, and the results are spatially aggregated. To evaluate the new approach, ABSOLUT v1.2 is applied to predict the yields of silage maize, winter wheat, and other major crops in Germany based on two decades of data from about 300 districts. It turned out to be crucial not only to make out-of-sample predictions (based solely on data excluding the target year) but also to consistently separate training and testing years during feature selection; otherwise, the prediction accuracy is overestimated by far. This raises the question of whether performances claimed for other statistical modelling examples are often upward-biased by input variable selection that disregards the out-of-sample principle.
Affiliation(s)
- Tobias Conradt
- Potsdam Institute for Climate Impact Research, Telegrafenberg A31, 14473, Potsdam, Germany.
14
McNulty R. A logical analysis of null hypothesis significance testing using popular terminology. BMC Med Res Methodol 2022; 22:244. [PMID: 36123631 DOI: 10.1186/s12874-022-01696-5]
Abstract
Background Null Hypothesis Significance Testing (NHST) has been well criticised over the years yet remains a pillar of statistical inference. Although NHST is well described in terms of statistical models, most textbooks for non-statisticians present the null and alternative hypotheses (H0 and HA, respectively) in terms of differences between groups such as (μ1 = μ2) and (μ1 ≠ μ2), and HA is often stated to be the research hypothesis. Here we use propositional calculus to analyse the internal logic of NHST when couched in this popular terminology. The testable H0 is determined by analysing the scope and limits of the P-value and the test statistic's probability distribution curve. Results We propose a minimum axiom set NHST in which it is taken as axiomatic that H0 is rejected if P-value < α. Using the common scenario of the comparison of the means of two sample groups as an example, the testable H0 is {(μ1 = μ2) and [(x̄1 ≠ x̄2) due to chance alone]}. The H0 and HA pair should be exhaustive to avoid false dichotomies. This entails that HA is ¬{(μ1 = μ2) and [(x̄1 ≠ x̄2) due to chance alone]}, rather than the research hypothesis (HT). To see the relationship between HA and HT, HA can be rewritten as the disjunction HA: ({(μ1 = μ2) ∧ [(x̄1 ≠ x̄2) not due to chance alone]} ∨ {(μ1 ≠ μ2) ∧ [(x̄1 ≠ x̄2) not due to (μ1 ≠ μ2) alone]} ∨ {(μ1 ≠ μ2) ∧ [(x̄1 ≠ x̄2) due to (μ1 ≠ μ2) alone]}). This reveals that HT (the last disjunct) is just one possibility within HA. It is only by adding premises to NHST that HT or other conclusions can be reached. Conclusions Using this popular terminology for NHST, analysis shows that the definitions of H0 and HA differ from those found in textbooks. In this framework, achieving a statistically significant result only justifies the broad conclusion that the results are not due to chance alone, not that the research hypothesis is true. More transparency is needed concerning the premises added to NHST to rig particular conclusions such as HT. There are also ramifications for the interpretation of Type I and II errors, as well as power, which do not specifically refer to HT as claimed by texts.
15
Shams L, Beierholm U. Bayesian causal inference: A unifying neuroscience theory. Neurosci Biobehav Rev 2022; 137:104619. [PMID: 35331819 DOI: 10.1016/j.neubiorev.2022.104619]
Abstract
Understanding the brain and the principles governing neural processing requires theories that are parsimonious, account for a diverse set of phenomena, and make testable predictions. Here, we review the theory of Bayesian causal inference, which has been tested, refined, and extended in a variety of tasks in humans and other primates by several research groups. Bayesian causal inference is normative, has explained human behavior in a vast number of tasks, including unisensory and multisensory perceptual tasks as well as sensorimotor and motor tasks, and has accounted for counter-intuitive findings. The theory has made novel predictions that have been tested and confirmed empirically, and recent studies have started to map its algorithms and neural implementation in the human brain. Its parsimony, the diversity of the phenomena it explains, and its illumination of brain function at all three of Marr's levels of analysis make Bayesian causal inference a strong neuroscience theory. This also highlights the importance of collaborative and multi-disciplinary research for the development of new theories in neuroscience.
Affiliation(s)
- Ladan Shams
- Departments of Psychology, BioEngineering, and Neuroscience Interdepartmental Program, University of California, Los Angeles, USA.
16
Abstract
We construct a family of genealogy-valued Markov processes that are induced by a continuous-time Markov population process. We derive exact expressions for the likelihood of a given genealogy conditional on the history of the underlying population process. These lead to a nonlinear filtering equation which can be used to design efficient Monte Carlo inference algorithms. We demonstrate these calculations with several examples. Existing full-information approaches for phylodynamic inference are special cases of the theory.
Affiliation(s)
- Aaron A. King
- Department of Ecology & Evolutionary Biology, Center for the Study of Complex Systems, Center for Computational Medicine & Biology, and Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109 USA
- Qianying Lin
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109 USA
- Edward L. Ionides
- Department of Statistics and Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109 USA
17
Lytsy P, Hartman M, Pingel R. Misinterpretations of P-values and statistical tests persists among researchers and professionals working with statistics and epidemiology. Ups J Med Sci 2022; 127:8760. [PMID: 35991465 PMCID: PMC9383044 DOI: 10.48101/ujms.v127.8760]
Abstract
BACKGROUND The aim was to investigate inferences drawn from statistically significant test results among persons with more or less statistical education and research experience. METHODS A total of 75 doctoral students and 64 statisticians/epidemiologists responded to a web questionnaire about inferences from statistically significant findings. Participants were asked about their education and research experience, and whether a 'statistically significant' test result (P = 0.024, α-level 0.05) could be interpreted as proof of, or a probability statement about, the truth or falsehood of the null hypothesis (H0) and the alternative hypothesis (H1). RESULTS Almost all participants reported having a university degree, and among statisticians/epidemiologists, most reported having a university degree in statistics and working professionally with statistics. Overall, 9.4% of statisticians/epidemiologists and 24.0% of doctoral students responded that the statistically significant finding proved that H0 is not true, and 73.4% of statisticians/epidemiologists and 53.3% of doctoral students responded that the statistically significant finding indicated that H0 is improbable. The corresponding numbers for inferences about the alternative hypothesis (H1) were 12.0% and 6.2% for the conclusion that the finding proved H1 true, and 62.7% and 62.5% for the conclusion that H1 is probable. Correct answers to both questions, namely that a statistically significant finding can be inferred neither as proof nor as a measure of a hypothesis' probability, were given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists. CONCLUSIONS Misinterpretation of P-values and statistically significant test results persists even among persons who have substantial statistical education and who work professionally with statistics.
Affiliation(s)
- Per Lytsy
- Department of Public Health and Caring Sciences, University of Uppsala, Uppsala, Sweden
- Ronnie Pingel
- Department of Public Health and Caring Sciences, University of Uppsala, Uppsala, Sweden
- Department of Statistics, University of Uppsala, Uppsala, Sweden
18
Abstract
Technological breakthroughs in both sensors and robotized plant phenotyping platforms have completely renewed the plant phenotyping paradigm over the last two decades. This has changed both the nature and the throughput of the data, which are now available at high throughput from the tissue scale to the whole-plant scale. Sensor outputs often take the form of 2D or 3D images, or time series of such images, from which traits are extracted and organ shapes or shoot and root system architectures can be deduced. Despite this change of paradigm, many phenotyping studies ignore the structure of the plant and therefore lose the information conveyed by the temporal and spatial patterns emerging from this structure. The developmental patterns of plants often take the form of successions of well-differentiated phases, stages, or zones, depending on the temporal, spatial, or topological indexing of the data. This calls for hierarchical statistical models for their identification. The objective here is to show potential approaches for analyzing structured plant phenotyping data using state-of-the-art methods combining probabilistic modeling, statistical inference, and pattern recognition. The approach is illustrated with five examples at various scales that combine temporal and topological index parameters with development and growth variables obtained from prospective or retrospective measurements.
Affiliation(s)
- Yann Guédon
- AGAP, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Yves Caraglio
- AMAP, Univ Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France.
- Christine Granier
- AGAP, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Pierre-Éric Lauri
- ABSys, Univ Montpellier, CIHEAM-IAMM, CIRAD, INRAE, Institut Agro, Montpellier, France
- Bertrand Muller
- LEPSE, Univ Montpellier, INRAE, Institut Agro, Montpellier, France
19
Muntoni AP, Pagnani A, Weigt M, Zamponi F. adabmDCA: adaptive Boltzmann machine learning for biological sequences. BMC Bioinformatics 2021; 22:528. [PMID: 34715775 DOI: 10.1186/s12859-021-04441-9]
Abstract
Background Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation and pairwise terms modeling epistatic coevolution between residues. From the model parameters, an accurate prediction of the three-dimensional contact map of the target domain can be extracted. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico. Results Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have trained three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain. Conclusions The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time, and it supports pruning irrelevant parameters using an information-based criterion.
20
Abstract
Although null hypothesis testing (NHT) is the primary method for analyzing data in many natural sciences, it has been increasingly criticized. Recently, approaches based on information theory (IT) have become popular and were held by many to be superior because they enable researchers to properly assess the strength of the evidence that data provide for competing hypotheses. Many studies have compared IT and NHT in the context of model selection and stepwise regression, but a systematic comparison of the most basic uses of statistics by ecologists is still lacking. We used computer simulations to compare how the two approaches perform in four basic test designs (t-test, ANOVA, correlation tests, and multiple linear regression). Performance was measured by the proportion of simulated samples for which each method provided the correct conclusion (power), the proportion of detected effects with a wrong sign (S-error), and the mean ratio of the estimated effect to the true effect (M-error). We also checked whether the p-value from significance tests correlated with a measure of strength of evidence, the Akaike weight. In general, both methods performed equally well. The concordance is explained by the monotonic relationship between p-values and evidence weights in simple designs, which agrees with analytic results. Our results show that researchers can agree on the conclusions drawn from a data set even when they are using different statistical approaches. By focusing on the practical consequences of inferences, such a pragmatic view of statistics can promote insightful dialogue among researchers on how to find common ground among different pieces of evidence. A less dogmatic view of statistical inference can also help to broaden the debate about the role of statistics in science to the entire path that leads from a research hypothesis to a statistical hypothesis.
Affiliation(s)
- Paulo Inácio Prado
- Departamento de Ecologia/Instituto de Biociências, Universidade de São Paulo, São Paulo, São Paulo, Brazil
21
Anderson RB, Crawford JC, Bailey MH. Biasing the input: A yoked-scientist demonstration of the distorting effects of optional stopping on Bayesian inference. Behav Res Methods 2021. [PMID: 34494220 DOI: 10.3758/s13428-021-01618-1]
Abstract
Prior work by Michael R. Dougherty and colleagues (Yu et al., 2014) shows that when a scientist monitors the p value during data collection and uses a critical p as the signal to stop collecting data, the resulting p is distorted due to Type I error-rate inflation. They argued similarly that the use of a critical Bayes factor (BF(crit)) for stopping distorts the obtained Bayes factor (BF), a position that has met with controversy. The present paper clarified that when BF(crit) is used as a stopping criterion, the sample becomes biased in that data consistent with large effects have a greater chance to be included than do other data, thus biasing the input to Bayesian inference. We report simulations of yoked pairs of scientists in which Scientist A uses BF(crit) to optionally stop, while Scientist B, sampling from the same population, stops when A stops. Thus, optional stopping is compared not to a hypothetical in which no stopping occurs, but to a situation in which B stops for reasons unrelated to the characteristics of B's sample. The results indicated that optional stopping biased the input for Bayesian inference. We also simulated the use of effect-size stabilization as a stopping criterion and found no bias in that case.
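The Type I error-rate inflation that motivates the argument above is easy to reproduce in simulation. The sketch below is a hedged illustration, not the authors' yoked-scientist design: it assumes a one-sample z-test with a true null (mean 0, known SD 1), peeks at the p-value after every batch, and stops as soon as p drops below α or a sample-size cap is reached; the function names, batch size, and cap are invented for illustration:

```python
import math
import random

def z_p_value(xs):
    # Two-sided one-sample z-test of mu = 0 with known SD = 1:
    # p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    n = len(xs)
    z = (sum(xs) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))

def optional_stop_rejects(alpha=0.05, n_min=10, n_max=200, batch=10):
    # Peek at p after every batch; stop (and "publish") once p < alpha.
    xs = [random.gauss(0.0, 1.0) for _ in range(n_min)]
    while z_p_value(xs) >= alpha and len(xs) < n_max:
        xs.extend(random.gauss(0.0, 1.0) for _ in range(batch))
    return z_p_value(xs) < alpha

random.seed(1)
trials = 2000
rate = sum(optional_stop_rejects() for _ in range(trials)) / trials
# A fixed-n test would reject about 5% of the time under the null;
# monitoring p inflates the rejection rate well beyond alpha.
print(rate)
```

The same sampling bias is what the authors argue distorts the input to Bayesian inference when a critical Bayes factor, rather than a critical p, is used as the stopping signal.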
22
Zhao S, Tang B, Musa SS, Ma S, Zhang J, Zeng M, Yun Q, Guo W, Zheng Y, Yang Z, Peng Z, Chong MK, Javanbakht M, He D, Wang MH. Estimating the generation interval and inferring the latent period of COVID-19 from the contact tracing data. Epidemics 2021; 36:100482. [PMID: 34175549 PMCID: PMC8223005 DOI: 10.1016/j.epidem.2021.100482]
Abstract
The coronavirus disease 2019 (COVID-19) emerged at the end of 2019 and became a serious global public health threat in less than half a year. The generation interval and the latent period, though both important for understanding COVID-19 transmission, are difficult to observe and can rarely be learnt from surveillance data empirically. In this study, we develop a likelihood framework to estimate the generation interval and incubation period simultaneously from contact tracing data of COVID-19 cases, and we infer the pre-symptomatic transmission proportion and the latent period thereafter. We estimate the mean of the incubation period at 6.8 days (95% CI: 6.2, 7.5) with SD 4.1 days (95% CI: 3.7, 4.8), and the mean of the generation interval at 6.7 days (95% CI: 5.4, 7.6) with SD 1.8 days (95% CI: 0.3, 3.8). The basic reproduction number is estimated to range from 1.9 to 3.6, and 49.8% (95% CI: 33.3, 71.5) of secondary COVID-19 infections are likely due to pre-symptomatic transmission. Using the best estimates of the model parameters, we further infer a mean latent period of 3.3 days (95% CI: 0.2, 7.9). Our findings highlight the importance of isolation for symptomatic cases as well as for pre-symptomatic and asymptomatic cases.
Affiliation(s)
- Shi Zhao
- JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China.
- Biao Tang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China; Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, ON, M3J 1P3, Canada.
- Salihu S Musa
- Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong, China; Department of Mathematics, Kano University of Science and Technology, Wudil, Nigeria.
- Shujuan Ma
- Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha, China.
- Jiayue Zhang
- Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha, China.
- Minyan Zeng
- Department of Neurology, Peking University Shenzhen Hospital, Shenzhen, China.
- Qingping Yun
- Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China.
- Wei Guo
- Department of Neurology, Peking University Shenzhen Hospital, Shenzhen, China.
- Yixiang Zheng
- Department of Infectious Diseases, Key Laboratory of Viral Hepatitis of Hunan, Xiangya Hospital, Central South University, Changsha, China.
- Zuyao Yang
- JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China.
- Zhihang Peng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
- Marc Kc Chong
- JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China.
- Mohammad Javanbakht
- Nephrology and Urology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran.
- Daihai He
- Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong, China.
- Maggie H Wang
- JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China.
23
Abstract
Recently, optional stopping has been a subject of debate in the Bayesian psychology community. Rouder (Psychonomic Bulletin & Review 21(2), 301-308, 2014) argues that optional stopping is no problem for Bayesians and even recommends its use in practice, as do Wagenmakers, Wetzels, Borsboom, van der Maas and Kievit (Perspectives on Psychological Science 7, 627-633, 2012). This article addresses the question of whether optional stopping is problematic for Bayesian methods, and specifies under which circumstances, and in which sense, it is and is not. By slightly varying and extending Rouder's (2014) experiments, we illustrate that as soon as the parameters of interest are equipped with default or pragmatic priors, which is the case in most practical applications of Bayes factor hypothesis testing, resilience to optional stopping can break down. We distinguish between three types of default priors, each with its own specific issues under optional stopping, ranging from no problem at all (type 0 priors) to quite severe (type II priors).
Affiliation(s)
- Rianne de Heide
- Leiden University, Leiden, Amsterdam, The Netherlands
- The Netherlands Centre for Mathematics & Computer Science (CWI), Amsterdam, The Netherlands
- Peter D Grünwald
- Leiden University, Leiden, Amsterdam, The Netherlands.
- The Netherlands Centre for Mathematics & Computer Science (CWI), Amsterdam, The Netherlands.
24
Dyrka W, Gąsior-Głogowska M, Szefczyk M, Szulc N. Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars. BMC Bioinformatics 2021; 22:222. [PMID: 33926372 PMCID: PMC8086366 DOI: 10.1186/s12859-021-04139-y]
Abstract
Background Amyloid signaling motifs are a class of protein motifs that share basic structural and functional features despite the lack of clear sequence homology. They are hard to detect in large sequence databases either with alignment-based profile methods (due to their short length and diversity) or with generic amyloid- and prion-finding tools (due to insufficient discriminative power). We propose to address this challenge with a machine-learned grammatical model capable of generalizing over diverse collections of unaligned yet related motifs. Results First, we introduce and test improvements to our probabilistic context-free grammar (PCFG) framework for protein sequences that allow for inferring more sophisticated models achieving high sensitivity at low false positive rates. Then, we infer universal grammars for a collection of recently identified bacterial amyloid signaling motifs and demonstrate that the method is capable of generalizing by successfully searching for related motifs in fungi. The results are compared to available alternative methods. Finally, we conduct spectroscopy and staining analyses of selected peptides to verify their structural and functional relationship. Conclusions While profile HMMs remain the method of choice for modeling homologous sets of sequences, PCFGs seem more suitable for building meta-family descriptors and extrapolating beyond the seed sample.
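To make the grammar machinery concrete, here is a minimal, self-contained sketch of the inside algorithm for a toy probabilistic context-free grammar in Chomsky normal form over a two-letter hydrophobic/polar alphabet. The grammar and its probabilities are invented for illustration and bear no relation to the inferred amyloid grammars.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form (illustrative only).
lexical = {                         # nonterminal -> terminal : probability
    ("A", "h"): 0.7, ("A", "p"): 0.3,   # 'h'ydrophobic / 'p'olar residues
    ("B", "h"): 0.2, ("B", "p"): 0.8,
}
binary = {                          # X -> Y Z : probability
    ("S", "A", "B"): 0.6,
    ("S", "A", "S"): 0.4,               # recursion allows arbitrarily long motifs
}

def inside_probability(seq, start="S"):
    """CKY-style inside algorithm: P(start symbol derives seq)."""
    n = len(seq)
    # chart[i][j][X] = P(X derives seq[i:j])
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(seq):
        for (nt, term), p in lexical.items():
            if term == tok:
                chart[i][i + 1][nt] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for (x, y, z), p in binary.items():
                    chart[i][j][x] += p * chart[i][k][y] * chart[k][j][z]
    return chart[0][n][start]

p_hp = inside_probability("hp")     # single derivation: S -> A B
p_hhp = inside_probability("hhp")   # single derivation: S -> A S -> A (A B)
```

Real applications replace the hand-written rules with grammars inferred from motif collections and use the inside score to rank candidate sequences.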
Affiliation(s)
- Witold Dyrka: Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
- Marlena Gąsior-Głogowska: Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
- Monika Szefczyk: Wydział Chemiczny, Katedra Chemii Bioorganicznej, Politechnika Wrocławska, Wrocław, Poland
- Natalia Szulc: Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
25
Perneger TV. How to use likelihood ratios to interpret evidence from randomized trials. J Clin Epidemiol 2021; 136:235-242. PMID: 33930527; DOI: 10.1016/j.jclinepi.2021.04.010.
Abstract
OBJECTIVE The likelihood ratio is a method for assessing evidence regarding two simple statistical hypotheses. Its interpretation is simple: for example, a value of 10 means that the first hypothesis is 10 times as strongly supported by the data as the second. A method is shown for deriving likelihood ratios from published trial reports. STUDY DESIGN The likelihood ratio compares two hypotheses in light of the data: that a new treatment is effective at a specified level (alternate hypothesis: for instance, a hazard ratio of 0.7), and that it is not (null hypothesis: a hazard ratio of 1). The result of the trial is summarized by the test statistic z (i.e., the estimated treatment effect divided by its standard error). The expected value of z is 0 under the null hypothesis and A under the alternate hypothesis. The logarithm of the likelihood ratio is given by z·A − A²/2. The values of A and z can be derived from the alternate hypothesis used for the sample size computation, and from the observed treatment effect and its standard error or confidence interval. RESULTS Examples are given of trials that yielded strong or moderate evidence in favor of the alternate hypothesis, and of a trial that favored the null hypothesis. The resulting likelihood ratios are applied to initial beliefs about the hypotheses to obtain posterior beliefs. CONCLUSIONS The likelihood ratio is a simple and easily understandable method for assessing the evidence in data about two competing a priori hypotheses.
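The abstract's formula needs only summary statistics. A minimal sketch with invented trial numbers (alternate-hypothesis HR 0.7; observed HR 0.75 with 95% CI 0.60 to 0.94), recovering the standard error of log(HR) from the confidence-interval width:

```python
import math

def log_likelihood_ratio(z: float, a: float) -> float:
    """Log likelihood ratio of the alternate vs. the null: z*A - A^2/2."""
    return z * a - a ** 2 / 2.0

# Hypothetical trial report (numbers invented for illustration).
hr_alt, hr_obs, ci_lo, ci_hi = 0.7, 0.75, 0.60, 0.94
se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)  # SE of log(HR) from the CI
z = math.log(hr_obs) / se   # observed effect in standard-error units
a = math.log(hr_alt) / se   # expected value of z under the alternate
lr = math.exp(log_likelihood_ratio(z, a))   # support for the alternate over the null
```

Here the alternate hypothesis comes out roughly 20 times better supported than the null; a null result (z = 0) would instead yield a likelihood ratio below 1.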
Affiliation(s)
- Thomas V Perneger: Division of Clinical Epidemiology, Geneva University Hospitals, and Faculty of Medicine, University of Geneva, Geneva 1211, Switzerland
26
Noel JP, Paredes R, Terrebonne E, Feldman JI, Woynaroski T, Cascio CJ, Seriès P, Wallace MT. Inflexible Updating of the Self-Other Divide During a Social Context in Autism: Psychophysical, Electrophysiological, and Neural Network Modeling Evidence. Biol Psychiatry Cogn Neurosci Neuroimaging 2021; 7:756-764. PMID: 33845169; PMCID: PMC8521572; DOI: 10.1016/j.bpsc.2021.03.013.
Abstract
BACKGROUND Autism spectrum disorder (ASD) affects many aspects of life, from social interactions to (multi)sensory processing. Similarly, the condition expresses itself at a variety of levels of description, from genetics to neural circuits and interpersonal behavior. We attempt to bridge between domains and levels of description by detailing the behavioral, electrophysiological, and putative neural network basis of peripersonal space (PPS) updating in ASD during a social context, given that the encoding of this space relies on appropriate multisensory integration, is malleable by social context, and is thought to delineate the boundary between the self and others. METHODS Fifty (20 male/30 female) young adults, either diagnosed with ASD or age- and sex-matched controls, took part in a visuotactile reaction time task indexing PPS while high-density electroencephalography was continuously recorded. Neural network modeling was performed in silico. RESULTS Multisensory psychophysics demonstrates that while PPS in neurotypical individuals shrinks in the presence of others, so as to "give space," this does not occur in ASD. Likewise, electroencephalography recordings suggest that multisensory integration is altered by social context in neurotypical individuals but not in individuals with ASD. Finally, a biologically plausible neural network model shows, as a proof of principle, that PPS updating may be inflexible in ASD owing to the altered excitatory/inhibitory balance that characterizes neural circuits in animal models of ASD. CONCLUSIONS The findings are conceptually in line with recent statistical inference accounts suggesting diminished flexibility in ASD, and extend these observations by suggesting, in an example relevant for social cognition, that such inflexibility may be due to excitatory/inhibitory imbalances.
Affiliation(s)
- Jean-Paul Noel: Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee; Center for Neural Science, New York University, New York, New York
- Renato Paredes: Institute for Adaptive and Neural Computation, University of Edinburgh, Edinburgh, United Kingdom
- Emily Terrebonne: Undergraduate Neuroscience Program, Vanderbilt University, Nashville, Tennessee; School of Medicine and Health Sciences, George Washington University, Washington, District of Columbia
- Jacob I Feldman: Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee; Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee
- Tiffany Woynaroski: Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee; Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee
- Carissa J Cascio: Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee; Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee
- Peggy Seriès: Institute for Adaptive and Neural Computation, University of Edinburgh, Edinburgh, United Kingdom
- Mark T Wallace: Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee; Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee; Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee
27
Skerritt-Davis B, Elhilali M. Computational framework for investigating predictive processing in auditory perception. J Neurosci Methods 2021; 360:109177. PMID: 33839191; DOI: 10.1016/j.jneumeth.2021.109177.
Abstract
BACKGROUND The brain tracks sound sources as they evolve in time, collecting contextual information to predict future sensory inputs. Previous work in predictive coding typically focuses on the perception of predictable stimuli, leaving open how the same neural processes operate in more complex, real-world environments containing randomness and uncertainty. NEW METHOD To facilitate investigation of less tightly controlled listening scenarios, we present a computational model as a tool for asking targeted questions about the underlying predictive processes that connect complex sensory inputs to listener behavior and neural responses. In the modeling framework, observed sound features (e.g., pitch) are tracked sequentially using Bayesian inference. Sufficient statistics are inferred from past observations at multiple time scales and used to make predictions about future observations while tracking the statistical structure of the sensory input. RESULTS Facets of the model are discussed in terms of their application to perceptual research, and examples taken from real-world audio demonstrate the model's flexibility to capture a variety of statistical structures along various perceptual dimensions. COMPARISON WITH EXISTING METHODS Previous models are often targeted toward interpreting a particular experimental paradigm (e.g., the oddball paradigm), perceptual dimension (e.g., pitch processing), or task (e.g., speech segregation), thus limiting their ability to generalize to other domains. The presented model is designed as a flexible and practical tool for broad application. CONCLUSION The model is presented as a general framework for generating new hypotheses and guiding investigation into the neural processes underlying predictive coding of complex scenes.
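The core predict-then-update loop can be sketched with the simplest conjugate case: a single feature, one time scale, and known observation noise (all numbers invented; the published framework tracks richer sufficient statistics at multiple time scales).

```python
import numpy as np

def track_gaussian(obs, obs_var=1.0, prior_mean=0.0, prior_var=10.0):
    """Sequentially update a belief about a sound feature's mean,
    emitting the one-step-ahead predictive mean before each observation."""
    mean, var = prior_mean, prior_var
    predictions = []
    for y in obs:
        predictions.append(mean)                       # predict, then observe
        # conjugate normal-normal update of the posterior over the feature mean
        var_new = 1.0 / (1.0 / var + 1.0 / obs_var)
        mean = var_new * (mean / var + y / obs_var)
        var = var_new
    return np.array(predictions), mean, var

rng = np.random.default_rng(3)
pitch = rng.normal(220.0, 1.0, size=50)   # a stable synthetic "pitch" stream, in Hz
preds, post_mean, post_var = track_gaussian(pitch)
```

As evidence accumulates, the predictions converge on the stream's mean and the posterior variance shrinks, mirroring how surprise about a stable source decays.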
Affiliation(s)
- Mounya Elhilali: Johns Hopkins University, 3400 N Charles St, Baltimore, MD, USA
28
Ranjeva S, Pinciroli R, Hodell E, Mueller A, Hardin CC, Thompson BT, Berra L. Identifying clinical and biochemical phenotypes in acute respiratory distress syndrome secondary to coronavirus disease-2019. EClinicalMedicine 2021; 34:100829. PMID: 33875978; PMCID: PMC8047387; DOI: 10.1016/j.eclinm.2021.100829.
Abstract
BACKGROUND Acute respiratory distress syndrome (ARDS) secondary to coronavirus disease-2019 (COVID-19) is characterized by substantial heterogeneity in clinical, biochemical, and physiological characteristics, yet the pathophysiology of severe COVID-19 infection remains poorly understood. Previous studies established clinical and biological phenotypes among classical ARDS cohorts, with important therapeutic implications. The phenotypic profile of COVID-19-associated ARDS remains unknown. METHODS We used latent class modeling via a multivariate mixture model to identify phenotypes from clinical and biochemical data collected from 263 patients admitted to the Massachusetts General Hospital intensive care unit with COVID-19-associated ARDS between March 13 and August 2, 2020. FINDINGS We identified two distinct phenotypes of COVID-19-associated ARDS, with substantial differences in biochemical profiles despite minimal differences in respiratory dynamics. The minority phenotype (class 2, n = 70, 26·6%) demonstrated increased markers of coagulopathy, with mild relative hyper-inflammation and dramatically increased markers of end-organ dysfunction (e.g., creatinine, troponin). The odds of 28-day mortality in the class 2 phenotype were more than double those of the class 1 phenotype (40·0% vs. 23·3%, OR = 2·2, 95% CI [1·2, 3·9]). INTERPRETATION We identified distinct phenotypic profiles in COVID-19-associated ARDS, with little variation according to respiratory physiology but important variation according to systemic and extra-pulmonary markers. Phenotypic identity was highly associated with short-term mortality. The class 2 phenotype exhibited prominent signatures of coagulopathy, suggesting that vascular dysfunction may play an important role in the clinical progression of severe COVID-19-related disease.
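The study's latent class analysis is multivariate; as a minimal stand-in, this sketch fits a two-class, one-dimensional Gaussian mixture by expectation-maximization to synthetic "biomarker" data. All numbers are invented, and this is not the authors' model specification.

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, sds)."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75]).astype(float)   # crude class initialization
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each latent class for each observation
        dens = (w / (sd * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate class weights, means, and SDs
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sd

rng = np.random.default_rng(2)
# synthetic marker: 70% of "patients" centered near 1.0, 30% near 4.0
x = np.concatenate([rng.normal(1.0, 0.5, 700), rng.normal(4.0, 0.8, 300)])
w, mu, sd = em_two_gaussians(x)
```

With well-separated classes the fitted means and weights recover the generating values; the per-patient responsibilities then serve as soft class assignments.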
Affiliation(s)
- Sylvia Ranjeva: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- Riccardo Pinciroli: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- Evan Hodell: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- Ariel Mueller: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- C. Corey Hardin: Pulmonary Critical Care Division, Department of Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- B. Taylor Thompson: Pulmonary Critical Care Division, Department of Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- Lorenzo Berra (corresponding author): Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
29
Raittio L, Launonen A, Mattila VM, Reito A. Estimates of the mean difference in orthopaedic randomized trials: obligatory yet obscure. BMC Med Res Methodol 2021; 21:59. PMID: 33761900; PMCID: PMC7992936; DOI: 10.1186/s12874-021-01249-2.
Abstract
Background Randomized controlled trials in orthopaedics are mainly powered to find large effect sizes. A possible discrepancy between the estimated and the true mean difference is a challenge for statistical inference based on p-values. We explored the justifications for the mean difference estimates used in power calculations. The distribution of observations in the primary outcome and the possibility of ceiling effects were also assessed. Methods Systematic review of randomized controlled trials with power calculations in eight clinical orthopaedic journals published between 2016 and 2019. Trials with one continuous primary outcome and 1:1 allocation were eligible. Rationales and references for the mean difference estimate were recorded from the Methods sections. The possibility of a ceiling effect was addressed by assessing the weighted mean and standard deviation of the primary outcome and its elaboration in the Discussion section of each RCT where available. Results 264 trials were included in this study. Of these, 108 (41%) provided some rationale or reference for the mean difference estimate. The most common rationales or references were the minimal clinically important difference (16%), observational studies on the same subject (8%), and the authors' own judgment of 'clinical relevance' (6%). In a third of the trials, the weighted mean plus one standard deviation of the primary outcome exceeded the best possible value on the patient-reported outcome measure scale, indicating a possible ceiling effect in the outcome. Conclusions The mean difference estimates chosen for power calculations are rarely properly justified in orthopaedic trials. In general, trials with a patient-reported outcome measure as the primary outcome do not assess or report the possibility of a ceiling effect in the primary outcome or elaborate on it further in the Discussion section.
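The ceiling-effect screen described above reduces to a one-line check. A minimal sketch with invented numbers for a patient-reported outcome measure scored 0-100 (100 = best):

```python
def possible_ceiling(weighted_mean: float, weighted_sd: float, scale_best: float) -> bool:
    """Flag a possible ceiling effect: mean + 1 SD exceeds the best possible score."""
    return weighted_mean + weighted_sd > scale_best

# Hypothetical pooled outcome: mean 88, SD 15 on a 0-100 scale
flag = possible_ceiling(88.0, 15.0, 100.0)   # 88 + 15 = 103 > 100 -> flagged
```

A flagged trial does not prove a ceiling effect, but it signals that the upper tail of the outcome distribution is compressed against the scale maximum and deserves discussion.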
Affiliation(s)
- Lauri Raittio: The Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, 33520 Tampere, Finland
- Antti Launonen: Department of Orthopaedics and Traumatology, Tampere University Hospital, Teiskontie 35, 33520 Tampere, Finland
- Ville M Mattila: The Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, 33520 Tampere, Finland; Department of Orthopaedics and Traumatology, Tampere University Hospital, Teiskontie 35, 33520 Tampere, Finland
- Aleksi Reito: Department of Orthopaedics and Traumatology, Tampere University Hospital, Teiskontie 35, 33520 Tampere, Finland
30
Bishara AJ, Li J, Conley C. Informal versus formal judgment of statistical models: The case of normality assumptions. Psychon Bull Rev 2021; 28:1164-1182. PMID: 33660213; DOI: 10.3758/s13423-021-01879-z.
Abstract
Researchers sometimes use informal judgment for statistical model diagnostics and assumption checking. Informal judgment might seem more desirable than formal judgment because of a paradox: formal hypothesis tests of assumptions appear to become less useful as sample size increases. We suggest that this paradox can be resolved by evaluating both formal and informal statistical judgment via a simplified signal detection framework. In four studies, we used this approach to compare informal judgments of normality diagnostic graphs (histograms, Q-Q plots, and P-P plots) to the performance of several formal tests (the Shapiro-Wilk test, the Kolmogorov-Smirnov test, etc.). Participants judged whether graphs of sample data came from a normal population (Experiments 1-2) or from a population close enough to normal for a parametric test to be more powerful than a nonparametric one (Experiments 3-4). Across all experiments, participants' informal judgments showed lower discriminability than did formal hypothesis tests. This pattern occurred even after participants were given 400 training trials with feedback, a financial incentive, and ecologically valid distribution shapes. The discriminability advantage of formal normality tests led to slightly more powerful follow-up tests (parametric vs. nonparametric). Overall, the framework used here suggests that formal model diagnostics may be more desirable than informal ones.
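The formal tests compared in the article (Shapiro-Wilk, Kolmogorov-Smirnov) are available in standard statistics packages; as a dependency-free stand-in, the sketch below computes a Jarque-Bera-style moment statistic, which grows with a sample's skewness and excess kurtosis and so separates normal from non-normal samples mechanically rather than by eye.

```python
import numpy as np

def jarque_bera(x) -> float:
    """Jarque-Bera statistic: large values signal non-normal skew/kurtosis."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return x.size / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)

rng = np.random.default_rng(0)
jb_normal = jarque_bera(rng.normal(size=1000))       # small for normal data
jb_skewed = jarque_bera(rng.exponential(size=1000))  # large for skewed data
```

Under normality the statistic is approximately chi-squared with 2 degrees of freedom, so a fixed cutoff gives the kind of consistent, reproducible decision rule that informal graph inspection lacks.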
31
Jager T. Robust Likelihood-Based Approach for Automated Optimization and Uncertainty Analysis of Toxicokinetic-Toxicodynamic Models. Integr Environ Assess Manag 2021; 17:388-397. PMID: 32860485; DOI: 10.1002/ieam.4333.
Abstract
Toxicokinetic-toxicodynamic (TKTD) models offer a mechanistic understanding of individual-level toxicity over time and allow for meaningful extrapolations from laboratory tests to exposure conditions in the field. Thereby, they hold great potential for ecotoxicological studies, both in a regulatory context as well as for basic research. In contrast to mechanistic effect models at higher levels of biological organization, TKTD models can be, and generally are, parameterized by fitting them to data (results from toxicity tests). Fitting models comes with a range of statistical and numerical challenges, which may hamper the application of TKTD models in a practical setting. Especially in the context of environmental risk assessment, there is a need for robust and user-friendly software tools to automatically extract the best-fitting model parameters and quantify their uncertainty from any data set. The study presents a general outline for TKTD model analysis, rooted in likelihood-based ("frequentist") inference. The general outline is followed by a presentation of the specific algorithm that has been implemented into software for the robust and automated analysis of toxicity data for survival. However, the presented approach is more broadly applicable to low-dimensional problems. Integr Environ Assess Manag 2021;17:388-397. © 2020 SETAC.
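The article's algorithm handles full TKTD survival models; the sketch below shows the likelihood-based core on a deliberately tiny analogue: a constant-hazard death probability fit by maximum likelihood, with a 95% profile-likelihood interval taken at the usual chi-squared cutoff of 1.92 log-likelihood units. The data are invented, and the grid search stands in for the article's more robust optimizer.

```python
import math

# Hypothetical survival toxicity test: 20 animals, 7 dead by day 4.
n0, deaths, t = 20, 7, 4.0

def loglik(h: float) -> float:
    """Binomial log-likelihood of the observed deaths under constant hazard h."""
    p_dead = 1.0 - math.exp(-h * t)
    return deaths * math.log(p_dead) + (n0 - deaths) * math.log(1.0 - p_dead)

# Grid-based ML fit and a 95% profile-likelihood interval
# (cutoff: max log-likelihood minus chi2(1, 0.95) / 2 = 1.92).
grid = [i * 1e-4 for i in range(1, 20001)]            # hazard rate per day
lls = [loglik(h) for h in grid]
ll_max = max(lls)
h_hat = grid[lls.index(ll_max)]                       # ML estimate of the hazard
inside = [h for h, ll in zip(grid, lls) if ll >= ll_max - 1.92]
ci = (min(inside), max(inside))                       # profile-likelihood 95% CI
```

Unlike standard-error-based intervals, the likelihood region makes no symmetry assumption, which is exactly why the article favors it for the skewed likelihood surfaces typical of TKTD parameters.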
32
Zhao S, Shen M, Musa SS, Guo Z, Ran J, Peng Z, Zhao Y, Chong MKC, He D, Wang MH. Inferencing superspreading potential using zero-truncated negative binomial model: exemplification with COVID-19. BMC Med Res Methodol 2021; 21:30. PMID: 33568100; PMCID: PMC7874987; DOI: 10.1186/s12874-021-01225-w.
Abstract
Background In infectious disease transmission dynamics, high heterogeneity in individual infectiousness means that a few index cases generate large numbers of secondary cases, a phenomenon commonly known as superspreading. The heterogeneity in transmission can be measured by describing the distribution of the number of secondary cases as a negative binomial (NB) distribution with dispersion parameter k. However, such an inference framework usually neglects the under-ascertainment of sporadic cases, i.e., cases without a known epidemiological link that are treated as independent clusters of size one, which may bias the estimates. Methods In this study, we adopt a zero-truncated likelihood-based framework to estimate k. We evaluate the estimation performance using stochastic simulations and compare it with the baseline non-truncated version. We exemplify the analytical framework with three contact tracing datasets of COVID-19. Results We demonstrate that estimation bias exists when index cases with 0 secondary cases are under-ascertained, and that the zero-truncated inference overcomes this problem and yields a less biased estimator of k. We find that the k of COVID-19 is inferred at 0.32 (95% CI: 0.15, 0.64), which appears slightly smaller than many previous estimates. We provide simulation code applying the inference framework of this study. Conclusions The zero-truncated framework is recommended for less biased estimates of transmission heterogeneity. These findings highlight the importance of individual-specific case management strategies to mitigate the COVID-19 pandemic by lowering the transmission risks of potential superspreaders as a priority.
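A minimal sketch of the core idea, not the authors' implementation: simulate NB-distributed secondary-case counts, discard the zero-count index cases (the under-ascertained sporadic clusters), and recover the dispersion k by maximizing the zero-truncated likelihood over a coarse grid. The parameter values, grid, and sample size are all invented.

```python
import math
import numpy as np

def ztnb_loglik(vals, cnts, mu, k):
    """Zero-truncated negative binomial log-likelihood (mean mu, dispersion k),
    given the counts of each distinct positive secondary-case number."""
    log_p0 = k * math.log(k / (k + mu))              # NB probability of zero
    return sum(
        c * (math.lgamma(v + k) - math.lgamma(k) - math.lgamma(v + 1)
             + log_p0 + v * math.log(mu / (k + mu))
             - math.log1p(-math.exp(log_p0)))        # condition on X > 0
        for v, c in zip(vals, cnts))

rng = np.random.default_rng(1)
k_true, mu_true, n = 0.3, 2.0, 3000
# NB draws via the gamma-Poisson mixture representation
secondary = rng.poisson(rng.gamma(k_true, mu_true / k_true, size=n))
observed = secondary[secondary > 0]                  # zero-count cases unobserved
vals, cnts = np.unique(observed, return_counts=True)

# coarse grid maximization over (mu, k)
best = max((ztnb_loglik(vals, cnts, mu, k), mu, k)
           for mu in np.linspace(0.5, 5.0, 46)
           for k in np.logspace(-1.5, 1.0, 60))
_, mu_hat, k_hat = best
```

Fitting the non-truncated NB to the same zero-free data would systematically distort k; conditioning the likelihood on X > 0 is what removes that bias.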
Affiliation(s)
- Shi Zhao: JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China
- Mingwang Shen: School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, 710061, Shaanxi, China
- Salihu S Musa: Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong, China; Department of Mathematics, Kano University of Science and Technology, Wudil, Nigeria
- Zihao Guo: JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China
- Jinjun Ran: School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhihang Peng: Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Yu Zhao: School of Public Health and Management, Ningxia Medical University, Yinchuan, China
- Marc K C Chong: JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China
- Daihai He: Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong, China
- Maggie H Wang: JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China
33
Fox DR, van Dam RA, Fisher R, Batley GE, Tillmanns AR, Thorley J, Schwarz CJ, Spry DJ, McTavish K. Recent Developments in Species Sensitivity Distribution Modeling. Environ Toxicol Chem 2021; 40:293-308. PMID: 33170526; DOI: 10.1002/etc.4925.
Abstract
The species sensitivity distribution (SSD) is a statistical approach that is used to estimate either the concentration of a chemical that is hazardous to no more than x% of all species (the HCx) or the proportion of species potentially affected by a given concentration of a chemical. Despite a significant body of published research and critical reviews over the past 20 yr aimed at improving the methodology, the fundamentals remain unchanged. Although there have been some recent suggestions for improvements to SSD methods in the literature, in general, few of these suggestions have been formally adopted. Furthermore, critics of the approach can rightly point to the fact that differences in technical implementation can lead to marked differences in results, thereby undermining confidence in SSD approaches. Despite the limitations, SSDs remain a practical tool and, until a demonstrably better inferential framework is available, developments and enhancements to conventional SSD practice will and should continue. We therefore believe the time has come for the scientific community to decide how it wants SSD methods to evolve. The present study summarizes the current status of, and elaborates on several recent developments for, SSD methods, specifically, model averaging, multimodality, and software development. We also consider future directions with respect to the use of SSDs, with the ultimate aim of helping to facilitate greater international collaboration and, potentially, greater harmonization of SSD methods. Environ Toxicol Chem 2021;40:293-308. © 2020 SETAC.
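A common concrete instance of the approach is a log-normal SSD. The sketch below fits one by moments on the log scale to invented toxicity values and reads off the HC5 (the concentration hazardous to no more than 5% of species); real assessments add model averaging and uncertainty bounds, as the article discusses.

```python
import math

# Hypothetical chronic toxicity values (e.g., EC10s in ug/L) for 8 species
tox = [3.2, 5.1, 7.8, 12.0, 18.5, 25.0, 41.0, 66.0]

# Fit a log-normal SSD by moments on the log scale
logs = [math.log(c) for c in tox]
mu = sum(logs) / len(logs)
sd = math.sqrt(sum((v - mu) ** 2 for v in logs) / (len(logs) - 1))

Z_05 = -1.6449  # 5th percentile of the standard normal distribution
hc5 = math.exp(mu + Z_05 * sd)   # concentration protecting 95% of species
```

The same fitted distribution answers the inverse question: evaluating its CDF at a measured exposure concentration gives the fraction of species potentially affected.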
Affiliation(s)
- D R Fox: Environmetrics Australia, Beaumaris, Victoria, Australia; University of Melbourne, Parkville, Victoria, Australia
- R A van Dam: WQadvice, Adelaide, South Australia, Australia
- R Fisher: Australian Institute of Marine Science and the University of Western Australia Oceans Institute and School of Plant Biology, Crawley, Western Australia, Australia
- G E Batley: CSIRO Land and Water, Lucas Heights, New South Wales, Australia
- A R Tillmanns: British Columbia Ministry of Environment and Climate Change Strategy, Victoria, British Columbia, Canada
- J Thorley: Poisson Consulting, Nelson, British Columbia, Canada
- C J Schwarz: StatMathComp Consulting, Vancouver, British Columbia, Canada
- D J Spry: Environment and Climate Change Canada, Gatineau, Quebec, Canada
- K McTavish: Environment and Climate Change Canada, Gatineau, Quebec, Canada
34
Abstract
Research in radiology and visual cognition suggests that finding one target during visual search may result in increased misses for a second target, an effect known as subsequent search misses (SSM). Here, we demonstrate that the common method of calculating second-target detection performance is biased and could produce spurious SSM effects. We describe the source of that bias and document factors that influence its magnitude. We use a modification of signal detection theory to develop a novel, unbiased method of calculating the expected value of dual-target performance under the null hypothesis. We then apply our novel method to two of our data sets that showed modest SSM effects when calculated in the traditional manner. Our correction reduced the effect size to the point that there was no longer a significant SSM effect. We then applied our method to a published data set that had a larger effect size when calculated using the traditional calculation, as well as when using an alternative calculation that was recently proposed to account for biases in the traditional method. We find that both the traditional method and the recently proposed alternative substantially overestimate the magnitude of the SSM effect in these data, but a significant SSM effect persisted even with our calculation. We recommend that future SSM studies use our method to ensure accurate effect-size estimates, and suggest that the method be applied to reanalyze published results, particularly those with small effect sizes, to rule out the possibility that they were spurious.
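The signal-detection correction derived in the paper is not reproduced here; the sketch below shows only the naive independence baseline that such corrections start from, with invented detection rates, to make the shape of an SSM comparison concrete.

```python
# Hypothetical search experiment: targets are found on 90% of single-target
# trials, but both targets are found on only 72% of dual-target trials.
p_single = 0.90
p_both_observed = 0.72

# Naive null: if the two detections were independent, the expected
# both-found rate would be the single-target rate squared.
p_both_expected = p_single ** 2
ssm_deficit = p_both_expected - p_both_observed   # positive -> apparent SSM effect
```

The paper's point is precisely that this kind of naive expected value can itself be biased, which is why a corrected null-hypothesis baseline is needed before declaring an SSM effect.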
35
Lin X. Learning Lessons on Reproducibility and Replicability in Large Scale Genome-Wide Association Studies. Harv Data Sci Rev 2020; 2:10.1162/99608f92.33703976. PMID: 38362534; PMCID: PMC10869125; DOI: 10.1162/99608f92.33703976.
Abstract
Reproducibility and replicability play a pivotal role in science. This article reflects on reproducibility and replicability as they figure in large-scale genome-wide association studies. Overall, we emphasize the importance of enhancing data reproducibility, analysis reproducibility, and result replicability. We make recommendations pertaining to the development of study designs that address 1) batch effects and selection bias, 2) the incorporation of discrete discovery and replication phases, and 3) the procurement of a large sample size. We emphasize the importance of systematic and transparent data generation, processing, and quality control pipelines, as well as a rigorous, field-specific, standardized analysis protocol. We offer guidance with respect to collaborative frameworks, open-access analysis tools and software, and the use of supporting mandates, infrastructure, and repositories for data and resource sharing. Finally, we identify the role of incentives and culture in fueling the production of reproducible and replicable research through partnerships of researchers, funding agencies, and journals.
Affiliation(s)
- Xihong Lin: Department of Biostatistics and Department of Statistics, Harvard University
36
Zhao S. To avoid the noncausal association between environmental factor and COVID-19 when using aggregated data: Simulation-based counterexamples for demonstration. Sci Total Environ 2020; 748:141590. PMID: 32798858; PMCID: PMC7415212; DOI: 10.1016/j.scitotenv.2020.141590.
Abstract
In infectious disease epidemiology, an association between an independent factor and disease incidence (or death) counts may fail to support an inference about that factor's association with disease transmission (or mortality risk). To explore the underlying role of environmental factors in the course of the COVID-19 epidemic, we highlight the importance of following the epidemiological metrics' definitions and of systematic analytical procedures. Caution is needed when interpreting outcome associations based on aggregated data, and overinterpretation should be avoided. Existing analytical approaches to address the inferential failure discussed in this study are also reviewed.
Affiliation(s)
- Shi Zhao
- JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, China; CUHK Shenzhen Research Institute, Shenzhen, China.
37
Diaz-Quijano FA. Estimating and testing an index of bias attributable to composite outcomes in comparative studies. J Clin Epidemiol 2020; 132:1-9. PMID: 33309888; DOI: 10.1016/j.jclinepi.2020.12.003.
Abstract
OBJECTIVES This study aimed to develop an index to evaluate the bias attributable to composite outcomes (BACO) in comparative clinical studies. STUDY DESIGN AND SETTING The author defined the BACO index as the ratio of the logarithm of the association measure (e.g., relative risk) of the composite outcome to that of its most relevant component endpoint (e.g., mortality). Methods to calculate confidence intervals and test the null hypothesis (BACO index = 1) were described and applied in systematically selected clinical trials. Two other preselected trials were included as "positive controls" because they are examples of primary composite outcomes that were disregarded for being inconsistent with the treatment effect on mortality. RESULTS BACO index values different from 1 were classified according to whether the use of the composite outcome overestimated (BACO index >1), underestimated (BACO index between 0 and <1), or inverted (BACO index <0) the association between exposure and prognosis. In 3 of 23 systematically selected trials and in the two positive controls, the BACO indices were significantly lower than 1 (P < 0.005). CONCLUSION The BACO index can warn that a composite outcome association is stronger, weaker, or even opposite in direction relative to that of its most critical component.
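The index described above is simple to compute from two association measures. A minimal sketch (the relative risk values below are illustrative, not taken from the trials in the study):

```python
import math

def baco_index(rr_composite: float, rr_component: float) -> float:
    """Ratio of log association measures: composite outcome vs. its most
    relevant component endpoint. A value of 1 indicates no attributable bias."""
    if rr_component == 1.0:
        raise ValueError("component RR of 1 gives an undefined index (log = 0)")
    return math.log(rr_composite) / math.log(rr_component)

# Composite outcome shows a stronger protective effect than mortality alone:
overestimated = baco_index(rr_composite=0.5, rr_component=0.8)   # > 1
# Composite effect weaker than the mortality effect:
underestimated = baco_index(rr_composite=0.9, rr_component=0.5)  # between 0 and 1
# Composite effect in the opposite direction to mortality:
inverted = baco_index(rr_composite=1.2, rr_component=0.8)        # < 0
```

The three calls map onto the paper's three classifications (overestimation, underestimation, inversion); confidence intervals and hypothesis tests for the index require the variance methods described in the article itself.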
Affiliation(s)
- Fredi Alexander Diaz-Quijano
- Department of Epidemiology, Laboratório de Inferência Causal em Epidemiologia (LINCE-USP), School of Public Health, University of São Paulo, São Paulo, São Paulo, Brazil.
38
Jia P, Lin L, Kwong JSW, Xu C. Many meta-analyses of rare events in the Cochrane Database of Systematic Reviews were underpowered. J Clin Epidemiol 2021; 131:113-22. PMID: 33271288; DOI: 10.1016/j.jclinepi.2020.11.017.
Abstract
BACKGROUND AND OBJECTIVE Meta-analysis is a statistical method that can increase the power of statistical inference, yet a meta-analysis may itself be underpowered. In this study, we investigated the power of published meta-analyses of rare events to detect certain true effects. METHODS We extracted data from the Cochrane Database of Systematic Reviews for meta-analyses of rare events from January 2003 to May 2018. We retrospectively estimated the power of eligible meta-analyses to detect a 10-50% relative risk reduction (RRR), and we estimated the proportion of meta-analyses that achieved sufficient power (≥0.8). RESULTS We identified 4,177 meta-analyses. The median power to detect a 10%, 30%, and 50% RRR was 0.06 (interquartile range [IQR]: 0.05 to 0.06), 0.08 (IQR: 0.06 to 0.15), and 0.17 (IQR: 0.10 to 0.42), respectively; the corresponding proportions of meta-analyses that reached sufficient power were 0.32%, 3.68%, and 11.81%. Meta-analyses incorporating data from more studies had a higher probability of achieving sufficient power (rate ratio = 2.49, 95% CI: 1.76, 3.52, P < 0.001). CONCLUSION Most meta-analyses of rare events in Cochrane systematic reviews were underpowered. Future meta-analyses of rare events should report the power of their results to support informative conclusions.
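Why rare events leave analyses underpowered can be seen with a crude Monte Carlo sketch. This is not the paper's power-estimation method; it treats the pooled evidence as a single two-arm comparison and tests the log risk ratio with a normal approximation, with made-up sample sizes and risks:

```python
import math
import random

def simulated_power(n_per_arm, p_control, rrr, n_sims=400, alpha=0.05, seed=1):
    """Crude Monte Carlo power: the probability of detecting a given
    relative risk reduction (RRR) with a normal-approximation test on
    the log risk ratio in a single pooled two-arm comparison."""
    random.seed(seed)
    p_treat = p_control * (1 - rrr)
    z_crit = 1.959964  # two-sided 5% critical value
    hits = 0
    for _ in range(n_sims):
        a = sum(random.random() < p_treat for _ in range(n_per_arm))    # treated events
        c = sum(random.random() < p_control for _ in range(n_per_arm))  # control events
        if a == 0 or c == 0:
            continue  # a zero cell leaves the log risk ratio undefined
        log_rr = math.log((a / n_per_arm) / (c / n_per_arm))
        se = math.sqrt(1 / a - 1 / n_per_arm + 1 / c - 1 / n_per_arm)
        if abs(log_rr / se) > z_crit:
            hits += 1
    return hits / n_sims

# Rare events (2% control risk): even a 50% RRR is badly underpowered in a
# small pooled sample, while a much larger one reaches sufficient power.
small = simulated_power(n_per_arm=200, p_control=0.02, rrr=0.5)
large = simulated_power(n_per_arm=5000, p_control=0.02, rrr=0.5)
```

With 200 per arm the expected event counts are about 4 vs. 2, so significance is almost never reached; the same effect with 5,000 per arm is detected almost every time.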
39
Lee JJ, Yin G. Principles and Reporting of Bayesian Trials. J Thorac Oncol 2020; 16:30-36. PMID: 33229069; PMCID: PMC10127518; DOI: 10.1016/j.jtho.2020.10.010.
Abstract
Bayesian clinical trials are becoming popular owing to their adaptive, flexible, and versatile nature. Such trials typically require specification of the prior distribution and construction of the likelihood function; inference is then made on the basis of the posterior distribution. Compared with frequentist trial designs, there are fewer established guidelines on how to report Bayesian trials. We provide a general overview of the key components of the design, conduct, and analysis of Bayesian trials and elaborate on the dos and don'ts of reporting them.
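The prior-plus-likelihood-to-posterior workflow the abstract describes can be sketched with the simplest conjugate case, a Beta prior on a binomial response rate (the patient counts below are hypothetical, not from any trial in the article):

```python
import random

def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate Beta-binomial update: a Beta(a, b) prior on a response
    rate plus binomial data yields a Beta(a + successes, b + failures)
    posterior."""
    return prior_a + successes, prior_b + failures

# Weakly informative Beta(1, 1) prior; hypothetical data: 14 responders
# among 40 patients.
a, b = beta_posterior(1, 1, successes=14, failures=26)
posterior_mean = a / (a + b)

# Posterior probability that the response rate exceeds 20%, by Monte Carlo:
random.seed(0)
prob_above_20 = sum(random.betavariate(a, b) > 0.20 for _ in range(20000)) / 20000
```

Quantities like `prob_above_20` are exactly the kind of posterior summary a Bayesian trial report would present in place of a frequentist p-value.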
Affiliation(s)
- J Jack Lee
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas.
- Guosheng Yin
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong
40
Scherb H, Hayashi K. Response to the "Letter to the Editor" by Alfred Körblein, "Short term increase in low birthweight babies after Fukushima". Environ Health 2020; 19:125. PMID: 33239024; PMCID: PMC7687820; DOI: 10.1186/s12940-020-00675-x.
Affiliation(s)
- Hagen Scherb
- Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Keiji Hayashi
- Hayashi Children’s Clinic, 4-6-11-1F Nagata, Joto-ku Osaka-Shi, Osaka, 536-0022 Japan
41
Ibanez-Berganza M, Amico A, Lancia GL, Maggiore F, Monechi B, Loreto V. Unsupervised inference approach to facial attractiveness. PeerJ 2020; 8:e10210. PMID: 33194411; PMCID: PMC7602690; DOI: 10.7717/peerj.10210.
Abstract
The perception of facial attractiveness is a complex phenomenon which depends on how the observer perceives not only individual facial features, but also their mutual influence and interplay. In the machine learning community, this problem is typically tackled as a regression of the subject-averaged rating assigned to natural faces. However, it has been conjectured that this approach does not capture the complexity of the phenomenon. It has recently been shown that different human subjects can navigate the face-space and "sculpt" their preferred modification of a reference facial portrait. Here we present an unsupervised inference study of the set of sculpted facial vectors in such experiments. We first infer minimal, interpretable, and accurate probabilistic models (through Maximum Entropy and artificial neural networks) of the preferred facial variations, which encode the inter-subject variance. Applying these generative models to the supervised classification of the gender of the subject who sculpted the face reveals that it can be predicted with astonishingly high accuracy. We observe that the classification accuracy improves with the order of the non-linear effective interaction. This suggests that the cognitive mechanisms related to facial discrimination in the brain do not involve the positions of single facial landmarks only, but mainly the mutual influence of pairs, and even triplets and quadruplets, of landmarks. Furthermore, the high prediction accuracy of the subjects' gender suggests that much relevant information regarding the subjects may influence (and be elicited from) their facial preference criteria, in agreement with the multiple motive theory of attractiveness proposed in previous works.
Affiliation(s)
- Ambra Amico
- Chair of Systems Design, Swiss Federal Institute of Technology, Zurich, Switzerland
- Gian Luca Lancia
- Department of Physics, University of Roma “La Sapienza”, Rome, Italy
- Federico Maggiore
- Department of Physics, University of Roma “La Sapienza”, Rome, Italy
- Vittorio Loreto
- Department of Physics, University of Roma “La Sapienza”, Rome, Italy
- SONY Computer Science Laboratories, Paris, France
- Complexity Science Hub, Vienna, Austria
42
Farrar BG, Altschul DM, Fischer J, van der Mescht J, Placì S, Troisi CA, Vernouillet A, Clayton NS, Ostojić L. Trialling Meta-Research in Comparative Cognition: Claims and Statistical Inference in Animal Physical Cognition. Anim Behav Cogn 2020; 7:419-444. PMID: 32851123; DOI: 10.26451/abc.07.03.09.2020.
Abstract
Scientific disciplines face concerns about replicability and statistical inference, and these concerns are also relevant in animal cognition research. This paper presents a first attempt to assess how researchers make and publish claims about animal physical cognition, and the statistical inferences they use to support them. We surveyed 116 published experiments from 63 papers on physical cognition, covering 43 different species. The most common tasks in our sample were trap-tube tasks (14 papers), other tool use tasks (13 papers), means-end understanding and string-pulling tasks (11 papers), object choice and object permanence tasks (9 papers), and access tasks (5 papers). This sample is not representative of the full scope of physical cognition research; however, it does provide data on the types of statistical design and publication decisions researchers have adopted. Across the 116 experiments, the median sample size was 7. Depending on the definitions we used, we estimated that between 44% and 59% of our sample of papers made positive claims about animals' physical cognitive abilities, between 24% and 46% made inconclusive claims, and between 10% and 17% made negative claims. Several failures of animals to pass physical cognition tasks were reported. Although our measures had low inter-observer reliability, these findings show that negative results can be, and have been, published in the field. However, publication bias is still present, and consistent with this, we observed a drop in the frequency of p-values above .05, suggesting that some non-significant results have not been published. More promisingly, we found that researchers are likely making many correct statistical inferences at the individual level. The strength of evidence of statistical effects at the group level was weaker, and its p-value distribution was consistent with some effect sizes being overestimated.
Studies such as ours can form part of a wider investigation into statistical reliability in comparative cognition. However, future work should focus on developing the validity and reliability of the measurements they use, and we offer some starting points.
Affiliation(s)
- Benjamin G Farrar
- Department of Psychology, University of Cambridge, Cambridge, UK; Institute for Globally Distributed Open Research and Education (IGDORE)
- Drew M Altschul
- Department of Psychology, The University of Edinburgh, Edinburgh, UK; Scottish Primate Research Group, UK
- Julia Fischer
- Cognitive Ethology Laboratory, German Primate Center, Göttingen, Germany
- Jolene van der Mescht
- Department of Psychology, The University of Edinburgh, Edinburgh, UK; Scottish Primate Research Group, UK
- Sarah Placì
- Cognitive Ethology Laboratory, German Primate Center, Göttingen, Germany
- Camille A Troisi
- School of Biological, Earth and Environmental Sciences, University College Cork, Cork, Ireland
- Nicola S Clayton
- Department of Psychology, University of Cambridge, Cambridge, UK
- Ljerka Ostojić
43
Diaz-Quijano FA, Calixto FM, da Silva JMN. How feasible is it to abandon statistical significance? A reflection based on a short survey. BMC Med Res Methodol 2020; 20:140. PMID: 32493293; PMCID: PMC7271502; DOI: 10.1186/s12874-020-01030-x.
Abstract
Background There is a growing trend in using the term “statistically significant” in the scientific literature. However, harsh criticism of this concept motivated the recommendation to withdraw it from scientific publications. We aimed to validate the support for, and the feasibility of adherence to, this recommendation among researchers who had declared themselves in favor of abandoning statistical significance. Methods We surveyed signatories of a published article that defended this recommendation, to validate their opinion and ask how likely they were to stop using the concept of statistical significance. Results We obtained 151 responses, which confirmed support for the mentioned publication in aspects such as the adequate interpretation of the p-value, the degree of agreement, and the motivations for signing it. However, there was a wide distribution of answers about how likely respondents were to use the concept of “statistical significance” in future publications: about 42% declared themselves neutral or said they would likely use it again. We describe arguments raised by several signatories and discuss aspects to be considered in the interpretation of research results. Conclusions The responses obtained from a proportion of signatories validated their declared position against the use of statistical significance. However, even in this group, full application of this recommendation does not seem feasible. The arguments related to the inappropriate use of statistical tests should promote more education among researchers and users of scientific evidence.
Affiliation(s)
- Fredi Alexander Diaz-Quijano
- Department of Epidemiology, School of Public Health, University of São Paulo, Av. Dr. Arnaldo, 715, Cerqueira César, CEP 01246-904, São Paulo, SP, Brazil; Laboratório de Inferência Causal em Epidemiologia da Universidade de São Paulo (LINCE-USP), São Paulo, Brazil
- Fernando Morelli Calixto
- Laboratório de Inferência Causal em Epidemiologia da Universidade de São Paulo (LINCE-USP), São Paulo, Brazil; Public Health, School of Public Health, University of São Paulo, São Paulo, Brazil
- José Mário Nunes da Silva
- Laboratório de Inferência Causal em Epidemiologia da Universidade de São Paulo (LINCE-USP), São Paulo, Brazil; Epidemiology, School of Public Health, University of São Paulo, São Paulo, Brazil
44
Abstract
Ecological data often violate common assumptions of traditional parametric statistics (e.g., that residuals are normally distributed and have constant variance, and that cases are independent). Modern statistical methods are well equipped to handle these complications, but they can be challenging for non-statisticians to understand and implement. Rather than defaulting to increasingly complex statistical methods, resampling-based methods can sometimes provide an alternative for performing statistical inference, while also facilitating a deeper understanding of foundational concepts in frequentist statistics (e.g., sampling distributions, confidence intervals, p-values). Using simple examples and case studies, we demonstrate how resampling-based methods can help elucidate core statistical concepts and provide alternative methods for tackling challenging problems across a broad range of ecological applications.
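The percentile bootstrap is the canonical resampling-based interval the abstract alludes to; a minimal sketch with hypothetical skewed count data (not from the paper's case studies):

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=5000, alpha=0.05, seed=42):
    """Percentile bootstrap: resample the data with replacement, recompute
    the statistic each time, and take empirical quantiles of the replicates."""
    random.seed(seed)
    reps = sorted(stat(random.choices(data, k=len(data))) for _ in range(n_boot))
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Skewed ecological counts (hypothetical per-plot species counts) where
# normal-theory intervals are questionable:
counts = [0, 0, 1, 1, 2, 2, 3, 4, 7, 12]
lo, hi = bootstrap_ci(counts)
```

The sorted replicates are themselves an approximation of the sampling distribution, which is exactly the pedagogical point the authors make: the resampling procedure makes the abstract concept concrete.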
Affiliation(s)
- John R Fieberg
- Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota, St. Paul, MN, USA
- Kelsey Vitense
- Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota, St. Paul, MN, USA
- Douglas H Johnson
- Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota, St. Paul, MN, USA
45
Xu C, Li L, Lin L, Chu H, Thabane L, Zou K, Sun X. Exclusion of studies with no events in both arms in meta-analysis impacted the conclusions. J Clin Epidemiol 2020; 123:91-99. PMID: 32247025; DOI: 10.1016/j.jclinepi.2020.03.020.
Abstract
OBJECTIVES Classical meta-analyses routinely treat studies with no events in both arms as noninformative and exclude them from analyses. This study assessed whether such studies contain information and influence the conclusions of meta-analyses. STUDY DESIGN AND SETTING We collected meta-analyses of binary outcomes with at least one study having no events in both arms from Cochrane systematic reviews (2003-2018). We used the generalized linear mixed model to reanalyze these meta-analyses in two ways: including studies with no events in both arms and excluding them. The magnitude and direction of the odds ratio (OR), the P value, and the width of the 95% confidence interval (CI) were compared. A simulation study was conducted to examine the robustness of the results. RESULTS We identified 442 meta-analyses. Comparing paired meta-analyses that included studies with no events in both arms vs. those that did not, 8 (1.80%) differed in the direction of the OR and 41 (9.28%) changed conclusions about statistical significance. Substantial changes occurred in P values (55.66% increased and 44.12% decreased) and in the width of the 95% CI (50.68% inflated and 49.32% declined) when studies with no events were excluded. The simulation study confirmed these findings. CONCLUSION Studies with no events in both arms are not necessarily noninformative; excluding them may alter conclusions.
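The paper's reanalysis uses a generalized linear mixed model; a much simpler continuity-correction sketch with invented trial counts still shows the mechanism, namely that a double-zero study can change both the pooled estimate and its precision once it is allowed to contribute:

```python
import math

def pooled_log_or(studies, cc=0.5):
    """Fixed-effect inverse-variance pooled log odds ratio; a continuity
    correction is added to every cell of any study containing a zero."""
    num = den = 0.0
    for a, b, c, d in studies:  # treatment events/non-events, control events/non-events
        if 0 in (a, b, c, d):
            a, b, c, d = (x + cc for x in (a, b, c, d))
        log_or = math.log((a * d) / (b * c))
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
        num += weight * log_or
        den += weight
    return num / den, math.sqrt(1.0 / den)  # pooled log OR and its standard error

# Hypothetical trials; the last one has no events in either arm.
trials = [(2, 98, 5, 95), (1, 199, 4, 196), (0, 150, 0, 150)]
with_dz, se_with = pooled_log_or(trials)
without_dz, se_without = pooled_log_or(trials[:2])
```

Including the double-zero trial pulls the pooled log OR toward the null and shrinks its standard error, so exclusion decisions are not inconsequential; the GLMM approach in the paper handles such studies without arbitrary corrections.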
Affiliation(s)
- Chang Xu
- Chinese Evidence-Based Medicine Center & Cochrane China, West China Hospital, Sichuan University, Chengdu, China
- Ling Li
- Chinese Evidence-Based Medicine Center & Cochrane China, West China Hospital, Sichuan University, Chengdu, China
- Lifeng Lin
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Haitao Chu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Lehana Thabane
- Department of Health Research Methods, Evidence, and Impact, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada; Biostatistics Unit, Father Sean O'Sullivan Research Centre, St Joseph's Healthcare, 3rd Floor, Martha Wing, Room H-325, 50 Charlton Avenue East, Hamilton, Ontario L8N 4A6, Canada
- Kang Zou
- Chinese Evidence-Based Medicine Center & Cochrane China, West China Hospital, Sichuan University, Chengdu, China
- Xin Sun
- Chinese Evidence-Based Medicine Center & Cochrane China, West China Hospital, Sichuan University, Chengdu, China.
46
Garcia EJ, Cain ME. Environmental enrichment and a selective metabotropic glutamate receptor 2/3 (mGluR 2/3) agonist suppress amphetamine self-administration: Characterizing baseline differences. Pharmacol Biochem Behav 2020; 192:172907. PMID: 32179027; DOI: 10.1016/j.pbb.2020.172907.
Abstract
A challenge for developing effective treatments for substance use disorders (SUDs) is understanding how environmental variables alter the efficacy of therapeutics. Environmental enrichment (EC) enhances brain development and protects against behaviors associated with drug abuse vulnerability when compared to rearing in isolation (IC) or standard conditions (SC). EC rearing enhances the expression and function of metabotropic glutamate receptor 2/3 (mGluR2/3), and activating mGluR2/3 reduces psychostimulant self-administration (SA). However, the ability of mGluR2/3 activation to suppress amphetamine (AMP) SA in differentially reared rats had not been determined. We therefore tested the hypothesis that EC rearing reduces AMP SA by augmenting mGluR2/3 function. At postnatal day 21, male Sprague-Dawley rats were assigned to EC, IC, or SC environments for 30 days. They then acquired AMP SA and were moved to a progressive ratio (PR) schedule of reinforcement. EC, IC, and SC rats were pretreated with LY379268 (vehicle, 0.3, and 1 mg/kg), a selective mGluR2/3 agonist, before PR behavioral sessions. Linear mixed effects analysis determined that EC rats had reduced motivation for AMP SA compared to IC or SC rats and that LY379268 dose-dependently suppressed AMP SA, but there was no evidence of an interaction. Cumming/Gardner-Altman estimation plots illustrate that the 0.3 mg/kg dose suppressed infusions in EC rats while the 1 mg/kg dose suppressed infusions in SC rats. LY379268 did not suppress the motivation for AMP SA in IC rats. Controlling for baseline differences in differentially reared rats remains a challenge: normalizing to a baseline introduced error, which is illustrated by the precision of the estimated effect size differences. The data indicate that environmental enrichment enhances the ability of a selective mGluR2/3 agonist to suppress AMP SA and suggest that the functional status of mGluR2/3 is established during development.
Therefore, environmental history must be considered when evaluating pharmacological therapeutics particularly those aimed at the mGluR2/3.
Affiliation(s)
- Erik J Garcia
- Department of Psychological Sciences, Kansas State University, United States of America
- Mary E Cain
- Department of Psychological Sciences, Kansas State University, United States of America.
47
Abstract
This chapter demystifies P-values, hypothesis tests and significance tests and introduces the concepts of local evidence and global error rates. The local evidence is embodied in this data and concerns the hypotheses of interest for this experiment, whereas the global error rate is a property of the statistical analysis and sampling procedure. It is shown using simple examples that local evidence and global error rates can be, and should be, considered together when making inferences. Power analysis for experimental design for hypothesis testing is explained, along with the more locally focussed expected P-values. Issues relating to multiple testing, HARKing and P-hacking are explained, and it is shown that, in many situations, their effects on local evidence and global error rates are in conflict, a conflict that can always be overcome by a fresh dataset from replication of key experiments. Statistics is complicated, and so is science. There is no singular right way to do either, and universally acceptable compromises may not exist. Statistics offers a wide array of tools for assisting with scientific inference by calibrating uncertainty, but statistical inference is not a substitute for scientific inference. P-values are useful indices of evidence and deserve their place in the statistical toolbox of basic pharmacologists.
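The chapter's distinction between local evidence and global error rates is easy to demonstrate by simulation: when every null hypothesis is true, each p-value is Uniform(0, 1), so testing a family of hypotheses inflates the chance of at least one false positive unless the threshold is corrected. A minimal sketch (not from the chapter itself):

```python
import random

def familywise_error(m_tests, alpha=0.05, n_sims=4000, seed=7):
    """Monte Carlo family-wise error rate when every null is true: under
    the null each p-value is Uniform(0, 1), and the family yields a false
    positive whenever its smallest p-value beats the threshold."""
    random.seed(seed)
    raw = bonf = 0
    for _ in range(n_sims):
        p_min = min(random.random() for _ in range(m_tests))
        raw += p_min < alpha
        bonf += p_min < alpha / m_tests  # Bonferroni-corrected threshold
    return raw / n_sims, bonf / n_sims

fwer_raw, fwer_bonf = familywise_error(m_tests=20)
# Analytically, the uncorrected rate is 1 - 0.95**20 ≈ 0.64, while the
# Bonferroni-corrected rate stays near the nominal 0.05.
```

Each individual small p-value still carries the same local evidence; what the correction changes is the global error rate of the testing procedure, which is precisely the tension the chapter asks readers to weigh.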
48
Lewis N, Gazula H, Plis SM, Calhoun VD. Decentralized distribution-sampled classification models with application to brain imaging. J Neurosci Methods 2019; 329:108418. PMID: 31630085; DOI: 10.1016/j.jneumeth.2019.108418.
Abstract
BACKGROUND In this age of big data, certain models require very large data stores in order to be informative and accurate. In many cases, however, the data are stored in separate locations, and transferring data between local sites raises practical hurdles such as privacy concerns and heavy network load. This is especially true for medical imaging data, which can be constrained by the Health Insurance Portability and Accountability Act (HIPAA), which provides security protocols for medical data. Medical imaging datasets can also contain many thousands or millions of features, requiring heavy network load. NEW METHOD Our research expands upon current decentralized classification research by implementing a new singleshot method for both neural networks and support vector machines. Our approach is to estimate the statistical distribution of the data at each local site and pass this information to the other local sites, where each site resamples from the individual distributions and trains a model on both locally available data and the resampled data. The model at each local site produces its own accuracy value, and these are averaged to produce the global average accuracy. RESULTS We show applications of our approach to handwritten digit classification as well as to multi-subject classification of brain imaging data collected from patients with schizophrenia and healthy controls. Overall, the results showed classification accuracy comparable to the centralized model with lower network load than multishot methods. COMPARISON WITH EXISTING METHODS Many decentralized classifiers are multishot, requiring heavy network traffic. Our model attempts to alleviate this load while preserving prediction accuracy. CONCLUSIONS We show that our proposed approach performs comparably to a centralized approach while minimizing network traffic compared to multishot methods.
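The share-a-distribution-instead-of-data idea can be illustrated in a toy form. This sketch is not the paper's implementation (which trains neural networks and SVMs on imaging features): it uses made-up one-dimensional Gaussian class data, shares only per-class (mean, stdev) summaries between two sites, and trains a nearest-centroid classifier at one site:

```python
import random
import statistics

def site_summary(samples):
    """A site shares only distribution estimates (mean, stdev), not raw data."""
    return statistics.mean(samples), statistics.stdev(samples)

def resample(summary, k, rng):
    """Another site draws synthetic samples from the shared summary."""
    mu, sigma = summary
    return [rng.gauss(mu, sigma) for _ in range(k)]

rng = random.Random(0)
# Two sites holding 1-D features for two classes; raw data never leaves a site.
site_a = {0: [rng.gauss(0.0, 1.0) for _ in range(50)],
          1: [rng.gauss(3.0, 1.0) for _ in range(50)]}
site_b = {0: [rng.gauss(0.2, 1.0) for _ in range(50)],
          1: [rng.gauss(2.8, 1.0) for _ in range(50)]}

# Site A trains on its own data plus samples drawn from site B's summaries:
train = {c: site_a[c] + resample(site_summary(site_b[c]), 50, rng) for c in (0, 1)}
centroids = {c: statistics.mean(v) for c, v in train.items()}

def classify(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))
```

Only two floats per class cross the network here, which is the singleshot advantage the abstract describes; the privacy and bandwidth argument carries over to the high-dimensional imaging setting.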
Affiliation(s)
- Noah Lewis
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, United States; Department of Computer Science, The University of New Mexico, Albuquerque, NM, United States.
- Harshvardhan Gazula
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, United States
- Sergey M Plis
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, United States
- Vince D Calhoun
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, United States; Department of Computer Science, The University of New Mexico, Albuquerque, NM, United States; Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM, United States
49
Novelli L, Wollstadt P, Mediano P, Wibral M, Lizier JT. Large-scale directed network inference with multivariate transfer entropy and hierarchical statistical testing. Netw Neurosci 2019; 3:827-847. PMID: 31410382; PMCID: PMC6663300; DOI: 10.1162/netn_a_00092.
Abstract
Network inference algorithms are valuable tools for the study of large-scale neuroimaging datasets. Multivariate transfer entropy is well suited for this task, being a model-free measure that captures nonlinear and lagged dependencies between time series to infer a minimal directed network model. Greedy algorithms have been proposed to efficiently deal with high-dimensional datasets while avoiding redundant inferences and capturing synergistic effects. However, multiple statistical comparisons may inflate the false positive rate and are computationally demanding, which limited the size of previous validation studies. The algorithm we present-as implemented in the IDTxl open-source software-addresses these challenges by employing hierarchical statistical tests to control the family-wise error rate and to allow for efficient parallelization. The method was validated on synthetic datasets involving random networks of increasing size (up to 100 nodes), for both linear and nonlinear dynamics. The performance increased with the length of the time series, reaching consistently high precision, recall, and specificity (>98% on average) for 10,000 time samples. Varying the statistical significance threshold showed a more favorable precision-recall trade-off for longer time series. Both the network size and the sample size are one order of magnitude larger than previously demonstrated, showing feasibility for typical EEG and magnetoencephalography experiments.
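Transfer entropy quantifies how much the past of one series improves prediction of another beyond the target's own past. The paper's method is multivariate with greedy variable selection (as implemented in IDTxl); the sketch below is only the simplest bivariate, history-length-1 plug-in estimator for binary series, with synthetic data:

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in transfer entropy TE(X -> Y) in bits, for discrete series with
    history length 1: the extra information x_t carries about y_{t+1}
    beyond what y_t already provides."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_{t+1}, y_t, x_t)
    n = len(triples)
    count_xyz = Counter(triples)
    count_yz = Counter((yn, yp) for yn, yp, _ in triples)   # (y_{t+1}, y_t)
    count_z = Counter((yp, xp) for _, yp, xp in triples)    # (y_t, x_t)
    count_y = Counter(yp for _, yp, _ in triples)           # y_t
    te = 0.0
    for (yn, yp, xp), c in count_xyz.items():
        p_cond_joint = c / count_z[(yp, xp)]                # p(y_{t+1} | y_t, x_t)
        p_cond_self = count_yz[(yn, yp)] / count_y[yp]      # p(y_{t+1} | y_t)
        te += (c / n) * math.log2(p_cond_joint / p_cond_self)
    return te

rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]                      # y copies x with a one-step lag
z = [rng.randint(0, 1) for _ in range(5000)]
coupled = transfer_entropy(x, y)      # near 1 bit: x_t fully determines y_{t+1}
independent = transfer_entropy(z, y)  # near 0 bits, up to estimator bias
```

The small positive bias of the plug-in estimator on independent data is exactly why the paper pairs the estimator with hierarchical statistical testing before declaring an edge in the inferred network.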
Affiliation(s)
- Leonardo Novelli
- Centre for Complex Systems, Faculty of Engineering, The University of Sydney, Sydney, Australia
- Pedro Mediano
- Computational Neurodynamics Group, Department of Computing, Imperial College London, London, United Kingdom
- Michael Wibral
- Campus Institute for Dynamics of Biological Networks, Georg-August University, Göttingen, Germany
- Joseph T. Lizier
- Centre for Complex Systems, Faculty of Engineering, The University of Sydney, Sydney, Australia
50
Abstract
We review a number of issues regarding missing data treatments for intervention and prevention researchers. Many of the common missing data practices in prevention research are still, unfortunately, ill-advised (e.g., use of listwise and pairwise deletion, insufficient use of auxiliary variables). Our goal is to promote better practice in the handling of missing data. We review the current state of missing data methodology and recent missing data reporting in prevention research. We describe antiquated, ad hoc missing data treatments and discuss their limitations. We discuss two modern, principled missing data treatments: multiple imputation and full information maximum likelihood, and we offer practical tips on how to best employ these methods in prevention research. The principled missing data treatments that we discuss are couched in terms of how they improve causal and statistical inference in the prevention sciences. Our recommendations are firmly grounded in missing data theory and well-validated statistical principles for handling the missing data issues that are ubiquitous in biosocial and prevention research. We augment our broad survey of missing data analysis with references to more exhaustive resources.
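The multiple-imputation workflow the review recommends can be sketched in miniature. This is a deliberately crude illustration with made-up data: proper MI draws imputations from a model conditioned on covariates (and auxiliary variables), whereas this sketch just draws from a normal fit to the observed values; the pooling step, however, follows Rubin's rules:

```python
import random
import statistics

def multiply_impute(data, m=20, seed=3):
    """Toy multiple imputation of the mean: replace each missing value
    (None) with a random draw, repeat m times, pool with Rubin's rules."""
    rng = random.Random(seed)
    observed = [v for v in data if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    estimates, variances = [], []
    for _ in range(m):
        completed = [v if v is not None else rng.gauss(mu, sd) for v in data]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    qbar = statistics.mean(estimates)        # pooled point estimate
    u = statistics.mean(variances)           # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    total_var = u + (1 + 1 / m) * b          # Rubin's total variance
    return qbar, total_var

data = [2.1, None, 3.4, 2.8, None, 3.0, 2.5, 3.9, None, 2.7]
estimate, variance = multiply_impute(data)
```

The between-imputation term `b` is what single imputation throws away: it propagates the uncertainty about the missing values into the final standard error, which is the core argument for MI (and FIML) over ad hoc deletion methods.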
Affiliation(s)
- Kyle M Lang
- Institute for Measurement, Methodology, Analysis, and Policy, Texas Tech University, Lubbock, USA.
- Todd D Little
- Institute for Measurement, Methodology, Analysis, and Policy, Texas Tech University, Lubbock, USA.