1. Hafiz R, Saeed S. Hybrid whale algorithm with evolutionary strategies and filtering for high-dimensional optimization: Application to microarray cancer data. PLoS One 2024; 19:e0295643. [PMID: 38466740 PMCID: PMC10927076 DOI: 10.1371/journal.pone.0295643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 11/28/2023] [Indexed: 03/13/2024]
Abstract
The standard whale optimization algorithm (WOA) is prone to suboptimal results and inefficiencies in high-dimensional search spaces, so examining its components is critical. Computer-generated initial populations often exhibit an uneven distribution in the solution space, leading to low diversity. We propose a fusion of this algorithm with a discrete recombinant evolutionary strategy (RESHWOA) to enhance initialization diversity. We conduct simulation experiments and compare the proposed algorithm with the original WOA on thirteen benchmark test functions. Simulation experiments on unimodal and multimodal benchmarks verified the better performance of the proposed RESHWOA in terms of accuracy, minimum mean, and low standard deviation. Furthermore, we applied two data reduction techniques, Bhattacharyya distance and signal-to-noise ratio. Support Vector Machine (SVM) excels in dealing with high-dimensional datasets and numerical features. Although the SVM already works well with its default settings, optimizing its parameters can significantly improve its performance. We applied the RESHWOA and WOA methods to six microarray cancer datasets to optimize the SVM parameters. The exhaustive examination and detailed results demonstrate that the new structure has addressed WOA's main shortcomings. We conclude that the proposed RESHWOA performed significantly better than the WOA.
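The abstract names signal-to-noise ratio as one of the two filters but does not give its formula. As a rough illustration only, the sketch below assumes the common Golub-style definition, |mean_a - mean_b| / (sd_a + sd_b), computed per gene across two classes. The function names and the toy expression values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

def snr_score(class_a, class_b):
    """Signal-to-noise ratio of one feature across two classes.

    Assumed definition (Golub-style): |mean_a - mean_b| / (sd_a + sd_b).
    """
    return abs(mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def top_k_features(expr_a, expr_b, k):
    """Return the indices of the k features with the highest SNR score.

    expr_a and expr_b hold one row per feature (gene); each row lists the
    expression values of that feature for the samples of one class.
    """
    scores = [snr_score(a, b) for a, b in zip(expr_a, expr_b)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical toy data: 3 genes, 4 samples per class.
expr_a = [[1.0, 1.2, 0.9, 1.1], [5.0, 5.5, 4.5, 5.2], [2.0, 2.1, 1.9, 2.0]]
expr_b = [[1.1, 0.9, 1.0, 1.2], [1.0, 1.4, 0.8, 1.1], [2.1, 2.0, 1.9, 2.2]]

selected = top_k_features(expr_a, expr_b, k=1)
print(selected)
```

In this toy example only the second gene separates the two classes clearly, so it is the one retained. A filter of this kind would run before the parameter search, so that the SVM is tuned on a reduced feature set.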
Affiliation(s)
- Rahila Hafiz
  - College of Statistical Sciences, University of the Punjab, Lahore, Pakistan
- Sana Saeed
  - College of Statistical Sciences, University of the Punjab, Lahore, Pakistan
2. Pesce E, Rapallo F, Riccomagno E, Wynn HP. Generation of all randomizations using circuits. Ann Inst Stat Math 2022; 75:683-704. [PMID: 36590375 PMCID: PMC9786527 DOI: 10.1007/s10463-022-00860-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 11/01/2022] [Accepted: 11/17/2022] [Indexed: 12/24/2022]
Abstract
After a rich history in medicine, randomized control trials (RCTs), both simple and complex, are in increasing use in other areas, such as web-based A/B testing and planning and design of decisions. A main objective of RCTs is to be able to measure parameters, and contrasts in particular, while guarding against biases from hidden confounders. After careful definitions of classical entities such as contrasts, an algebraic method based on circuits is introduced which gives a wide choice of randomization schemes.
Affiliation(s)
- Elena Pesce
  - Swiss Re Institute, Swiss Re Management Ltd, Mythenquai 50/60, 8022 Zurich, Switzerland
- Fabio Rapallo
  - Department of Economics, Università di Genova, Via F. Vivaldi 5, 16126 Genoa, Italy
- Eva Riccomagno
  - Department of Mathematics, Università di Genova, Via Dodecaneso 35, 16146 Genoa, Italy
3. Chang MC. Predictive Subdata Selection for Computer Models. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2097247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
4. Jackson CH, Baio G, Heath A, Strong M, Welton NJ, Wilson EC. Value of Information Analysis in Models to Inform Health Policy. Annu Rev Stat Appl 2022; 9:95-118. [PMID: 35415193 PMCID: PMC7612603 DOI: 10.1146/annurev-statistics-040120-010730] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 06/14/2023]
Abstract
Value of information (VoI) is a decision-theoretic approach to estimating the expected benefits from collecting further information of different kinds, in scientific problems based on combining one or more sources of data. VoI methods can assess the sensitivity of models to different sources of uncertainty and help to set priorities for further data collection. They have been widely applied in healthcare policy making, but the ideas are general to a range of evidence synthesis and decision problems. This article gives a broad overview of VoI methods, explaining the principles behind them, the range of problems that can be tackled with them, and how they can be implemented, and discusses the ongoing challenges in the area.
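One of the simplest VoI quantities, the expected value of perfect information (EVPI), can be estimated from Monte Carlo draws as the gap between the mean of the per-draw best net benefit and the best mean net benefit. The following is a minimal sketch of that textbook estimator, not code from the article. The function name `evpi` and the toy net-benefit table are assumptions made for illustration.

```python
def evpi(net_benefit_samples):
    """Monte Carlo estimate of the expected value of perfect information.

    net_benefit_samples: one row per parameter draw; each row lists the net
    benefit of every decision option under that draw.
    """
    n_draws = len(net_benefit_samples)
    n_options = len(net_benefit_samples[0])

    # Current information: commit to the single option with the best mean.
    mean_by_option = [
        sum(row[d] for row in net_benefit_samples) / n_draws
        for d in range(n_options)
    ]
    value_current = max(mean_by_option)

    # Perfect information: choose the best option separately for each draw.
    value_perfect = sum(max(row) for row in net_benefit_samples) / n_draws

    return value_perfect - value_current

# Hypothetical draws: option 0 is certain, option 1 is uncertain.
samples = [[10.0, 12.0], [10.0, 6.0], [10.0, 14.0], [10.0, 4.0]]
result = evpi(samples)
print(result)  # 1.5
```

Here option 0 has the higher mean (10 against 9), but knowing the parameters in advance would allow switching to option 1 in the two draws where it wins, which is worth 1.5 on average.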
Affiliation(s)
- Gianluca Baio
  - Department of Statistical Science, University College London, London WC1E 6BT, United Kingdom
- Anna Heath
  - The Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada
- Mark Strong
  - School of Health and Related Research, University of Sheffield, Sheffield S1 4DA, United Kingdom
- Nicky J. Welton
  - Bristol Medical School (PHS), University of Bristol, Bristol BS8 1QU, United Kingdom
5. López-Fidalgo J, Wiens DP. Robust active learning with binary responses. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
6. Ren M, Zhao SL. Subdata selection based on orthogonal array for big data. Commun Stat Theory Methods 2021. [DOI: 10.1080/03610926.2021.2012196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Min Ren
  - School of Statistics, Qufu Normal University, Qufu, China
- Sheng-Li Zhao
  - School of Statistics, Qufu Normal University, Qufu, China
7. Paglia J, Eidsvik J, Karvanen J. Efficient spatial designs using Hausdorff distances and Bayesian optimization. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- Jacopo Paglia
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Jo Eidsvik
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Juha Karvanen
  - Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
8. Applications and Trends of Machine Learning in Genomics and Phenomics for Next-Generation Breeding. Plants (Basel) 2019; 9:plants9010034. [PMID: 31881663 PMCID: PMC7020215 DOI: 10.3390/plants9010034] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 11/05/2019] [Revised: 12/17/2019] [Accepted: 12/23/2019] [Indexed: 12/27/2022]
Abstract
Crops are the major source of food supply and raw materials for the processing industry. A balance between crop production and food consumption is continually threatened by plant diseases and adverse environmental conditions. This leads to serious losses every year and results in food shortages, particularly in developing countries. Presently, cutting-edge technologies for genome sequencing and phenotyping of crops combined with progress in computational sciences are leading a revolution in plant breeding, boosting the identification of the genetic basis of traits at a precision never reached before. In this frame, machine learning (ML) plays a pivotal role in data-mining and analysis, providing relevant information for decision-making towards achieving breeding targets. To this end, we summarize the recent progress in next-generation sequencing and the role of phenotyping technologies in genomics-assisted breeding toward the exploitation of the natural variation and the identification of target genes. We also explore the application of ML in managing big data and predictive models, reporting a case study using microRNAs (miRNAs) to identify genes related to stress conditions.
9.
Affiliation(s)
- Rafael A. Moral
  - Department of Mathematics and Statistics, Maynooth University, Maynooth, Ireland
- John Hinde
  - School of Mathematics, Statistics, and Applied Mathematics, NUI Galway, Galway, Ireland
10. Salmaso L, Pegoraro L, Giancristofaro RA, Ceccato R, Bianchi A, Restello S, Scarabottolo D. Design of experiments and machine learning to improve robustness of predictive maintenance with application to a real case study. Commun Stat Simul Comput 2019. [DOI: 10.1080/03610918.2019.1656740] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Affiliation(s)
- Luigi Salmaso
  - Dipartimento di Tecnica e Gestione dei Sistemi Industriali, Università degli Studi di Padova, Padova, Italy
- Luca Pegoraro
  - Dipartimento di Tecnica e Gestione dei Sistemi Industriali, Università degli Studi di Padova, Padova, Italy
- Riccardo Ceccato
  - Dipartimento di Tecnica e Gestione dei Sistemi Industriali, Università degli Studi di Padova, Padova, Italy
11. Cobb JN, Juma RU, Biswas PS, Arbelaez JD, Rutkoski J, Atlin G, Hagen T, Quinn M, Ng EH. Enhancing the rate of genetic gain in public-sector plant breeding programs: lessons from the breeder's equation. Theor Appl Genet 2019; 132:627-645. [PMID: 30824972 PMCID: PMC6439161 DOI: 10.1007/s00122-019-03317-0] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Received: 10/06/2018] [Accepted: 02/21/2019] [Indexed: 05/20/2023]
Abstract
The integration of new technologies into public plant breeding programs can make a powerful step change in agricultural productivity when aligned with principles of quantitative and Mendelian genetics. The breeder's equation is the foundational application of quantitative genetics to crop improvement. Guided by the variables that describe response to selection, emerging breeding technologies can make a powerful step change in the effectiveness of public breeding programs. The most promising innovations for increasing the rate of genetic gain without greatly increasing program size appear to be related to reducing breeding cycle time, which is likely to require the implementation of parent selection on non-inbred progeny, rapid generation advance, and genomic selection. These are complex processes and will require breeding organizations to adopt a culture of continuous optimization and improvement. To enable this, research managers will need to consider and proactively manage the accountability, strategy, and resource allocations of breeding teams. This must be combined with thoughtful management of elite genetic variation and a clear separation between the parental selection process and the product development and advancement process. With an abundance of new technologies available, breeding teams need to evaluate carefully the impact of any new technology on selection intensity, selection accuracy, and breeding cycle length relative to its cost of deployment. Finally, breeding data management systems need to be well designed to support selection decisions, and novel approaches to accelerate breeding cycles need to be routinely evaluated and deployed.
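The abstract does not write the breeder's equation out. A commonly used per-year form is gain = (selection intensity × selection accuracy × additive genetic standard deviation) / cycle length, which makes the emphasis on cycle time easy to see. The sketch below assumes that form. The function name and all input values are hypothetical illustrations, not figures from the paper.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected genetic gain per year under the assumed form
    gain = intensity * accuracy * sigma_a / cycle_years.

    intensity   : standardized selection intensity (i)
    accuracy    : correlation between estimated and true breeding value (r)
    sigma_a     : additive genetic standard deviation of the trait
    cycle_years : length of one breeding cycle in years (L)
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years

# Illustrative values only: same intensity, accuracy, and variance,
# with the breeding cycle shortened from 8 years to 4 years.
baseline = genetic_gain_per_year(1.755, 0.5, 1.0, cycle_years=8.0)
faster = genetic_gain_per_year(1.755, 0.5, 1.0, cycle_years=4.0)
print(baseline, faster, faster / baseline)
```

With everything else held fixed, halving the cycle length doubles the expected gain per year, which is why a technology that shortens cycles can pay off even when it does not improve selection accuracy.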
Affiliation(s)
- Joshua N Cobb
  - International Rice Research Institute, Los Banos, Laguna, Philippines
- Roselyne U Juma
  - International Rice Research Institute, Los Banos, Laguna, Philippines
  - Kenya Agricultural and Livestock Research Organization, Nairobi, Kenya
- Partha S Biswas
  - International Rice Research Institute, Los Banos, Laguna, Philippines
  - Bangladesh Rice Research Institute, Gazipur, Bangladesh
- Juan D Arbelaez
  - International Rice Research Institute, Los Banos, Laguna, Philippines
- Jessica Rutkoski
  - International Rice Research Institute, Los Banos, Laguna, Philippines
- Gary Atlin
  - Bill and Melinda Gates Foundation, Seattle, WA, USA
- Tom Hagen
  - CGIAR Excellence in Breeding Platform (EiB), El Batan, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), El Batan, Mexico
- Michael Quinn
  - CGIAR Excellence in Breeding Platform (EiB), El Batan, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), El Batan, Mexico
- Eng Hwa Ng
  - CGIAR Excellence in Breeding Platform (EiB), El Batan, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), El Batan, Mexico
12. Rhodes KM, Turner RM, Payne RA, White IR. Computationally efficient methods for fitting mixed models to electronic health records data. Stat Med 2018; 37:4557-4570. [PMID: 30155902 PMCID: PMC6240345 DOI: 10.1002/sim.7944] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/31/2017] [Revised: 06/27/2018] [Accepted: 07/20/2018] [Indexed: 11/12/2022]
Abstract
Motivated by two case studies using primary care records from the Clinical Practice Research Datalink, we describe statistical methods that facilitate the analysis of tall data, with very large numbers of observations. Our focus is on investigating the association between patient characteristics and an outcome of interest, while allowing for variation among general practices. We explore ways to fit mixed-effects models to tall data, including predictors of interest and confounding factors as covariates, and including random intercepts to allow for heterogeneity in outcome among practices. We introduce (1) weighted regression and (2) meta-analysis of estimated regression coefficients from each practice. Both methods reduce the size of the dataset, thus decreasing the time required for statistical analysis. We compare the methods to an existing subsampling approach. All methods give similar point estimates, and weighted regression and meta-analysis give similar standard errors for point estimates to analysis of the entire dataset, but the subsampling method gives larger standard errors. Where all data are discrete, weighted regression is equivalent to fitting the mixed model to the entire dataset. In the presence of a continuous covariate, meta-analysis is useful. Both methods are easy to implement in standard statistical software.
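The reason weighted regression loses nothing on all-discrete data can be seen in a much simpler setting than the mixed models of the paper. When many rows share identical values, collapsing them into unique cells and weighting each cell by its count gives the same least-squares fit as the full dataset. The sketch below shows this for an ordinary straight-line fit with no random effects. It is a simplified analogue, and the helper `wls_line` and the toy rows are hypothetical.

```python
from collections import Counter

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical "tall" dataset of (x, y) rows with discrete values only.
rows = [(0, 0)] * 50 + [(0, 1)] * 10 + [(1, 0)] * 30 + [(1, 1)] * 40 + [(2, 1)] * 20

# Fit on all 150 rows with unit weights.
full = wls_line([r[0] for r in rows], [r[1] for r in rows], [1] * len(rows))

# Collapse to the 5 unique cells and weight each cell by its frequency.
counts = Counter(rows)
cells = list(counts)
collapsed = wls_line(
    [c[0] for c in cells], [c[1] for c in cells], [counts[c] for c in cells]
)

print(len(rows), len(cells))
print(full, collapsed)
```

The two fits agree up to floating-point rounding while the collapsed version touches 5 rows instead of 150. A continuous covariate breaks this trick because almost every row becomes its own cell, which matches the abstract's note that meta-analysis is the useful option in that case.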
Affiliation(s)
- K M Rhodes
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge, UK
- R M Turner
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge, UK
  - MRC Clinical Trials Unit at University College London, Institute of Clinical Trials and Methodology, London, UK
- R A Payne
  - Centre for Academic Primary Care, Bristol Medical School, University of Bristol, Bristol, UK
- I R White
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge, UK
  - MRC Clinical Trials Unit at University College London, Institute of Clinical Trials and Methodology, London, UK
13. Wright ST, Ryan LM, Pham T. A novel case-control subsampling approach for rapid model exploration of large clustered binary data. Stat Med 2018; 37:899-913. [PMID: 29230851 DOI: 10.1002/sim.7543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/18/2016] [Revised: 05/25/2017] [Accepted: 10/01/2017] [Indexed: 11/10/2022]
Abstract
In many settings, an analysis goal is the identification of a factor, or set of factors associated with an event or outcome. Often, these associations are then used for inference and prediction. Unfortunately, in the big data era, the model building and exploration phases of analysis can be time-consuming, especially if constrained by computing power (ie, a typical corporate workstation). To speed up this model development, we propose a novel subsampling scheme to enable rapid model exploration of clustered binary data using flexible yet complex model set-ups (GLMMs with additive smoothing splines). By reframing the binary response prospective cohort study into a case-control-type design, and using our knowledge of sampling fractions, we show one can approximate the model estimates as would be calculated from a full cohort analysis. This idea is extended to derive cluster-specific sampling fractions and thereby incorporate cluster variation into an analysis. Importantly, we demonstrate that previously computationally prohibitive analyses can be conducted in a timely manner on a typical workstation. The approach is applied to analysing risk factors associated with adverse reactions relating to blood donation.
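The role of known sampling fractions can be illustrated with the classic result for logistic models under case-control sampling. If cases are kept with probability f1 and controls with probability f0, the log-odds in the subsample are shifted by log(f1 / f0), so subtracting that offset recovers the cohort-scale intercept. The sketch below checks this on expected counts for an intercept-only model. It is a simplified illustration, not the cluster-specific scheme of the paper, and all names and numbers are hypothetical.

```python
import math

def corrected_intercept(beta0_subsample, frac_cases, frac_controls):
    """Undo the intercept shift caused by sampling cases and controls
    at different known rates (assumed classic offset correction)."""
    return beta0_subsample - math.log(frac_cases / frac_controls)

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical cohort: rare outcome with true log-odds of -4.
true_beta0 = -4.0
n_cohort = 1_000_000
risk = inv_logit(true_beta0)

# Keep every case but only 5% of controls (expected counts, no randomness).
f_cases, f_controls = 1.0, 0.05
sub_cases = f_cases * n_cohort * risk
sub_controls = f_controls * n_cohort * (1.0 - risk)

# The naive intercept from the subsample is the log-odds observed there.
naive_beta0 = math.log(sub_cases / sub_controls)
recovered_beta0 = corrected_intercept(naive_beta0, f_cases, f_controls)

print(naive_beta0, recovered_beta0)
```

The naive intercept is inflated by log(20), roughly 3, while the corrected value returns to -4. Slope coefficients are unaffected by this kind of outcome-dependent sampling in a logistic model, which is what makes fitting on the much smaller subsample attractive.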
Affiliation(s)
- Stephen T Wright
  - Mathematical and Physical Sciences, University of Technology Sydney, Australia
  - Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers, Australia
  - Research and Development, Australian Red Cross Blood Service, Australia
- Louise M Ryan
  - Mathematical and Physical Sciences, University of Technology Sydney, Australia
  - Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers, Australia
- Tung Pham
  - Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers, Australia
  - Department of Mathematics and Statistics, University of Melbourne, Australia