1
Freda PJ, Ghosh A, Zhang E, Luo T, Chitre AS, Polesskaya O, St Pierre CL, Gao J, Martin CD, Chen H, Garcia-Martinez AG, Wang T, Han W, Ishiwari K, Meyer P, Lamparelli A, King CP, Palmer AA, Li R, Moore JH. Automated quantitative trait locus analysis (AutoQTL). BioData Min 2023; 16:14. [PMID: 37038201 PMCID: PMC10088184 DOI: 10.1186/s13040-023-00331-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 03/31/2023] [Indexed: 04/12/2023]
Abstract
BACKGROUND: Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning approaches have been shown to greatly assist in optimization and data processing, applying them to QTL analysis and GWAS is challenging due to the complexity of large, heterogeneous datasets. Here, we describe a proof-of-concept for an automated machine learning approach, AutoQTL, with the ability to automate many complicated decisions related to analysis of complex traits and generate solutions to describe relationships that exist in genetic data.

RESULTS: Using a publicly available dataset of 18 putative QTL from a large-scale GWAS of body mass index in the laboratory rat, Rattus norvegicus, AutoQTL captures the phenotypic variance explained under a standard additive model. AutoQTL also detects evidence of non-additive effects including deviations from additivity and 2-way epistatic interactions in simulated data via multiple optimal solutions. Additionally, feature importance metrics provide different insights into the inheritance models and predictive power of multiple GWAS-derived putative QTL.

CONCLUSIONS: This proof-of-concept illustrates that automated machine learning techniques can complement standard approaches and have the potential to detect both additive and non-additive effects via various optimal solutions and feature importance metrics. In the future, we aim to expand AutoQTL to accommodate omics-level datasets with intelligent feature selection and feature engineering strategies.
Affiliation(s)
- Philip J Freda
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Attri Ghosh
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Elizabeth Zhang
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Tianhao Luo
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California San Diego, 9500 Gilman Dr., Mail Code: 0667, La Jolla, CA, 92093-0667, USA
- Oksana Polesskaya
  - Department of Psychiatry, University of California San Diego, 9500 Gilman Dr., Mail Code: 0667, La Jolla, CA, 92093-0667, USA
- Celine L St Pierre
  - Department of Psychiatry, University of California San Diego, 9500 Gilman Dr., Mail Code: 0667, La Jolla, CA, 92093-0667, USA
- Jianjun Gao
  - Department of Psychiatry, University of California San Diego, 9500 Gilman Dr., Mail Code: 0667, La Jolla, CA, 92093-0667, USA
- Connor D Martin
  - Department of Pharmacology & Toxicology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, 955 Main Street, Suite 3102, Buffalo, NY, 14203, USA
- Hao Chen
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Translational Research Building, 71 South Manassas, Memphis, TN, 38163, USA
- Angel G Garcia-Martinez
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Translational Research Building, 71 South Manassas, Memphis, TN, 38163, USA
- Tengfei Wang
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Translational Research Building, 71 South Manassas, Memphis, TN, 38163, USA
- Wenyan Han
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Translational Research Building, 71 South Manassas, Memphis, TN, 38163, USA
- Keita Ishiwari
  - Department of Pharmacology & Toxicology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, 955 Main Street, Suite 3102, Buffalo, NY, 14203, USA
  - Clinical and Research Institute on Addictions, University at Buffalo, 1021 Main Street, Buffalo, NY, 14203-1016, USA
- Paul Meyer
  - Department of Psychology, University at Buffalo, 204 Park Hall, North Campus, Buffalo, NY, 14260-4110, USA
- Alexander Lamparelli
  - Department of Psychology, University at Buffalo, 204 Park Hall, North Campus, Buffalo, NY, 14260-4110, USA
- Christopher P King
  - Department of Psychology, University at Buffalo, 204 Park Hall, North Campus, Buffalo, NY, 14260-4110, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, 9500 Gilman Dr., Mail Code: 0667, La Jolla, CA, 92093-0667, USA
  - Institute for Genomic Medicine, University of California San Diego, 9500 Gilman Dr., Mail Code: 0667, La Jolla, CA, 92093-0667, USA
- Ruowang Li
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
3
Manduchi E, Romano JD, Moore JH. The promise of automated machine learning for the genetic analysis of complex traits. Hum Genet 2021; 141:1529-1544. [PMID: 34713318 PMCID: PMC9360157 DOI: 10.1007/s00439-021-02393-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/21/2021] [Accepted: 10/22/2021] [Indexed: 12/24/2022]
Abstract
The genetic analysis of complex traits has been dominated by parametric statistical methods due to their theoretical properties, ease of use, computational efficiency, and intuitive interpretation. However, there are likely to be patterns arising from complex genetic architectures which are more easily detected and modeled using machine learning methods. Unfortunately, selecting the right machine learning algorithm and tuning its hyperparameters can be daunting for experts and non-experts alike. The goal of automated machine learning (AutoML) is to let a computer algorithm identify the right algorithms and hyperparameters, thus taking the guesswork out of the optimization process. We review the promises and challenges of AutoML for the genetic analysis of complex traits and give an overview of several approaches and some example applications to omics data. It is our hope that this review will motivate studies to develop and evaluate novel AutoML methods and software in the genetics and genomics space. The promise of AutoML is to enable anyone, regardless of training or expertise, to apply machine learning as part of their genetic analysis strategy.
Affiliation(s)
- Elisabetta Manduchi
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Joseph D Romano
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jason H Moore
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA