Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014;10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023]

Cited by Other Article(s)

Novielli P, Romano D, Pavan S, Losciale P, Stellacci AM, Diacono D, Bellotti R, Tangaro S. Explainable artificial intelligence for genotype-to-phenotype prediction in plant breeding: a case study with a dataset from an almond germplasm collection. Front Plant Sci 2024;15:1434229. [PMID: 39319003 PMCID: PMC11420924 DOI: 10.3389/fpls.2024.1434229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 08/13/2024] [Indexed: 09/26/2024]

Abstract

Background

Advances in DNA sequencing revolutionized plant genomics and significantly contributed to the study of genetic diversity. However, despite significant progress, accurately predicting phenotypes from high-dimensional genomic data remains a challenge in plant breeding, particularly when it comes to identifying the key genetic factors that drive the predictions. This study aims to bridge this gap by integrating explainable artificial intelligence (XAI) techniques with advanced machine learning models. This approach is intended to enhance both the predictive accuracy and interpretability of genotype-to-phenotype models, thereby improving their reliability and supporting more informed breeding decisions.

Results

This study compares several ML methods for genotype-to-phenotype prediction, using data available from an almond germplasm collection. After preprocessing and feature selection, regression models are employed to predict almond shelling fraction. Best predictions were obtained by the Random Forest method (correlation = 0.727 ± 0.020, R² = 0.511 ± 0.025, and RMSE = 7.746 ± 0.199). Notably, the application of the SHAP (SHapley Additive exPlanations) values algorithm to explain the results highlighted several genomic regions associated with the trait, including one, having the highest feature importance, located in a gene potentially involved in seed development.
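The three scores quoted above (correlation, R², RMSE) can be computed from observed and predicted trait values as in the following minimal sketch. It is only an illustration: the function names and the `observed`/`predicted` values are hypothetical and are not taken from the almond study.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the trait."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical shelling-fraction values (percent), for illustration only.
observed = [30.0, 42.0, 55.0, 61.0, 48.0]
predicted = [34.0, 40.0, 50.0, 58.0, 52.0]

print(pearson_r(observed, predicted))
print(r_squared(observed, predicted))
print(rmse(observed, predicted))
```

Note that R² defined this way is not simply the square of the correlation: it also penalizes systematic bias in the predictions, which is one reason studies often report both.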

Conclusions

Employing explainable artificial intelligence algorithms enhances model interpretability, identifying genetic polymorphisms associated with the shelling percentage. These findings underscore XAI's efficacy in predicting phenotypic traits from genomic data, highlighting its significance in optimizing crop production for sustainable agriculture.

Mohtasham F, Pourhoseingholi M, Hashemi Nazari SS, Kavousi K, Zali MR. Comparative analysis of feature selection techniques for COVID-19 dataset. Sci Rep 2024;14:18627. [PMID: 39128991 PMCID: PMC11317481 DOI: 10.1038/s41598-024-69209-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 08/01/2024] [Indexed: 08/13/2024]

Sztepanacz JL, Houle D. Regularized regression can improve estimates of multivariate selection in the face of multicollinearity and limited data. Evol Lett 2024;8:361-373. [PMID: 39211358 PMCID: PMC11358252 DOI: 10.1093/evlett/qrad064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 11/19/2023] [Accepted: 12/06/2023] [Indexed: 09/04/2024]

Abstract

The breeder's equation, $\Delta \bar{z} = G\beta$, allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients β interact to generate evolutionary trajectories. Estimation of β using multiple regression of trait values on relative fitness revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity, or correlation of predictors, can lead to very high variances of and covariances between elements of β, posing a challenge for the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection, and, on the other, how the removal of correlated phenotypes from the analyses can provide a misguided view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, and often has little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable tool that can be used as an important complement to traditional least-squares estimates of selection. In some cases, its use can lead to improved predictions of individual fitness, and improved estimates of the total strength and direction of multivariate selection.
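To make the idea concrete, the sketch below estimates selection gradients for two strongly correlated traits with and without a ridge penalty, then plugs an estimate into the breeder's equation. It is an illustration under stated assumptions, not the authors' procedure: the ridge penalty is one common form of regularization, `lam` is fixed by hand here (the abstract describes data-validated constraints, which in practice would mean choosing it by cross-validation), and all trait values, fitness values, and the matrix `G` are hypothetical.

```python
def ridge_2d(x1, x2, w, lam):
    """Closed-form ridge estimate (X'X + lam*I)^(-1) X'w for two centered traits.

    lam = 0 gives the ordinary least-squares selection gradients.
    """
    a = sum(v * v for v in x1) + lam          # X'X[0][0] + lam
    b = sum(u * v for u, v in zip(x1, x2))    # X'X[0][1]
    d = sum(v * v for v in x2) + lam          # X'X[1][1] + lam
    c1 = sum(u * v for u, v in zip(x1, w))    # X'w[0]
    c2 = sum(u * v for u, v in zip(x2, w))    # X'w[1]
    det = a * d - b * b
    return ((d * c1 - b * c2) / det, (a * c2 - b * c1) / det)

def response_to_selection(G, beta):
    """Breeder's equation: predicted change in trait means, G times beta."""
    return tuple(sum(g * b for g, b in zip(row, beta)) for row in G)

# Hypothetical centered trait values (nearly collinear) and centered relative fitness.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
w = [-1.0, -0.6, 0.1, 0.4, 1.1]

beta_ols = ridge_2d(x1, x2, w, lam=0.0)
beta_ridge = ridge_2d(x1, x2, w, lam=1.0)

# Hypothetical genetic covariance matrix.
G = [[1.0, 0.5], [0.5, 2.0]]
print(beta_ols, beta_ridge)
print(response_to_selection(G, beta_ridge))
```

With nearly collinear traits, the unpenalized estimates tend to be large and unstable, while the penalized ones are shrunk toward zero; how much shrinkage is appropriate is exactly what validation on held-out data is meant to decide.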

Alfayyadh MM, Maksemous N, Sutherland HG, Lea RA, Griffiths LR. Unravelling the Genetic Landscape of Hemiplegic Migraine: Exploring Innovative Strategies and Emerging Approaches. Genes (Basel) 2024;15:443. [PMID: 38674378 PMCID: PMC11049430 DOI: 10.3390/genes15040443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]

Gu LL, Yang RQ, Wang ZY, Jiang D, Fang M. Ensemble learning for integrative prediction of genetic values with genomic variants. BMC Bioinformatics 2024;25:120. [PMID: 38515026 PMCID: PMC10956256 DOI: 10.1186/s12859-024-05720-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/26/2024] [Indexed: 03/23/2024]

Zhou W, Yan Z, Zhang L. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction. Sci Rep 2024;14:5905. [PMID: 38467662 PMCID: PMC10928191 DOI: 10.1038/s41598-024-55243-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024]

Abstract

To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). Evaluation with four metrics, R² (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), showed that SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and achieved better accuracy in phenotype prediction. In the assessment of deep learning approaches, we took the SVR model as an example, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, no notable distinction was observed in the feature importance ranking scores across the four algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection. The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.
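Of the four feature importance approaches named in this abstract, permutation importance is the simplest to sketch: shuffle one feature column at a time and record how much the prediction error grows. The example below is a minimal illustration, not the study's pipeline; `toy_model` stands in for a fitted regressor such as SVR, and the genotype matrix `X` is hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the matrix with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in for a fitted model: SNP 0 matters most, SNP 2 not at all."""
    return 2.0 * row[0] + 0.5 * row[1]

# Hypothetical genotypes coded as minor-allele counts (0, 1, 2) for three SNPs.
X = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [0, 2, 1], [1, 1, 2], [2, 0, 0]]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)
```

Unlike SHAP values, these scores are unsigned at the level of individual samples, which is consistent with the abstract's remark that SHAP gives additional information about features with negative contributions.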

Jeng XJ, Hu Y, Venkat V, Lu TP, Tzeng JY. Transfer learning with false negative control improves polygenic risk prediction. PLoS Genet 2023;19:e1010597. [PMID: 38011285 PMCID: PMC10723713 DOI: 10.1371/journal.pgen.1010597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2023] [Revised: 12/15/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023]

Lehmann B, Mackintosh M, McVean G, Holmes C. Optimal strategies for learning multi-ancestry polygenic scores vary across traits. Nat Commun 2023;14:4023. [PMID: 37419925 PMCID: PMC10328935 DOI: 10.1038/s41467-023-38930-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 05/22/2023] [Indexed: 07/09/2023]

Ko C, Brody JP. Evaluation of a genetic risk score computed using human chromosomal-scale length variation to predict breast cancer. Hum Genomics 2023;17:53. [PMID: 37328908 DOI: 10.1186/s40246-023-00482-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 03/30/2023] [Indexed: 06/18/2023]

Banerjee J, Taroni JN, Allaway RJ, Prasad DV, Guinney J, Greene C. Machine learning in rare disease. Nat Methods 2023:10.1038/s41592-023-01886-z. [PMID: 37248386 DOI: 10.1038/s41592-023-01886-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/16/2021] [Accepted: 04/22/2023] [Indexed: 05/31/2023]

Xu Y, Ritchie SC, Liang Y, Timmers PRHJ, Pietzner M, Lannelongue L, Lambert SA, Tahir UA, May-Wilson S, Foguet C, Johansson Å, Surendran P, Nath AP, Persyn E, Peters JE, Oliver-Williams C, Deng S, Prins B, Luan J, Bomba L, Soranzo N, Di Angelantonio E, Pirastu N, Tai ES, van Dam RM, Parkinson H, Davenport EE, Paul DS, Yau C, Gerszten RE, Mälarstig A, Danesh J, Sim X, Langenberg C, Wilson JF, Butterworth AS, Inouye M. An atlas of genetic scores to predict multi-omic traits. Nature 2023;616:123-131. [PMID: 36991119 PMCID: PMC10323211 DOI: 10.1038/s41586-023-05844-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 04/29/2022] [Accepted: 02/15/2023] [Indexed: 03/30/2023]

Affiliation(s)

Yu Xu Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK. British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK. Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK.
Scott C Ritchie Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
Yujian Liang Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
Paul R H J Timmers Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
Maik Pietzner MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK Computational Medicine, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
Loïc Lannelongue Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
Samuel A Lambert Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
Usman A Tahir Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
Sebastian May-Wilson Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
Carles Foguet Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
Åsa Johansson Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Praveen Surendran British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Artika P Nath Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
Elodie Persyn Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
James E Peters Department of Immunology and Inflammation, Faculty of Medicine, Imperial College London, London, UK
Clare Oliver-Williams British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Shuliang Deng Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
Bram Prins British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Jian'an Luan MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
Lorenzo Bomba Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK BioMarin Pharmaceutical, Novato, CA, USA
Nicole Soranzo British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK Department of Haematology, University of Cambridge, Cambridge, UK NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Genomics Research Centre, Human Technopole, Milan, Italy
Emanuele Di Angelantonio British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Health Data Science Research Centre, Human Technopole, Milan, Italy
Nicola Pirastu Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK Genomics Research Centre, Human Technopole, Milan, Italy
E Shyong Tai Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore Department of Medicine, National University of Singapore and National University Health System, Singapore, Singapore
Rob M van Dam Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore Departments of Exercise and Nutrition Sciences and Epidemiology, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
Helen Parkinson European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
Emma E Davenport Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Dirk S Paul British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
Christopher Yau Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK Health Data Research UK, London, UK
Robert E Gerszten Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge, MA, USA
Anders Mälarstig Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden Pfizer Worldwide Research, Development and Medical, Stockholm, Sweden
John Danesh British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Xueling Sim Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
Claudia Langenberg MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK Computational Medicine, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
James F Wilson Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
Adam S Butterworth British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Michael Inouye Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK. British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK. Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK. British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK. Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK. Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia. The Alan Turing Institute, London, UK.

Clemens B, Lefort-Besnard J, Ritter C, Smith E, Votinov M, Derntl B, Habel U, Bzdok D. Accurate machine learning prediction of sexual orientation based on brain morphology and intrinsic functional connectivity. Cereb Cortex 2023;33:4013-4025. [PMID: 36104854 PMCID: PMC10068286 DOI: 10.1093/cercor/bhac323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 07/20/2022] [Accepted: 07/21/2022] [Indexed: 11/13/2022]

Affiliation(s)

Benjamin Clemens Department of Psychiatry, Psychotherapy and Psychosomatics, Faculty of Medicine, RWTH Aachen, Pauwelsstr. 30, 52074 Aachen, Germany Research Center Jülich, Institute of Neuroscience and Medicine: JARA-Institute Brain Structure Function Relationship (INM 10), Wilhelm-Johnen-Strase, 52428 Jülich, Germany
Jeremy Lefort-Besnard Unicaen, Inserm, Comete, Gip Cyceron, 1400 Caen, Normandie, France
Christoph Ritter Interdisciplinary Center for Clinical Research (IZKF), RWTH Aachen University, Pauwelsstr. 30, 52074 Aachen, Germany
Elke Smith Biological Psychology, Department of Psychology, University of Cologne, Bernhard-Feilchenfeld-Str. 11, 50969 Cologne, Germany
Mikhail Votinov Department of Psychiatry, Psychotherapy and Psychosomatics, Faculty of Medicine, RWTH Aachen, Pauwelsstr. 30, 52074 Aachen, Germany Research Center Jülich, Institute of Neuroscience and Medicine: JARA-Institute Brain Structure Function Relationship (INM 10), Wilhelm-Johnen-Strase, 52428 Jülich, Germany
Birgit Derntl Department of Psychiatry and Psychotherapy, University of Tübingen, Calwerst. 14, 72076 Tübingen, Germany Werner Reichardt Center for Integrative Neuroscience (CIN), University of Tübingen, Otfried-Müller-Str. 25, 72076 Tübingen, Germany
Ute Habel Department of Psychiatry, Psychotherapy and Psychosomatics, Faculty of Medicine, RWTH Aachen, Pauwelsstr. 30, 52074 Aachen, Germany Research Center Jülich, Institute of Neuroscience and Medicine: JARA-Institute Brain Structure Function Relationship (INM 10), Wilhelm-Johnen-Strase, 52428 Jülich, Germany
Danilo Bzdok McConnell Brain Imaging Centre, McGill University, 3801 University Rue, Montreal Quebec H3A 2B4, Canada Department of Biomedical Engineering, McGill University, 3775 University Rue, Montreal Quebec H3A 2B4, Canada Faculty of Medicine, Montreal Neurological Institute (MNI) and Hospital, McGill University, 3801 University Rue, Montreal Quebec H3A 2B4, Canada Mila–Quebec Artificial Intelligence Institute, 6666 Rue St-Urbain #200, Montreal Quebec H2S 3H1, Canada

Kang G, Baek SH, Kim YH, Kim DH, Park JW. Genetic Risk Assessment of Nonsyndromic Cleft Lip with or without Cleft Palate by Linking Genetic Networks and Deep Learning Models. Int J Mol Sci 2023;24:ijms24054557. [PMID: 36901988 PMCID: PMC10003462 DOI: 10.3390/ijms24054557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 02/13/2023] [Accepted: 02/20/2023] [Indexed: 03/02/2023]

Learning high-order interactions for polygenic risk prediction. PLoS One 2023;18:e0281618. [PMID: 36763605 PMCID: PMC9916647 DOI: 10.1371/journal.pone.0281618] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Accepted: 01/27/2023] [Indexed: 02/11/2023]

Abstract

Within the framework of precision medicine, the stratification of individual genetic susceptibility based on inherited DNA variation has paramount relevance. However, one of the most relevant pitfalls of traditional Polygenic Risk Score (PRS) approaches is their inability to model complex high-order non-linear SNP-SNP interactions and their effect on the phenotype (e.g. epistasis). Indeed, they incur a computational challenge as the number of possible interactions grows exponentially with the number of SNPs considered, affecting the statistical reliability of the model parameters as well. In this work, we address this issue by proposing a novel PRS approach, called High-order Interactions-aware Polygenic Risk Score (hiPRS), that incorporates high-order interactions in modeling polygenic risk. The latter combines an interaction search routine based on frequent itemsets mining and a novel interaction selection algorithm based on Mutual Information, to construct a simple and interpretable weighted model of user-specified dimensionality that can predict a given binary phenotype. Compared to traditional PRS methods, hiPRS does not rely on GWAS summary statistics or any other external information. Moreover, hiPRS differs from Machine Learning-based approaches that can include complex interactions in that it provides a readable and interpretable model and it is able to control overfitting, even on small samples. In the present work we demonstrate through a comprehensive simulation study the superior performance of hiPRS with respect to state-of-the-art methods, both in terms of scoring performance and interpretability of the resulting model. We also test hiPRS against small sample size, class imbalance and the presence of noise, showcasing its robustness to extreme experimental settings. Finally, we apply hiPRS to a case study on real data from the DACHS cohort, defining an interaction-aware scoring model to predict mortality of stage II-III Colon-Rectal Cancer patients treated with oxaliplatin.
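As a rough illustration of scoring candidate SNP-SNP interactions by mutual information with a binary phenotype, consider the sketch below. It is not the hiPRS algorithm: the abstract does not give the details of its search or selection steps, so the sketch simply enumerates all SNP pairs and ranks them by plain mutual information. The interaction pattern (both SNPs carrying at least one minor allele), the genotype matrix, and the phenotype vector are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def interaction_feature(genotypes, snp_a, snp_b):
    """1 where both SNPs carry at least one minor allele, else 0 (hypothetical pattern)."""
    return [int(g[snp_a] > 0 and g[snp_b] > 0) for g in genotypes]

# Hypothetical minor-allele counts for three SNPs in six individuals.
genotypes = [[1, 2, 0], [1, 1, 0], [0, 2, 1], [2, 0, 1], [0, 0, 0], [1, 1, 2]]
# Hypothetical binary phenotype (1 = case, 0 = control).
phenotype = [1, 1, 0, 0, 0, 1]

# Score every SNP pair and keep the most informative one.
mi_by_pair = {
    pair: mutual_information(interaction_feature(genotypes, *pair), phenotype)
    for pair in combinations(range(3), 2)
}
best_pair = max(mi_by_pair, key=mi_by_pair.get)
print(best_pair, mi_by_pair[best_pair])
```

Exhaustive enumeration like this scales poorly as the interaction order grows, which is the computational problem the abstract says the frequent-itemset search is meant to address.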

Spanbauer C, Pan W. Sparse prediction informed by genetic annotations using the logit normal prior for Bayesian regression tree ensembles. Genet Epidemiol 2023;47:26-44. [PMID: 36349692 PMCID: PMC9892284 DOI: 10.1002/gepi.22505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/08/2022] [Accepted: 09/21/2022] [Indexed: 11/11/2022]

Gerussi A, Scaravaglio M, Cristoferi L, Verda D, Milani C, De Bernardi E, Ippolito D, Asselta R, Invernizzi P, Kather JN, Carbone M. Artificial intelligence for precision medicine in autoimmune liver disease. Front Immunol 2022;13:966329. [PMID: 36439097 PMCID: PMC9691668 DOI: 10.3389/fimmu.2022.966329] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/10/2022] [Accepted: 10/13/2022] [Indexed: 09/10/2023]

Affiliation(s)

Alessio Gerussi Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
Miki Scaravaglio Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
Laura Cristoferi Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy Bicocca Bioinformatics Biostatistics and Bioimaging Centre - B4, School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
Damiano Verda Rulex Inc., Newton, MA, United States
Chiara Milani Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
Elisabetta De Bernardi Department of Medicine and Surgery and Tecnomed Foundation, University of Milano - Bicocca, Monza, Italy
Davide Ippolito Department of Radiology, San Gerardo Hospital, Monza, Italy
Rosanna Asselta Humanitas Clinical and Research Center, Rozzano, Milan, Italy Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
Pietro Invernizzi Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
Jakob Nikolas Kather Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany
Marco Carbone Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy

Ayat M, Domaratzki M. Sparse bayesian learning for genomic selection in yeast. FRONTIERS IN BIOINFORMATICS 2022;2:960889. [PMID: 36304259 PMCID: PMC9580947 DOI: 10.3389/fbinf.2022.960889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Accepted: 08/02/2022] [Indexed: 11/13/2022] Open

Pudjihartono N, Fadason T, Kempa-Liehr AW, O'Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. FRONTIERS IN BIOINFORMATICS 2022;2:927312. [PMID: 36304293 PMCID: PMC9580915 DOI: 10.3389/fbinf.2022.927312] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2022] [Accepted: 06/03/2022] [Indexed: 01/14/2023] Open

Ruigrok M, Xue B, Catanach A, Zhang M, Jesson L, Davy M, Wellenreuther M. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus. Genes (Basel) 2022;13:1129. [PMID: 35885912 PMCID: PMC9320665 DOI: 10.3390/genes13071129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Revised: 06/08/2022] [Accepted: 06/20/2022] [Indexed: 02/04/2023] Open

Abstract

Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as the general currency for investigations into how genetic diversity fuels evolution. However, structural variants can affect more base pairs in the genome than SNPs and can be responsible for adaptive phenotypes due to their impact on linkage and recombination. In this study, we investigate the first steps needed to explore the genetic basis of an economically important growth trait in the marine teleost finfish Chrysophrys auratus using both SNP and structural variant data. Specifically, we use feature selection methods in machine learning to explore the relative predictive power of both types of genetic variants in explaining growth and discuss the feature selection results of the evaluated methods.

Methods: SNP and structural variant callers were used to generate catalogues of variant data from 32 individual fish at ages 1 and 3 years. Three feature selection algorithms (ReliefF, Chi-square, and a mutual-information-based method) were used to reduce the dataset by selecting the most informative features. Following this selection process, the subset of variants was used as features to classify fish into small, medium, or large size categories using KNN, naïve Bayes, random forest, and logistic regression. The top-scoring features in each feature selection method were subsequently mapped to annotated genomic regions in the zebrafish genome, and a permutation test was conducted to see if the number of mapped regions was greater than when random sampling was applied.

Results: Without feature selection, the prediction accuracies ranged from 0 to 0.5 for both structural variants and SNPs. Following feature selection, the prediction accuracy increased only slightly to between 0 and 0.65 for structural variants and between 0 and 0.75 for SNPs. The highest prediction accuracy for the logistic regression was achieved for age 3 fish using SNPs, although generally predictions for age 1 and 3 fish were very similar (ranging from 0–0.65 for both SNPs and structural variants). The Chi-square feature selection of SNP data was the only method that had a significantly higher number of matches to annotated genomic regions of zebrafish than would be explained by chance alone.

Conclusions: Predicting a complex polygenic trait such as growth using data collected from a low number of individuals remains challenging. While we demonstrate that both SNPs and structural variants provide important information to help understand the genetic basis of phenotypic traits such as fish growth, the full complexities that exist within a genome cannot be easily captured by classical machine learning techniques. When using high-dimensional data, feature selection shows some increase in the prediction accuracy of classification models and provides the potential to identify unknown genomic correlates with growth. Our results show that both SNPs and structural variants significantly impact growth, and we therefore recommend that researchers interested in the genotype–phenotype map should strive to go beyond SNPs and incorporate structural variants in their studies as well. We discuss how our machine learning models can be further expanded to serve as a test bed to inform evolutionary studies and the applied management of species.
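As a rough illustration of the Chi-square feature selection step named in the abstract above, the sketch below scores each variant by the Pearson chi-square statistic of its genotype-by-size-class contingency table and keeps the top-scoring variants. It is not the authors' pipeline: the function names, the variant names, and the six-fish toy data are all hypothetical, and the study's own implementation may compute the score differently.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square_score(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # missing cells count as 0
    row_totals = Counter(feature)
    col_totals = Counter(labels)
    stat = 0.0
    for f, row_n in row_totals.items():
        for c, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def top_k_features(columns: Dict[str, Sequence[Hashable]],
                   labels: Sequence[Hashable], k: int) -> List[str]:
    """Return the names of the k variants with the highest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical toy genotypes (0/1/2 allele counts) for six fish; not real data.
size_class = ["small", "small", "medium", "medium", "large", "large"]
variants = {
    "snp_a": [0, 0, 1, 1, 2, 2],  # tracks size class perfectly
    "snp_b": [0, 1, 0, 1, 0, 1],  # independent of size class
    "sv_c": [0, 0, 0, 1, 1, 1],   # partially informative
}
selected = top_k_features(variants, size_class, k=2)
print(selected)
```

When selection like this is combined with cross-validation, the scores are normally computed on the training folds only, so that the held-out samples do not influence which variants are kept.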

Isik YE, Gormez Y, Aydin Z, Bakir-Gungor B. The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behçet's Disease. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:1909-1918. [PMID: 33476272 DOI: 10.1109/tcbb.2021.3053429] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]

Yoo HY, Lee KC, Woo JE, Park SH, Lee S, Joo J, Bae JS, Kwon HJ, Park BJ. A Genome-Wide Association Study and Machine-Learning Algorithm Analysis on the Prediction of Facial Phenotypes by Genotypes in Korean Women. Clin Cosmet Investig Dermatol 2022;15:433-445. [PMID: 35313536 PMCID: PMC8933694 DOI: 10.2147/ccid.s339547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Accepted: 01/12/2022] [Indexed: 12/03/2022]

Abstract

Purpose

Changes in facial appearance are affected by various intrinsic and extrinsic factors, which vary from person to person. Therefore, each person needs to determine their skin condition accurately to care for their skin accordingly. Recently, genetic identification by skin-related phenotypes has become possible using genome-wide association studies (GWAS) and machine-learning algorithms. However, because most GWAS have focused on populations with American or European skin pigmentation, large-scale GWAS are needed for Asian populations. This study aimed to evaluate the correlation of facial phenotypes with candidate single-nucleotide polymorphisms (SNPs) to predict phenotype from genotype using machine learning.

Materials and Methods

A total of 749 Korean women aged 30-50 years were enrolled in this study and evaluated for five facial phenotypes (melanin, gloss, hydration, wrinkle, and elasticity). To find highly related SNPs with each phenotype, GWAS analysis was used. In addition, phenotype prediction was performed using three machine-learning algorithms (linear, ridge, and linear support vector regressions) using five-fold cross-validation.

Results

Using GWAS analysis, we found 46 novel highly associated SNPs (p < 1 × 10⁻⁵): 3, 20, 12, 6, and 5 SNPs for melanin, gloss, hydration, wrinkle, and elasticity, respectively. On comparing the performance of each model based on phenotypes using five-fold cross-validation, the ridge regression model showed the highest accuracy (r² = 0.6422-0.7266) in all skin traits. Therefore, the optimal solution for personal skin diagnosis using GWAS was with the ridge regression model.

Conclusion

The proposed facial phenotype prediction model in this study provided the optimal solution for accurately predicting the skin condition of an individual by identifying genotype information of target characteristics and machine-learning methods. This model has potential utility for the development of customized cosmetics.
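The evaluation scheme described in this abstract (ridge regression scored by five-fold cross-validation) can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the study's code: the penalty value `lam`, the no-intercept model, the helper names, and the synthetic genotype and trait values are all hypothetical, and the r² reported in the abstract may be defined differently from the coefficient of determination used here.

```python
import random
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(x: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Closed-form ridge weights (X^T X + lam I)^-1 X^T y, without an intercept."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w: Sequence[float], x: Sequence[Sequence[float]]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in x]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination on held-out samples."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def five_fold_r2(x, y, lam: float, seed: int = 0) -> float:
    """Mean held-out R^2 over five folds; each model is fit on the other four."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]
    scores = []
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        w = ridge_fit([x[i] for i in train], [y[i] for i in train], lam)
        scores.append(r_squared([y[i] for i in test],
                                predict(w, [x[i] for i in test])))
    return sum(scores) / 5


# Synthetic stand-in data: three SNPs coded 0/1/2, the third with no effect.
n = 54
genotypes = [[float(i % 3), float((i // 3) % 3), float((i // 9) % 3)] for i in range(n)]
trait = [1.0 * g[0] - 0.5 * g[1] + 0.05 * ((7 * i) % 5 - 2)
         for i, g in enumerate(genotypes)]
mean_r2 = five_fold_r2(genotypes, trait, lam=1.0)
print(round(mean_r2, 3))
```

In practice the penalty would itself be tuned, typically in an inner cross-validation loop, rather than fixed in advance as it is in this sketch.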

Collin CB, Gebhardt T, Golebiewski M, Karaderi T, Hillemanns M, Khan FM, Salehzadeh-Yazdi A, Kirschner M, Krobitsch S, Kuepfer L. Computational Models for Clinical Applications in Personalized Medicine—Guidelines and Recommendations for Data Integration and Model Validation. J Pers Med 2022;12:166. [PMID: 35207655 PMCID: PMC8879572 DOI: 10.3390/jpm12020166] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Revised: 01/14/2022] [Accepted: 01/20/2022] [Indexed: 12/12/2022] Open

Xu Y, Vuckovic D, Ritchie SC, Akbari P, Jiang T, Grealey J, Butterworth AS, Ouwehand WH, Roberts DJ, Di Angelantonio E, Danesh J, Soranzo N, Inouye M. Machine learning optimized polygenic scores for blood cell traits identify sex-specific trajectories and genetic correlations with disease. CELL GENOMICS 2022;2:100086. [PMID: 35072137 PMCID: PMC8758502 DOI: 10.1016/j.xgen.2021.100086] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Revised: 08/24/2021] [Accepted: 12/13/2021] [Indexed: 12/13/2022]

Affiliation(s)

Yu Xu Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
Dragana Vuckovic Department of Human Genetics, Wellcome Sanger Institute, Hinxton CB10 1SA, UK National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK
Scott C. Ritchie Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK
Parsa Akbari British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK
Tao Jiang British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
Jason Grealey Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia Department of Mathematics and Statistics, La Trobe University, Bundoora, VIC 3086, Australia
Adam S. Butterworth British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge CB10 1SA, UK
Willem H. Ouwehand Department of Human Genetics, Wellcome Sanger Institute, Hinxton CB10 1SA, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK Department of Haematology, University of Cambridge, Cambridge CB2 0PT, UK
David J. Roberts National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK National Institute for Health Research Oxford Biomedical Research Centre, University of Oxford and John Radcliffe Hospital, Oxford OX3 9DU, UK
Emanuele Di Angelantonio British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK Health Data Science Research Centre, Human Technopole, Milan 20157, Italy Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge CB10 1SA, UK
John Danesh British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK Department of Human Genetics, Wellcome Sanger Institute, Hinxton CB10 1SA, UK National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge CB10 1SA, UK
Nicole Soranzo Department of Human Genetics, Wellcome Sanger Institute, Hinxton CB10 1SA, UK National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge CB1 8RN, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK
Michael Inouye Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB1 8RN, UK Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge CB10 1SA, UK The Alan Turing Institute, London NW1 2DB, UK

Raben TG, Lello L, Widen E, Hsu SDH. From Genotype to Phenotype: Polygenic Prediction of Complex Human Traits. Methods Mol Biol 2022;2467:421-446. [PMID: 35451785 DOI: 10.1007/978-1-0716-2205-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]

Passamonti MM, Somenzi E, Barbato M, Chillemi G, Colli L, Joost S, Milanesi M, Negrini R, Santini M, Vajana E, Williams JL, Ajmone-Marsan P. The Quest for Genes Involved in Adaptation to Climate Change in Ruminant Livestock. Animals (Basel) 2021;11:2833. [PMID: 34679854 PMCID: PMC8532622 DOI: 10.3390/ani11102833] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2021] [Revised: 09/21/2021] [Accepted: 09/23/2021] [Indexed: 12/14/2022] Open

Affiliation(s)

Matilde Maria Passamonti Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
Elisa Somenzi Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
Mario Barbato Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
Giovanni Chillemi Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy
Licia Colli Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; Research Center on Biodiversity and Ancient DNA—BioDNA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
Stéphane Joost Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
Marco Milanesi Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy
Riccardo Negrini Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
Monia Santini Impacts on Agriculture, Forests and Ecosystem Services (IAFES) Division, Fondazione Centro Euro-Mediterraneo Sui Cambiamenti Climatici (CMCC), Viale Trieste 127, 01100 Viterbo, Italy
Elia Vajana Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
John Lewis Williams Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
Paolo Ajmone-Marsan Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; Nutrigenomics and Proteomics Research Center—PRONUTRIGEN, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy

Katsaouni N, Tashkandi A, Wiese L, Schulz MH. Machine learning based disease prediction from genotype data. Biol Chem 2021;402:871-885. [PMID: 34218544 DOI: 10.1515/hsz-2021-0109] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2021] [Accepted: 06/15/2021] [Indexed: 12/16/2022]

Mieth B, Rozier A, Rodriguez JA, Höhne MMC, Görnitz N, Müller KR. DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies. NAR Genom Bioinform 2021;3:lqab065. [PMID: 34296082 PMCID: PMC8291080 DOI: 10.1093/nargab/lqab065] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Revised: 05/27/2021] [Accepted: 07/08/2021] [Indexed: 02/06/2023] Open

Westerman EL, Bowman SEJ, Davidson B, Davis MC, Larson ER, Sanford CPJ. Deploying Big Data to Crack the Genotype to Phenotype Code. Integr Comp Biol 2021;60:385-396. [PMID: 32492136 DOI: 10.1093/icb/icaa055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open

Bauer A, Zierer A, Gieger C, Büyüközkan M, Müller-Nurasyid M, Grallert H, Meisinger C, Strauch K, Prokisch H, Roden M, Peters A, Krumsiek J, Herder C, Koenig W, Thorand B, Huth C. Comparison of genetic risk prediction models to improve prediction of coronary heart disease in two large cohorts of the MONICA/KORA study. Genet Epidemiol 2021;45:633-650. [PMID: 34082474 DOI: 10.1002/gepi.22389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2021] [Revised: 04/20/2021] [Accepted: 05/04/2021] [Indexed: 12/19/2022]

Affiliation(s)

Alina Bauer Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Astrid Zierer Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Christian Gieger Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Mustafa Büyüközkan Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA
Martina Müller-Nurasyid Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU, Munich, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany; Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
Harald Grallert Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Christa Meisinger German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Chair of Epidemiology, LMU Munich, UNIKA-T Augsburg, Augsburg, Germany; Independent Research Group Clinical Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Konstantin Strauch Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU, Munich, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
Holger Prokisch Institute of Human Genetics, School of Medicine, Technische Universität München, München, Germany; Institute of Neurogenomics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Michael Roden Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany; German Center for Diabetes Research (DZD), Partner Düsseldorf, München-Neuherberg, Germany
Annette Peters Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Institute of Epidemiology and Medical Biometry, University of Ulm, Ulm, Germany
Jan Krumsiek Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA
Christian Herder Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany; German Center for Diabetes Research (DZD), Partner Düsseldorf, München-Neuherberg, Germany
Wolfgang Koenig Institute of Epidemiology and Medical Biometry, University of Ulm, Ulm, Germany; Deutsches Herzzentrum München, Technische Universität München, Munich, Germany; German Centre for Cardiovascular Research (DZHK), partner site Munich Heart Alliance, Munich, Germany
Barbara Thorand Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany
Cornelia Huth Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany

Phenotypical predictors of pregnancy-related restless legs syndrome and their association with basal ganglia and the limbic circuits. Sci Rep 2021;11:9996. [PMID: 33976261 PMCID: PMC8113250 DOI: 10.1038/s41598-021-89360-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Accepted: 04/23/2021] [Indexed: 11/21/2022] Open

Varma M, Paskov KM, Chrisman BS, Sun MW, Jung JY, Stockham NT, Washington PY, Wall DP. A maximum flow-based network approach for identification of stable noncoding biomarkers associated with the multigenic neurological condition, autism. BioData Min 2021;14:28. [PMID: 33941233 PMCID: PMC8091705 DOI: 10.1186/s13040-021-00262-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Accepted: 04/20/2021] [Indexed: 12/05/2022] Open

Prediction of atherosclerosis diseases using biosensor-assisted deep learning artificial neuron model. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05317-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]

Tozzo V, Azencott CA, Fiorini S, Fava E, Trucco A, Barla A. Where Do We Stand in Regularization for Life Science Studies? J Comput Biol 2021;29:213-232. [PMID: 33926217 PMCID: PMC8968832 DOI: 10.1089/cmb.2019.0371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open

Manavalan R, Priya S. Genetic interactions effects for cancer disease identification using computational models: a review. Med Biol Eng Comput 2021;59:733-758. [PMID: 33839998 DOI: 10.1007/s11517-021-02343-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Accepted: 03/10/2021] [Indexed: 11/29/2022]

Bracher-Smith M, Crawford K, Escott-Price V. Machine learning for genetic prediction of psychiatric disorders: a systematic review. Mol Psychiatry 2021;26:70-79. [PMID: 32591634 PMCID: PMC7610853 DOI: 10.1038/s41380-020-0825-2] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Revised: 06/09/2020] [Accepted: 06/16/2020] [Indexed: 12/25/2022]

Abstract

Machine learning methods have been employed to make predictions in psychiatry from genotypes, with the potential to bring improved prediction of outcomes in psychiatric genetics; however, their current performance is unclear. We aim to systematically review machine learning methods for predicting psychiatric disorders from genetics alone and evaluate their discrimination, bias and implementation. Medline, PsycInfo, Web of Science and Scopus were searched for terms relating to genetics, psychiatric disorders and machine learning, including neural networks, random forests, support vector machines and boosting, on 10 September 2019. Following PRISMA guidelines, articles were screened for inclusion independently by two authors, extracted, and assessed for risk of bias. Overall, 63 full texts were assessed from a pool of 652 abstracts. Data were extracted for 77 models of schizophrenia, bipolar, autism or anorexia across 13 studies. Performance of machine learning methods was highly varied (0.48-0.95 AUC) and differed between schizophrenia (0.54-0.95 AUC), bipolar (0.48-0.65 AUC), autism (0.52-0.81 AUC) and anorexia (0.62-0.69 AUC). This is likely due to the high risk of bias identified in the study designs and analysis for reported results. Choices for predictor selection, hyperparameter search and validation methodology, and viewing of the test set during training were common causes of high risk of bias in analysis. Key steps in model development and validation were frequently not performed or unreported. Comparison of discrimination across studies was constrained by heterogeneity of predictors, outcome and measurement, in addition to sample overlap within and across studies. Given widespread high risk of bias and the small number of studies identified, it is important to ensure established analysis methods are adopted. We emphasise best practices in methodology and reporting for improving future studies.
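The AUC values quoted in this abstract measure discrimination: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, so 0.5 corresponds to chance and 1.0 to perfect separation. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for cases, 0 for controls. scores: higher means more case-like.
    A tie between a case and a control counts as half a correct ordering.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy predictions: three cases and three controls.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In the toy data, eight of the nine case-control pairs are ordered correctly, so the AUC is 8/9. For the estimate to be unbiased, the scores must come from samples that played no part in predictor selection or hyperparameter search, which is the source of bias the review highlights.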

Zou K, Kim KS, Kim K, Kang D, Park YH, Sun H, Ha BK, Ha J, Jun TH. Genetic Diversity and Genome-Wide Association Study of Seed Aspect Ratio Using a High-Density SNP Array in Peanut (Arachis hypogaea L.). Genes (Basel) 2020;12:E2. [PMID: 33375051 PMCID: PMC7822046 DOI: 10.3390/genes12010002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/22/2020] [Revised: 12/09/2020] [Accepted: 12/17/2020] [Indexed: 12/12/2022]

Digital Phenotyping Using Multimodal Data. Curr Behav Neurosci Rep 2020. [DOI: 10.1007/s40473-020-00215-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]

Kang J, Coates JT, Strawderman RL, Rosenstein BS, Kerns SL. Genomics models in radiotherapy: From mechanistic to machine learning. Med Phys 2020;47:e203-e217. [PMID: 32418335 PMCID: PMC8725063 DOI: 10.1002/mp.13751] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/17/2019] [Revised: 06/28/2019] [Accepted: 07/17/2019] [Indexed: 12/28/2022]

Padilla-Martínez F, Collin F, Kwasniewski M, Kretowski A. Systematic Review of Polygenic Risk Scores for Type 1 and Type 2 Diabetes. Int J Mol Sci 2020;21:E1703. [PMID: 32131491 PMCID: PMC7084489 DOI: 10.3390/ijms21051703] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/03/2020] [Revised: 02/28/2020] [Accepted: 02/28/2020] [Indexed: 02/07/2023]

Waldmann P, Pfeiffer C, Mészáros G. Sparse Convolutional Neural Networks for Genome-Wide Prediction. Front Genet 2020;11:25. [PMID: 32117441 PMCID: PMC7029737 DOI: 10.3389/fgene.2020.00025] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/26/2019] [Accepted: 01/08/2020] [Indexed: 12/03/2022]

Lello L, Raben TG, Yong SY, Tellier LCAM, Hsu SDH. Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer. Sci Rep 2019;9:15286. [PMID: 31653892 PMCID: PMC6814833 DOI: 10.1038/s41598-019-51258-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 04/23/2019] [Accepted: 09/26/2019] [Indexed: 01/09/2023]

Grinberg NF, Orhobor OI, King RD. An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat. Mach Learn 2019;109:251-277. [PMID: 32174648 PMCID: PMC7048706 DOI: 10.1007/s10994-019-05848-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 10/28/2015] [Revised: 09/17/2019] [Accepted: 09/19/2019] [Indexed: 11/01/2022]

Abstract

In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.
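
Two of the methods compared above, ridge and lasso regression, differ in how they shrink effect estimates. The following is a minimal single-predictor sketch of that difference, not the study's pipeline: it assumes a no-intercept regression with one genotype-like predictor, and the function names, penalty scaling and toy data are illustrative assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_one_predictor(x, y, lam=0.0, penalty="none"):
    """Slope b of a no-intercept regression of y on x under a penalty.

    ridge minimises 0.5 * sum((y - b*x)**2) + 0.5 * lam * b**2
    lasso minimises 0.5 * sum((y - b*x)**2) + lam * abs(b)
    """
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxx == 0.0:
        raise ValueError("predictor must not be all zeros")
    if penalty == "ridge":
        return sxy / (sxx + lam)  # shrinks proportionally, never exactly to zero
    if penalty == "lasso":
        return soft_threshold(sxy, lam) / sxx  # can set the slope exactly to zero
    return sxy / sxx  # ordinary least squares


# Hypothetical toy data with a true slope of 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
for lam in (0.0, 14.0, 30.0):
    print(lam, fit_one_predictor(x, y, lam, "ridge"), fit_one_predictor(x, y, lam, "lasso"))
```

With a large enough penalty the lasso slope becomes exactly zero while the ridge slope only approaches it, which is why lasso also acts as a variable selector.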

Abd El Hamid MM, Mabrouk MS, Omar YMK. DEVELOPING AN EARLY PREDICTIVE SYSTEM FOR IDENTIFYING GENETIC BIOMARKERS ASSOCIATED TO ALZHEIMER’S DISEASE USING MACHINE LEARNING TECHNIQUES. BIOMEDICAL ENGINEERING: APPLICATIONS, BASIS AND COMMUNICATIONS 2019. [DOI: 10.4015/s1016237219500406] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]

Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun 2019;10:3328. [PMID: 31346163 PMCID: PMC6658471 DOI: 10.1038/s41467-019-11112-0] [Citation(s) in RCA: 543] [Impact Index Per Article: 108.6] [Received: 10/30/2018] [Accepted: 06/18/2019] [Indexed: 12/11/2022]

Romagnoni A, Jégou S, Van Steen K, Wainrib G, Hugot JP. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci Rep 2019;9:10351. [PMID: 31316157 PMCID: PMC6637191 DOI: 10.1038/s41598-019-46649-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 03/11/2019] [Accepted: 07/03/2019] [Indexed: 02/08/2023]

Stephenson M, Darlington GA, Schenkel FS, Squires EJ, Ali RA. DSRIG: Incorporating graphical structure in the regularized modeling of SNP data. J Bioinform Comput Biol 2019;17:1950017. [PMID: 31288640 DOI: 10.1142/s0219720019500173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]

Mwanga EP, Mapua SA, Siria DJ, Ngowo HS, Nangacha F, Mgando J, Baldini F, González Jiménez M, Ferguson HM, Wynne K, Selvaraj P, Babayan SA, Okumu FO. Using mid-infrared spectroscopy and supervised machine-learning to identify vertebrate blood meals in the malaria vector, Anopheles arabiensis. Malar J 2019;18:187. [PMID: 31146762 PMCID: PMC6543689 DOI: 10.1186/s12936-019-2822-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/27/2019] [Accepted: 05/25/2019] [Indexed: 02/03/2023]

Abstract

BACKGROUND

The propensity of different Anopheles mosquitoes to bite humans instead of other vertebrates influences their capacity to transmit pathogens to humans. Unfortunately, determining proportions of mosquitoes that have fed on humans, i.e. Human Blood Index (HBI), currently requires expensive and time-consuming laboratory procedures involving enzyme-linked immunosorbent assays (ELISA) or polymerase chain reactions (PCR). Here, mid-infrared (MIR) spectroscopy and supervised machine learning are used to accurately distinguish between vertebrate blood meals in guts of malaria mosquitoes, without any molecular techniques.

METHODS

Laboratory-reared Anopheles arabiensis females were fed on humans, chickens, goats or bovines, then held for 6 to 8 h, after which they were killed and preserved in silica. The sample size was 2000 mosquitoes (500 per host species). Five individuals of each host species were enrolled to ensure genotype variability, and 100 mosquitoes were fed on each. Dried mosquito abdomens were individually scanned using an attenuated total reflection-Fourier transform infrared (ATR-FTIR) spectrometer to obtain high-resolution MIR spectra (4000 cm⁻¹ to 400 cm⁻¹). The spectral data were cleaned to compensate for atmospheric water and CO2 interference bands using Bruker-OPUS software, then transferred to Python™ for supervised machine-learning to predict host species. Seven classification algorithms were trained using 90% of the spectra through several combinations of 75-25% data splits. The best performing model was used to predict identities of the remaining 10% validation spectra, which had not been used for model training or testing.

RESULTS

The logistic regression (LR) model achieved the highest accuracy, correctly predicting true vertebrate blood meal sources with an overall accuracy of 98.4%. The model correctly identified 96% of goat blood meals, 97% of bovine blood meals, 100% of chicken blood meals and 100% of human blood meals. Three percent of bovine blood meals were misclassified as goat, and 2% of goat blood meals were misclassified as human.

CONCLUSION

Mid-infrared spectroscopy coupled with supervised machine learning can accurately identify multiple vertebrate blood meals in malaria vectors, thus potentially enabling rapid assessment of mosquito blood-feeding histories and vectorial capacities. The technique is cost-effective, fast, simple, and requires no reagents other than desiccants. However, scaling it up will require field validation of the findings and boosting relevant technical capacity in affected countries.
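
The data-splitting scheme described in the METHODS paragraph (a 10% validation set held out first, then repeated 75-25% train/test splits of the remaining 90%) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `split_protocol`, the number of repeats (the abstract says only "several"), the fixed seed and the unstratified shuffling are all assumptions.

```python
import random


def split_protocol(n_samples, n_repeats=5, seed=0):
    """Hold out 10% for validation, then draw repeated 75/25 train/test splits.

    Returns (validation_indices, [(train_indices, test_indices), ...]).
    n_repeats is a hypothetical choice; the source does not state it.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    n_validation = n_samples // 10
    validation = indices[:n_validation]    # never seen during model selection
    development = indices[n_validation:]   # remaining 90%

    splits = []
    for _ in range(n_repeats):
        shuffled = development[:]
        rng.shuffle(shuffled)
        n_train = (3 * len(shuffled)) // 4  # 75% of the development set
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return validation, splits


validation, splits = split_protocol(2000)
train, test = splits[0]
print(len(validation), len(train), len(test))  # 200 1350 450
```

Because the validation indices are removed before any train/test split is drawn, no validation spectrum can influence model training or model selection.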

Jackknife Model Averaging Prediction Methods for Complex Phenotypes with Gene Expression Levels by Integrating External Pathway Information. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019;2019:2807470. [PMID: 31089389 PMCID: PMC6476151 DOI: 10.1155/2019/2807470] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/13/2019] [Accepted: 03/20/2019] [Indexed: 01/03/2023]

Abstract

Motivation

In the past few years many prediction approaches have been proposed and widely employed in high dimensional genetic data for disease risk evaluation. However, those approaches typically ignore in model fitting the important group structures that naturally exist in genetic data.

Methods

In the present study, we applied a novel model-averaging approach, called jackknife model averaging prediction (JMAP), for high dimensional genetic risk prediction while incorporating pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a cross validation criterion in a jackknife way. Compared with previous approaches, one of the primary features of JMAP is to allow model weights to vary from 0 to 1 but without the limitation that the summation of weights is equal to one. We evaluated the performance of JMAP using extensive simulation studies and compared it with existing methods. We finally applied JMAP to four real cancer datasets that are publicly available from TCGA.

Results

The simulations showed that, compared with other existing approaches (e.g., gsslasso), JMAP performed best or was among the best methods across a range of scenarios. For example, in 14 of the 16 simulation settings with PVE = 0.3, JMAP had on average 0.075 higher prediction accuracy than gsslasso. We further found that, in the simulations, the model weights for the true candidate models were much less likely to be zero than those for the null candidate models and were substantially greater in magnitude. In the real data application, JMAP also performed comparably to or better than the other methods for continuous phenotypes. For example, for the COAD, CRC, and PAAD datasets, the average gains in predictive accuracy of JMAP over gsslasso were 0.019, 0.064, and 0.052, respectively.

Conclusion

The proposed method JMAP is a novel model-averaging approach for high dimensional genetic risk prediction while incorporating external useful group structures into the model specification.
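
The weighting idea described in the Methods paragraph (each model weight lies between 0 and 1, with no requirement that the weights sum to one, and the weights minimise a cross-validation criterion) can be illustrated with a small sketch. It is not the JMAP implementation: it assumes the leave-one-out predictions of each candidate model have already been computed, uses a squared-error criterion, and solves the box-constrained problem by simple coordinate descent, which is a swapped-in solver chosen for brevity. The function name and toy numbers are hypothetical.

```python
def box_constrained_weights(loo_preds, y, n_sweeps=200):
    """Weights in [0, 1] minimising sum_i (y_i - sum_m w_m * loo_preds[i][m])**2.

    loo_preds[i][m] is the leave-one-out prediction of candidate model m for
    sample i (assumed precomputed). The weights are NOT forced to sum to one.
    """
    n_models = len(loo_preds[0])
    weights = [0.0] * n_models
    for _ in range(n_sweeps):
        for m in range(n_models):
            num = 0.0
            den = 0.0
            for row, target in zip(loo_preds, y):
                # Contribution of all other models at their current weights.
                others = sum(weights[k] * row[k] for k in range(n_models) if k != m)
                num += (target - others) * row[m]
                den += row[m] * row[m]
            best = num / den if den > 0.0 else 0.0
            weights[m] = min(1.0, max(0.0, best))  # project onto [0, 1]
    return weights


# Hypothetical leave-one-out predictions from two candidate models.
loo_preds = [[1.0, 0.0], [0.0, 1.0]]
y = [0.8, 0.9]
print(box_constrained_weights(loo_preds, y))  # weights sum to more than one here
```

In this toy case both weights are below one but their sum exceeds one, which a sum-to-one constraint would not allow.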

Waldmann P, Ferenčaković M, Mészáros G, Khayatzadeh N, Curik I, Sölkner J. AUTALASSO: an automatic adaptive LASSO for genome-wide prediction. BMC Bioinformatics 2019;20:167. [PMID: 30940067 PMCID: PMC6444607 DOI: 10.1186/s12859-019-2743-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/01/2018] [Accepted: 03/18/2019] [Indexed: 01/30/2023]

Park YJ, Bae JH, Shin MH, Hyun SH, Cho YS, Choe YS, Choi JY, Lee KH, Kim BT, Moon SH. Development of Predictive Models in Patients with Epiphora Using Lacrimal Scintigraphy and Machine Learning. Nucl Med Mol Imaging 2019;53:125-135. [PMID: 31057684 PMCID: PMC6473022 DOI: 10.1007/s13139-019-00574-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/12/2018] [Revised: 09/19/2018] [Accepted: 01/07/2019] [Indexed: 12/22/2022]