Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Boulesteix AL, Wright MN, Hoffmann S, König IR. Statistical learning approaches in the genetic epidemiology of complex diseases. Hum Genet 2019;139:73-84. [PMID: 31049651 DOI: 10.1007/s00439-019-01996-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/30/2018] [Accepted: 03/04/2019] [Indexed: 02/07/2023]

Cited by Other Article(s)

Ferrario PG, Gedrich K. Machine learning and personalized nutrition: a promising liaison? Eur J Clin Nutr 2024;78:74-76. [PMID: 37833568 PMCID: PMC10774117 DOI: 10.1038/s41430-023-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 09/12/2023] [Accepted: 09/20/2023] [Indexed: 10/15/2023]

Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023;21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023]

Abstract

BACKGROUND

In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions.

METHODS

Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD.

RESULTS

The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.

CONCLUSIONS

This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.

Ezugwu AE, Oyelade ON, Ikotun AM, Agushaka JO, Ho YS. Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review. Arch Comput Methods Eng 2023;30:1-31. [PMID: 37359741 PMCID: PMC10148585 DOI: 10.1007/s11831-023-09930-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/19/2023] [Indexed: 06/28/2023]

Molnar C, König G, Bischl B, Casalicchio G. Model-agnostic feature importance and effects with dependent features: a conditional subgroup approach. Data Min Knowl Discov 2023. [DOI: 10.1007/s10618-022-00901-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]

Lam M, Chen CY, Hill WD, Xia C, Tian R, Levey DF, Gelernter J, Stein MB, Hatoum AS, Huang H, Malhotra AK, Runz H, Ge T, Lencz T. Collective genomic segments with differential pleiotropic patterns between cognitive dimensions and psychopathology. Nat Commun 2022;13:6868. [PMID: 36369282 PMCID: PMC9652380 DOI: 10.1038/s41467-022-34418-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/02/2021] [Accepted: 10/24/2022] [Indexed: 11/13/2022]

Affiliation(s)

Max Lam Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell, Glen Oaks, NY, USA; Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Institute of Mental Health, Singapore, Singapore
Chia-Yen Chen Translational Biology, Research and Development, Biogen Inc, Cambridge, MA, USA
W David Hill Lothian Birth Cohorts group, Department of Psychology, University of Edinburgh, Edinburgh, UK
Charley Xia Lothian Birth Cohorts group, Department of Psychology, University of Edinburgh, Edinburgh, UK
Ruoyu Tian Computational Biology and Human Genetics, Dewpoint Therapeutics, Boston, MA, USA
Daniel F Levey Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA
Joel Gelernter Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA
Murray B Stein VA San Diego Healthcare System, San Diego, CA, USA; Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA; Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
Alexander S Hatoum Department of Psychiatry, Washington University in St. Louis Medical School, St. Louis, MO, USA
Hailiang Huang Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Anil K Malhotra Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell, Glen Oaks, NY, USA; Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA; Department of Psychiatry, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; Department of Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
Heiko Runz Translational Biology, Research and Development, Biogen Inc, Cambridge, MA, USA
Tian Ge Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Todd Lencz Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell, Glen Oaks, NY, USA; Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA; Department of Psychiatry, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; Department of Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA

Machine learning-based genetic diagnosis models for hereditary hearing loss by the GJB2, SLC26A4 and MT-RNR1 variants. EBioMedicine 2021;69:103322. [PMID: 34161886 PMCID: PMC8237285 DOI: 10.1016/j.ebiom.2021.103322] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/10/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/16/2022]

Bauer A, Zierer A, Gieger C, Büyüközkan M, Müller-Nurasyid M, Grallert H, Meisinger C, Strauch K, Prokisch H, Roden M, Peters A, Krumsiek J, Herder C, Koenig W, Thorand B, Huth C. Comparison of genetic risk prediction models to improve prediction of coronary heart disease in two large cohorts of the MONICA/KORA study. Genet Epidemiol 2021;45:633-650. [PMID: 34082474 DOI: 10.1002/gepi.22389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/08/2021] [Revised: 04/20/2021] [Accepted: 05/04/2021] [Indexed: 12/19/2022]

Affiliation(s)

Alina Bauer Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Astrid Zierer Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Christian Gieger Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Mustafa Büyüközkan Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA
Martina Müller-Nurasyid Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU, Munich, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany; Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
Harald Grallert Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Christa Meisinger German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Chair of Epidemiology, LMU Munich, UNIKA-T Augsburg, Augsburg, Germany; Independent Research Group Clinical Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Konstantin Strauch Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU, Munich, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
Holger Prokisch Institute of Human Genetics, School of Medicine, Technische Universität München, München, Germany; Institute of Neurogenomics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Michael Roden Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany; German Center for Diabetes Research (DZD), Partner Düsseldorf, München-Neuherberg, Germany
Annette Peters Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany; Institute of Epidemiology and Medical Biometry, University of Ulm, Ulm, Germany
Jan Krumsiek Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA
Christian Herder Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany; German Center for Diabetes Research (DZD), Partner Düsseldorf, München-Neuherberg, Germany
Wolfgang Koenig Institute of Epidemiology and Medical Biometry, University of Ulm, Ulm, Germany; Deutsches Herzzentrum München, Technische Universität München, Munich, Germany; German Centre for Cardiovascular Research (DZHK), partner site Munich Heart Alliance, Munich, Germany
Barbara Thorand Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany
Cornelia Huth Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Partner München-Neuherberg, München-Neuherberg, Germany

Bracher-Smith M, Crawford K, Escott-Price V. Machine learning for genetic prediction of psychiatric disorders: a systematic review. Mol Psychiatry 2021;26:70-79. [PMID: 32591634 PMCID: PMC7610853 DOI: 10.1038/s41380-020-0825-2] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/15/2020] [Revised: 06/09/2020] [Accepted: 06/16/2020] [Indexed: 12/25/2022]

Abstract

Machine learning methods have been employed to make predictions in psychiatry from genotypes, with the potential to bring improved prediction of outcomes in psychiatric genetics; however, their current performance is unclear. We aim to systematically review machine learning methods for predicting psychiatric disorders from genetics alone and evaluate their discrimination, bias and implementation. Medline, PsycInfo, Web of Science and Scopus were searched for terms relating to genetics, psychiatric disorders and machine learning, including neural networks, random forests, support vector machines and boosting, on 10 September 2019. Following PRISMA guidelines, articles were screened for inclusion independently by two authors, extracted, and assessed for risk of bias. Overall, 63 full texts were assessed from a pool of 652 abstracts. Data were extracted for 77 models of schizophrenia, bipolar, autism or anorexia across 13 studies. Performance of machine learning methods was highly varied (0.48-0.95 AUC) and differed between schizophrenia (0.54-0.95 AUC), bipolar (0.48-0.65 AUC), autism (0.52-0.81 AUC) and anorexia (0.62-0.69 AUC). This is likely due to the high risk of bias identified in the study designs and analysis for reported results. Choices for predictor selection, hyperparameter search and validation methodology, and viewing of the test set during training were common causes of high risk of bias in analysis. Key steps in model development and validation were frequently not performed or unreported. Comparison of discrimination across studies was constrained by heterogeneity of predictors, outcome and measurement, in addition to sample overlap within and across studies. Given widespread high risk of bias and the small number of studies identified, it is important to ensure established analysis methods are adopted. We emphasise best practices in methodology and reporting for improving future studies.
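The AUC values quoted in this abstract can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, so 0.5 corresponds to chance-level discrimination. The sketch below is only an illustration of that definition, not the reviewed studies' method: the function name `auc` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for cases, 0 for controls (hypothetical toy encoding).
    scores: predicted risk, higher meaning more likely to be a case.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
example_auc = auc(labels, scores)  # 8 of the 9 case-control pairs are ordered correctly
```

To reflect the abstract's point about viewing the test set during training, such a function would be applied only to scores from data held out of every model-fitting and tuning step.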

Boulesteix AL, Groenwold RH, Abrahamowicz M, Binder H, Briel M, Hornung R, Morris TP, Rahnenführer J, Sauerbrei W. Introduction to statistical simulations in health research. BMJ Open 2020;10:e039921. [PMID: 33318113 PMCID: PMC7737058 DOI: 10.1136/bmjopen-2020-039921] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/23/2022]

Huang J, Huth C, Covic M, Troll M, Adam J, Zukunft S, Prehn C, Wang L, Nano J, Scheerer MF, Neschen S, Kastenmüller G, Suhre K, Laxy M, Schliess F, Gieger C, Adamski J, Hrabe de Angelis M, Peters A, Wang-Sattler R. Machine Learning Approaches Reveal Metabolic Signatures of Incident Chronic Kidney Disease in Individuals With Prediabetes and Type 2 Diabetes. Diabetes 2020;69:2756-2765. [PMID: 33024004 DOI: 10.2337/db20-0586] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/03/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022]

Affiliation(s)

Jialing Huang Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
Cornelia Huth Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
Marcela Covic Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
Martina Troll Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Jonathan Adam Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Sven Zukunft Research Unit of Molecular Endocrinology and Metabolism, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Cornelia Prehn Research Unit of Molecular Endocrinology and Metabolism, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Li Wang Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Department of Scientific Research and Shandong University Postdoctoral Work Station, Liaocheng People's Hospital, Shandong, P. R. China
Jana Nano Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
Markus F Scheerer Institute of Experimental Genetics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Susanne Neschen Institute of Experimental Genetics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Gabi Kastenmüller Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Karsten Suhre Department of Physiology and Biophysics, Weill Cornell Medicine - Qatar, Doha, Qatar
Michael Laxy Institute of Health Economics and Health Care Management, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Freimut Schliess Profil Institut für Stoffwechselforschung GmbH, Neuss, Germany
Christian Gieger Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
Jerzy Adamski Research Unit of Molecular Endocrinology and Metabolism, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Chair of Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Germany
Martin Hrabe de Angelis German Center for Diabetes Research (DZD), München-Neuherberg, Germany; Institute of Experimental Genetics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Chair of Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Germany
Annette Peters Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
Rui Wang-Sattler Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; Institute of Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany

Machado RA, de Oliveira Silva C, Martelli-Junior H, das Neves LT, Coletta RD. Machine learning in prediction of genetic risk of nonsyndromic oral clefts in the Brazilian population. Clin Oral Investig 2020;25:1273-1280. [PMID: 32617779 DOI: 10.1007/s00784-020-03433-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/04/2019] [Accepted: 06/24/2020] [Indexed: 01/07/2023]

Abstract

OBJECTIVES

Genetic variants in multiple genes and loci have been associated with the risk of nonsyndromic cleft lip with or without cleft palate (NSCL ± P). However, risk estimation remains a challenge because most of these variants are population-specific, which makes the underlying genetic risk difficult to identify. Herein, we examined the use of machine learning methods on previously reported single nucleotide polymorphisms (SNPs) to predict the risk of NSCL ± P in the Brazilian population.

MATERIALS AND METHODS

Random forest and neural network methods were applied to 72 SNPs in a case-control sample composed of 722 NSCL ± P cases and 866 controls to discriminate NSCL ± P risk. SNP-SNP interactions and the functional annotation of biological processes associated with the identified NSCL ± P risk genes were also examined.

RESULTS

Supervised random forest decision trees revealed high importance scores for the SNPs rs11717284 and rs1875735 in FGF12, rs41268753 in GRHL3, rs2236225 in MTHFD1, rs2274976 in MTHFR, rs2235371 and rs642961 in IRF6, rs17085106 in RHPN2, rs28372960 in TCOF1, rs7078160 in VAX1, rs10762573 and rs2131960 in VCL, and rs227731 in 17q22, with an accuracy of 99% and an error rate of approximately 3% to predict the risk of NSCL ± P. Those same 13 SNPs were considered the most important for the neural network to effectively predict NSCL ± P risk, with an overall accuracy of 94%. A multivariate regression model revealed significant interactions among all SNPs, with the exception of those in FGF12 and MTHFD1. The most significant biological processes for the selected genes were those involved in tissue and epithelium development; neural tube closure; and metabolism of methionine, folate, and homocysteine.

CONCLUSIONS

Our results provide novel clues for studies of the genetic mechanisms of NSCL ± P and point to a machine learning model composed of 13 SNPs that is capable of predicting NSCL ± P risk.

CLINICAL RELEVANCE

Although validation is necessary, this genetic panel can be useful in the near future to assist in NSCL ± P genetic counseling.

Caliebe A, Nothnagel M. Special issue on 'Genetic epidemiology of complex diseases: impact of population history and modelling assumptions'. Hum Genet 2020;139:1-3. [PMID: 31664516 DOI: 10.1007/s00439-019-02074-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]