Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Behravan H, Hartikainen JM, Tengström M, Pylkäs K, Winqvist R, Kosma VM, Mannermaa A. Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls. Sci Rep 2018;8:13149. [PMID: 30177847 DOI: 10.1038/s41598-018-31573-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 02/09/2018] [Accepted: 08/22/2018] [Indexed: 01/01/2023] Open

Cited by Other Article(s)

Andrews N, Unrath N, Wall P, Buckley JF, Fanning S. Prediction of Listeria monocytogenes Clonal Complexes from Multilocus Variable Number Tandem Repeat Analysis Patterns Using a Machine Learning Approach. Foodborne Pathog Dis 2024;21:593-599. [PMID: 38963774 DOI: 10.1089/fpd.2023.0163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/06/2024] Open

Alfayyadh MM, Maksemous N, Sutherland HG, Lea RA, Griffiths LR. Unravelling the Genetic Landscape of Hemiplegic Migraine: Exploring Innovative Strategies and Emerging Approaches. Genes (Basel) 2024;15:443. [PMID: 38674378 PMCID: PMC11049430 DOI: 10.3390/genes15040443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open

Chung CW, Chou SC, Hsiao TH, Zhang GJ, Chung YF, Chen YM. Machine learning approaches to identify systemic lupus erythematosus in anti-nuclear antibody-positive patients using genomic data and electronic health records. BioData Min 2024;17:1. [PMID: 38183082 PMCID: PMC10770905 DOI: 10.1186/s13040-023-00352-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 12/19/2023] [Indexed: 01/07/2024] Open

Ho M, Levy TJ, Koulas I, Founta K, Coppa K, Hirsch JS, Davidson KW, Spyropoulos AC, Zanos TP. Longitudinal dynamic clinical phenotypes of in-hospital COVID-19 patients across three dominant virus variants in New York. Int J Med Inform 2024;181:105286. [PMID: 37956643 PMCID: PMC10843635 DOI: 10.1016/j.ijmedinf.2023.105286] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/26/2023] [Revised: 10/20/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023]

Abstract

BACKGROUND

COVID-19 is a challenging disease to characterize given its wide-ranging heterogeneous symptomatology. Several studies have attempted to extract clinical phenotypes but often relied on data from small patient cohorts, usually limited to only one viral variant and utilizing a static snapshot of patient data.

OBJECTIVE

This study aimed to identify clinical phenotypes of hospitalized COVID-19 patients and investigate their longitudinal dynamics throughout the pandemic, with the goal of relating these phenotypes to clinical outcomes and treatment strategies.

METHODS

We utilized routinely collected demographic and clinical data throughout the hospitalization of 38,077 patients admitted between 3/2020 and 5/2022 in 12 New York hospitals. Uniform Manifold Approximation and Projection and agglomerative hierarchical clustering were used to derive the clusters, followed by exploratory data analysis to compare the prevalence of comorbidities and treatments per cluster.

RESULTS

Four distinct clinical phenotypes remained robust in multi-site validation and were associated with different mortality rates. The temporal progression of these phenotypes throughout the COVID-19 pandemic demonstrated increased variability across the waves of the three dominant viral variants (alpha, delta, omicron). Longitudinal analysis evaluating changes in the clinical phenotype of each patient over the course of a 4-week hospital stay exemplified the dynamic nature of the disease progression. Factors such as sex, race/ethnicity and specific treatment modalities revealed significant and clinically relevant differences between the observed phenotypes.

CONCLUSIONS

Our proposed methodology has the potential to enable clinicians and policy makers to draw evidence-based conclusions for guiding treatment modalities in a dynamic fashion.

Affiliation(s)

Matthew Ho Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
Todd J Levy Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030
Ioannis Koulas Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030
Kyriaki Founta Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
Kevin Coppa Department of Clinical Digital Solutions, Northwell Health, New Hyde Park, NY 11042
Jamie S Hirsch Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549; Department of Clinical Digital Solutions, Northwell Health, New Hyde Park, NY 11042
Karina W Davidson Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
Alex C Spyropoulos Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
Theodoros P Zanos Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549.

Bettencourt C, Skene N, Bandres-Ciga S, Anderson E, Winchester LM, Foote IF, Schwartzentruber J, Botia JA, Nalls M, Singleton A, Schilder BM, Humphrey J, Marzi SJ, Toomey CE, Kleifat AA, Harshfield EL, Garfield V, Sandor C, Keat S, Tamburin S, Frigerio CS, Lourida I, Ranson JM, Llewellyn DJ. Artificial intelligence for dementia genetics and omics. Alzheimers Dement 2023;19:5905-5921. [PMID: 37606627 PMCID: PMC10841325 DOI: 10.1002/alz.13427] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/05/2023] [Revised: 07/14/2023] [Accepted: 07/18/2023] [Indexed: 08/23/2023]

Affiliation(s)

Conceicao Bettencourt Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
Nathan Skene UK Dementia Research Institute, Imperial College London, London, UK Department of Brain Sciences, Imperial College London, London, UK
Sara Bandres-Ciga Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
Emma Anderson Department of Mental Health of Older People, Division of Psychiatry, University College London, London, UK
Laura M Winchester Department of Psychiatry, University of Oxford, Oxford, UK
Isabelle F Foote Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, USA
Jeremy Schwartzentruber Open Targets, Cambridge, UK Wellcome Sanger Institute, Cambridge, UK Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, California, USA
Juan A Botia Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
Mike Nalls Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA Data Tecnica International LLC, Washington, DC, USA
Andrew Singleton Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
Brian M Schilder UK Dementia Research Institute, Imperial College London, London, UK Department of Brain Sciences, Imperial College London, London, UK
Jack Humphrey Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
Sarah J Marzi UK Dementia Research Institute, Imperial College London, London, UK Department of Brain Sciences, Imperial College London, London, UK
Christina E Toomey Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK The Francis Crick Institute, London, UK
Ahmad Al Kleifat Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
Eric L Harshfield Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
Victoria Garfield MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, UK
Cynthia Sandor UK Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
Samuel Keat UK Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
Stefano Tamburin Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Section, University of Verona, Verona, Italy
Carlo Sala Frigerio UK Dementia Research Institute, Queen Square Institute of Neurology, University College London, London, UK
Ilianna Lourida University of Exeter Medical School, Exeter, UK
Janice M Ranson University of Exeter Medical School, Exeter, UK
David J Llewellyn University of Exeter Medical School, Exeter, UK The Alan Turing Institute, London, UK

Polano M, Bedon L, Dal Bo M, Sorio R, Bartoletti M, De Mattia E, Cecchin E, Pisano C, Lorusso D, Lissoni AA, De Censi A, Cecere SC, Scollo P, Marchini S, Arenare L, De Giorgi U, Califano D, Biagioli E, Chiodini P, Perrone F, Pignata S, Toffoli G. Machine Learning Application Identifies Germline Markers of Hypertension in Patients With Ovarian Cancer Treated With Carboplatin, Taxane, and Bevacizumab. Clin Pharmacol Ther 2023;114:652-663. [PMID: 37243926 DOI: 10.1002/cpt.2960] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/07/2023] [Accepted: 05/22/2023] [Indexed: 05/29/2023]

Abstract

Pharmacogenomics studies how genes influence a person's response to treatment. When complex phenotypes are influenced by multiple genetic variations with little effect, a single piece of genetic information is often insufficient to explain this variability. The application of machine learning (ML) in pharmacogenomics holds great potential - namely, it can be used to unravel complicated genetic relationships that could explain response to therapy. In this study, ML techniques were used to investigate the relationship between genetic variations affecting more than 60 candidate genes and carboplatin-induced, taxane-induced, and bevacizumab-induced toxicities in 171 patients with ovarian cancer enrolled in the MITO-16A/MaNGO-OV2A trial. Single-nucleotide variation (SNV, formerly SNP) profiles were examined using ML to find and prioritize those associated with drug-induced toxicities, specifically hypertension, hematological toxicity, nonhematological toxicity, and proteinuria. The Boruta algorithm was used in cross-validation to determine the significance of SNVs in predicting toxicities. Important SNVs were then used to train eXtreme gradient boosting models. During cross-validation, the models achieved reliable performance with a Matthews correlation coefficient ranging from 0.375 to 0.410. A total of 43 SNVs critical for predicting toxicity were identified. For each toxicity, key SNVs were used to create a polygenic toxicity risk score that effectively divided individuals into high-risk and low-risk categories. In particular, compared with low-risk individuals, high-risk patients were 28-fold more likely to develop hypertension. The proposed method provided insightful data to improve precision medicine for patients with ovarian cancer, which may be useful for reducing toxicities and improving toxicity management.
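
The abstract above reports model quality as a Matthews correlation coefficient (MCC). As a reading aid, the following is a minimal sketch of how MCC is computed from a binary confusion matrix. The labels are hypothetical toy data, not the study's data, and the function is an illustration rather than the authors' code.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels: 1 = toxicity observed, 0 = no toxicity.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
print(round(mcc, 3))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, and it uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when classes are imbalanced.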

Affiliation(s)

Maurizio Polano Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Luca Bedon Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Michele Dal Bo Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Roberto Sorio Dipartimento di Oncologia Medica, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Michele Bartoletti Dipartimento di Oncologia Medica, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Elena De Mattia Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Erika Cecchin Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy
Carmela Pisano Uro-Gynecologic Oncology Unit, Istituto Nazionale Tumori Istituto di Ricovero e Cura a Carattere Scientifico Fondazione G. Pascale, Naples, Italy
Domenica Lorusso Department of Women and Child Health, Division of Gynecologic Oncology, Fondazione Policlinico Universitario A. Gemelli Istituto di Ricovero e Cura a Carattere Scientifico, Rome, Italy Department of Life Science and Public Health, Catholic University of Sacred Heart Largo Agostino Gemelli, Rome, Italy
Andrea Alberto Lissoni Clinica Ostetrica e Ginecologica, Istituto di Ricovero e Cura a Carattere Scientifico S. Gerardo Monza, Università di Milano Bicocca, Milano, Italy
Andrea De Censi Oncologia Medica, Ospedali Galliera, Genoa, Italy
Sabrina Chiara Cecere Uro-Gynecologic Oncology Unit, Istituto Nazionale Tumori Istituto di Ricovero e Cura a Carattere Scientifico Fondazione G. Pascale, Naples, Italy
Paolo Scollo Unità Operativa Ostetricia e Ginecologia, Dipartimento Materno-Infantile, Ospedale Cannizzaro, Catania, Italy
Sergio Marchini Molecular Pharmacology laboratory, Group of Cancer Pharmacology Istituto di Ricovero e Cura a Carattere Scientifico Humanitas Research Hospital, Rozzano, Italy
Laura Arenare Clinical Trial Unit, Istituto Nazionale Tumori, Istituto di Ricovero e Cura a Carattere Scientifico, Fondazione G. Pascale, Naples, Italy
Ugo De Giorgi Istituto di Ricovero e Cura a Carattere Scientifico Istituto Romagnolo per lo Studio dei Tumori Dino Amadori, Meldola, Italy
Daniela Califano Microenvironment Molecular Targets Unit, Istituto Nazionale Tumori IRCCS, Fondazione G. Pascale, Naples, Italy
Elena Biagioli Department Of Oncology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS Milano, Milano, Italy
Paolo Chiodini Department of Mental Health and Public Medicine, Section of Statistics, Università degli Studi della Campania Luigi Vanvitelli, Naples, Italy
Francesco Perrone Clinical Trial Unit, Istituto Nazionale Tumori, Istituto di Ricovero e Cura a Carattere Scientifico, Fondazione G. Pascale, Naples, Italy
Sandro Pignata Uro-Gynecologic Oncology Unit, Istituto Nazionale Tumori Istituto di Ricovero e Cura a Carattere Scientifico Fondazione G. Pascale, Naples, Italy
Giuseppe Toffoli Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano, Istituto di Ricovero e Cura a Carattere Scientifico, Aviano, Italy

Susmitha P, Kumar P, Yadav P, Sahoo S, Kaur G, Pandey MK, Singh V, Tseng TM, Gangurde SS. Genome-wide association study as a powerful tool for dissecting competitive traits in legumes. Front Plant Sci 2023;14:1123631. [PMID: 37645459 PMCID: PMC10461012 DOI: 10.3389/fpls.2023.1123631] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Accepted: 06/08/2023] [Indexed: 08/31/2023]

Abstract

Legumes are extremely valuable because of their high protein content and several other nutritional components. The major challenge lies in maintaining the quantity and quality of protein and other nutritional compounds in view of climate change conditions. The global need for plant-based proteins has increased the demand for seeds with a high protein content that includes essential amino acids. Genome-wide association studies (GWAS) have evolved as a standard approach in agricultural genetics for examining such intricate characters. Recent developments in machine learning methods show promising applications for dimensionality reduction, which is a major challenge in GWAS. With the advancement in biotechnology, sequencing, and bioinformatics tools, estimation of linkage disequilibrium (LD) based associations between a genome-wide collection of single-nucleotide polymorphisms (SNPs) and desired phenotypic traits has become accessible. The markers from GWAS could be utilized for genomic selection (GS) to predict superior lines by calculating genomic estimated breeding values (GEBVs). For prediction accuracy, an assortment of statistical models could be utilized, such as ridge regression best linear unbiased prediction (rrBLUP), genomic best linear unbiased predictor (gBLUP), Bayesian, and random forest (RF). Both naturally diverse germplasm panels and family-based breeding populations can be used for association mapping based on the nature of the breeding system (inbred or outbred) in the plant species. MAGIC, MCILs, RIAILs, NAM, and ROAM are being used for association mapping in several crops. Several modifications of NAM, such as doubled haploid NAM (DH-NAM), backcross NAM (BC-NAM), and advanced backcross NAM (AB-NAM), have also been used in crops like rice, wheat, maize, barley, mustard, etc. For reliable marker-trait associations (MTAs), phenotyping accuracy is as important as genotyping. High-throughput genotyping, phenomics, and computational techniques have advanced during the past few years, making it possible to explore such enormous datasets. Each population has unique virtues and flaws at the genomics and phenomics levels, which will be covered in more detail in this review study. The current investigation includes utilizing elite breeding lines as an association mapping population, optimizing the choice of GWAS selection, population size, and hurdles in phenotyping, and statistical methods which will analyze competitive traits in legume breeding.

Choudhary A, Anand A, Singh A, Roy P, Singh N, Kumar V, Sharma S, Baranwal M. Machine learning-based ensemble approach in prediction of lung cancer predisposition using XRCC1 gene polymorphism. J Biomol Struct Dyn 2023:1-10. [PMID: 37545160 DOI: 10.1080/07391102.2023.2242492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 07/23/2023] [Indexed: 08/08/2023]

Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel) 2023;23:s23094439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]

Learning high-order interactions for polygenic risk prediction. PLoS One 2023;18:e0281618. [PMID: 36763605 PMCID: PMC9916647 DOI: 10.1371/journal.pone.0281618] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Accepted: 01/27/2023] [Indexed: 02/11/2023] Open

Abstract

Within the framework of precision medicine, the stratification of individual genetic susceptibility based on inherited DNA variation has paramount relevance. However, one of the most relevant pitfalls of traditional Polygenic Risk Score (PRS) approaches is their inability to model complex high-order non-linear SNP-SNP interactions and their effect on the phenotype (e.g. epistasis). Indeed, they face a computational challenge, as the number of possible interactions grows exponentially with the number of SNPs considered, which also affects the statistical reliability of the model parameters. In this work, we address this issue by proposing a novel PRS approach, called High-order Interactions-aware Polygenic Risk Score (hiPRS), that incorporates high-order interactions in modeling polygenic risk. The latter combines an interaction search routine based on frequent itemsets mining and a novel interaction selection algorithm based on Mutual Information, to construct a simple and interpretable weighted model of user-specified dimensionality that can predict a given binary phenotype. Compared to traditional PRS methods, hiPRS does not rely on GWAS summary statistics nor any external information. Moreover, hiPRS differs from Machine Learning-based approaches that can include complex interactions in that it provides a readable and interpretable model and it is able to control overfitting, even on small samples. In the present work we demonstrate through a comprehensive simulation study the superior performance of hiPRS w.r.t. state of the art methods, both in terms of scoring performance and interpretability of the resulting model. We also test hiPRS against small sample size, class imbalance and the presence of noise, showcasing its robustness to extreme experimental settings. Finally, we apply hiPRS to a case study on real data from DACHS cohort, defining an interaction-aware scoring model to predict mortality of stage II-III Colon-Rectal Cancer patients treated with oxaliplatin.
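
The abstract above selects SNP-SNP interactions using Mutual Information. The sketch below is not the hiPRS algorithm. It only illustrates, on hypothetical toy genotypes, how mutual information with a binary phenotype can rank an interaction term above either single SNP. All names and data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical carrier status (1 = carries the risk allele) for two SNPs.
snp_a = [1, 1, 0, 0, 1, 1, 0, 0]
snp_b = [1, 0, 1, 0, 1, 0, 1, 0]
# Toy phenotype that is present only when both risk alleles co-occur.
phenotype = [1, 0, 0, 0, 1, 0, 0, 0]

# Candidate interaction term: both SNPs carried together.
interaction = [a & b for a, b in zip(snp_a, snp_b)]

mi_a = mutual_information(snp_a, phenotype)
mi_b = mutual_information(snp_b, phenotype)
mi_interaction = mutual_information(interaction, phenotype)
print(round(mi_a, 3), round(mi_b, 3), round(mi_interaction, 3))
```

In this toy setting the interaction term carries more information about the phenotype than either SNP alone, which is the kind of signal a purely additive score cannot represent.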

Zizaan A, Idri A. Machine learning based Breast Cancer screening: trends, challenges, and opportunities. Comput Methods Biomech Biomed Eng Imaging Vis 2023. [DOI: 10.1080/21681163.2023.2172615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]

Gonzalez-Gomez R, Ibañez A, Moguilner S. Multiclass characterization of frontotemporal dementia variants via multimodal brain network computational inference. Netw Neurosci 2023;7:322-350. [PMID: 37333999 PMCID: PMC10270711 DOI: 10.1162/netn_a_00285] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/14/2022] [Accepted: 10/03/2022] [Indexed: 04/03/2024] Open

Abstract

Characterizing a particular neurodegenerative condition against other possible diseases remains a challenge at the clinical, biomarker, and neuroscientific levels. This is particularly the case for frontotemporal dementia (FTD) variants, whose specific characterization requires high levels of expertise and multidisciplinary teams to subtly distinguish among similar physiopathological processes. Here, we used a computational approach of multimodal brain networks to address simultaneous multiclass classification of 298 subjects (one group against all others), including five FTD variants: behavioral variant FTD, corticobasal syndrome, nonfluent variant primary progressive aphasia, progressive supranuclear palsy, and semantic variant primary progressive aphasia, with healthy controls. Fourteen machine learning classifiers were trained with functional and structural connectivity metrics calculated through different methods. Due to the large number of variables, dimensionality was reduced, employing statistical comparisons and progressive elimination to assess feature stability under nested cross-validation. The machine learning performance was measured through the area under the receiver operating characteristic curves, reaching 0.81 on average, with a standard deviation of 0.09. Furthermore, the contributions of demographic and cognitive data were also assessed via multifeatured classifiers. An accurate simultaneous multiclass classification of each FTD variant against other variants and controls was obtained based on the selection of an optimum set of features. The classifiers incorporating the brain's network and cognitive assessment increased performance metrics. Multimodal classifiers evidenced specific variants' compromise, across modalities and methods through feature importance analysis. If replicated and validated, this approach may help to support clinical decision tools aimed at detecting specific affectations in the context of overlapping diseases.
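
The abstract above evaluates a multiclass problem as one group against all others and summarizes performance by the area under the ROC curve. The sketch below shows one common way to compute one-vs-rest AUC per class from classifier scores. The class labels and scores are hypothetical toy values, and the functions are an illustration under those assumptions, not the study's pipeline.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(class_scores, y):
    """Per-class AUC, treating each class as positive and all others as negative."""
    return {c: roc_auc(s, [1 if t == c else 0 for t in y])
            for c, s in class_scores.items()}


# Hypothetical true classes for six subjects (HC = healthy control).
y = ["bvFTD", "bvFTD", "PSP", "PSP", "HC", "HC"]
# Hypothetical per-class scores from some classifier, one score per subject.
class_scores = {
    "bvFTD": [0.9, 0.6, 0.7, 0.1, 0.2, 0.3],
    "PSP": [0.05, 0.3, 0.2, 0.8, 0.1, 0.1],
    "HC": [0.05, 0.1, 0.1, 0.1, 0.7, 0.6],
}

aucs = one_vs_rest_auc(class_scores, y)
mean_auc = sum(aucs.values()) / len(aucs)
print(aucs, round(mean_auc, 3))
```

Averaging the per-class values gives a single summary figure of the kind reported in the abstract, although the study may have aggregated across classifiers and folds differently.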

Salgado Á, de Melo-Minardi RC, Giovanetti M, Veloso A, Morais-Rodrigues F, Adelino T, de Jesus R, Tosta S, Azevedo V, Lourenco J, Alcantara LCJ. Machine learning models exploring characteristic single-nucleotide signatures in yellow fever virus. PLoS One 2022;17:e0278982. [PMID: 36508435 PMCID: PMC9744328 DOI: 10.1371/journal.pone.0278982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Accepted: 11/29/2022] [Indexed: 12/14/2022] Open

Affiliation(s)

Álvaro Salgado Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Raquel C. de Melo-Minardi Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Marta Giovanetti Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil Laboratório de Flavivírus, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
Adriano Veloso Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Francielly Morais-Rodrigues Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Talita Adelino Laboratório Central de Saúde Pública, Fundação Ezequiel Dias, Belo Horizonte, Minas Gerais, Brazil
Ronaldo de Jesus Coordenação Geral dos Laboratórios de Saúde Pública, Secretaria de Vigilância em Saúde, Ministério da Saúde, Brasília, DF, Brazil
Stephane Tosta Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Vasco Azevedo Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
José Lourenco Department of Zoology, University of Oxford, Oxford, United Kingdom
Luiz Carlos J. Alcantara Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil Laboratório de Flavivírus, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil

Moguilner S, Birba A, Fittipaldi S, Gonzalez-Campo C, Tagliazucchi E, Reyes P, Matallana D, Parra MA, Slachevsky A, Farías G, Cruzat J, García A, Eyre HA, Joie RL, Rabinovici G, Whelan R, Ibáñez A. Multi-feature computational framework for combined signatures of dementia in underrepresented settings. J Neural Eng 2022;19:10.1088/1741-2552/ac87d0. [PMID: 35940105 PMCID: PMC11177279 DOI: 10.1088/1741-2552/ac87d0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/06/2022] [Accepted: 08/08/2022] [Indexed: 11/11/2022]

Affiliation(s)

Sebastian Moguilner Global Brain Health Institute (GBHI), University of California San Francisco (UCSF), CA, United States of America; Cognitive Neuroscience Center (CNC), Universidad de San Andrés, Buenos Aires, Argentina; Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile; Trinity College Dublin, Dublin, Ireland
Agustina Birba Cognitive Neuroscience Center (CNC), Universidad de San Andrés, Buenos Aires, Argentina; Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina
Sol Fittipaldi Cognitive Neuroscience Center (CNC), Universidad de San Andrés, Buenos Aires, Argentina; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina
Cecilia Gonzalez-Campo Cognitive Neuroscience Center (CNC), Universidad de San Andrés, Buenos Aires, Argentina
Enzo Tagliazucchi Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina; Department of Physics, University of Buenos Aires, Buenos Aires, Argentina
Pablo Reyes Medical School, Aging Institute, Psychiatry and Mental Health, Pontificia Universidad Javeriana, Bogota, Colombia
Diana Matallana Medical School, Aging Institute, Psychiatry and Mental Health, Pontificia Universidad Javeriana, Bogota, Colombia
Mario A Parra School of Psychological Sciences and Health, University of Strathclyde, Glasgow, United Kingdom
Andrea Slachevsky Gerosciences Center for Brain Health and Metabolism, Santiago, Chile; Faculty of Medicine, University of Chile, Santiago, Chile; Memory and Neuropsychiatric Clinic (CMYN) Neurology Department, Hospital del Salvador and University of Chile, Santiago, Chile; Servicio de Neurología, Departamento de Medicina, Clínica Alemana-Universidad del Desarrollo, Santiago de Chile, Chile
Gonzalo Farías Faculty of Medicine, University of Chile, Santiago, Chile
Josefina Cruzat Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile
Adolfo García Global Brain Health Institute (GBHI), University of California San Francisco (UCSF), CA, United States of America; Cognitive Neuroscience Center (CNC), Universidad de San Andrés, Buenos Aires, Argentina; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina; Departamento de Lingüística y Literatura, Facultad de Humanidades, Universidad de Santiago de Chile, Santiago, Chile; Trinity College Dublin, Dublin, Ireland
Harris A Eyre Global Brain Health Institute (GBHI), University of California San Francisco (UCSF), CA, United States of America; Neuroscience-Inspired Policy Initiative, Organisation for Economic Co-operation and Development and PRODEO Institute, Paris, France; IMPACT, The Institute for Mental and Physical Health and Clinical Translation, Deakin University, Geelong, Victoria, Australia; Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, TX, United States of America; Trinity College Dublin, Dublin, Ireland
Renaud La Joie Memory and Aging Center, Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States of America
Gil Rabinovici Global Brain Health Institute (GBHI), University of California San Francisco (UCSF), CA, United States of America; Memory and Aging Center, Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States of America; Trinity College Dublin, Dublin, Ireland
Robert Whelan Global Brain Health Institute (GBHI), University of California San Francisco (UCSF), CA, United States of America; Trinity College Dublin, Dublin, Ireland
Agustín Ibáñez Global Brain Health Institute (GBHI), University of California San Francisco (UCSF), CA, United States of America; Cognitive Neuroscience Center (CNC), Universidad de San Andrés, Buenos Aires, Argentina; Latin American Brain Health (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina; Trinity College Dublin, Dublin, Ireland

Elgart M, Lyons G, Romero-Brufau S, Kurniansyah N, Brody JA, Guo X, Lin HJ, Raffield L, Gao Y, Chen H, de Vries P, Lloyd-Jones DM, Lange LA, Peloso GM, Fornage M, Rotter JI, Rich SS, Morrison AC, Psaty BM, Levy D, Redline S, Sofer T. Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Commun Biol 2022;5:856. [PMID: 35995843 PMCID: PMC9395509 DOI: 10.1038/s42003-022-03812-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 07/19/2021] [Accepted: 08/05/2022] [Indexed: 01/03/2023] Open

Affiliation(s)

Michael Elgart Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
Genevieve Lyons Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Santiago Romero-Brufau Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Medicine, Mayo Clinic, Rochester, MN, USA
Nuzulul Kurniansyah Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
Jennifer A Brody Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
Xiuqing Guo The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Henry J Lin The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Laura Raffield Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
Yan Gao The Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
Han Chen Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA; Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Paul de Vries Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
Donald M Lloyd-Jones Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
Leslie A Lange Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA
Gina M Peloso Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
Myriam Fornage Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA; Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
Jerome I Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Stephen S Rich Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
Alanna C Morrison Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
Bruce M Psaty Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
Daniel Levy The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA; The Framingham Heart Study, Framingham, MA, USA
Susan Redline Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
Tamar Sofer Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

Risk score prediction model based on single nucleotide polymorphism for predicting malaria: a machine learning approach. BMC Bioinformatics 2022;23:325. [PMID: 35934714 PMCID: PMC9358850 DOI: 10.1186/s12859-022-04870-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 08/01/2022] [Indexed: 11/25/2022] Open

Abstract

Background

Malaria risk prediction is currently limited to advanced statistical methods, such as time series and cluster analysis, applied to epidemiological data. Machine learning models have nevertheless been explored to study the complexity of malaria through blood smear images and environmental data. However, to the best of our knowledge, no study has analysed the contribution of single nucleotide polymorphisms (SNPs) to malaria using a machine learning model. This study therefore aims to quantify an individual's susceptibility to developing malaria using risk scores obtained from the cumulative effects of SNPs, known as weighted genetic risk scores (wGRS).

Results

We proposed an SNP-based feature extraction algorithm that incorporates an individual's susceptibility to malaria to generate the feature set. Because learning from many SNPs can become computationally expensive for a machine learning model, we reduced the feature set with the Logistic Regression and Recursive Feature Elimination (LR-RFE) method, selecting the SNPs that improve the efficacy of our model. Next, we calculated the wGRS of the selected feature set, which is used as the model's target variable. To benchmark the wGRS-only model, we also calculated and evaluated the combination of wGRS with genotype frequency (wGRS + GF). Finally, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Ridge regression algorithms were used to build the machine learning models for malaria risk prediction.

Conclusions

Our proposed approach identified SNP rs334 as the most contributing feature, with an importance score of 6.224 compared to a baseline importance score of 1.1314. This is an important result, as prior studies have shown that rs334 is a major genetic risk factor for malaria. Comparison of the three machine learning models demonstrated that LightGBM achieves the highest performance, with a Mean Absolute Error (MAE) of 0.0373. Furthermore, with wGRS + GF, all models performed significantly better than with wGRS alone, and LightGBM again obtained the best performance (MAE of 0.0033).

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04870-0.
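As an illustration of the scoring described in this abstract, the sketch below computes a weighted genetic risk score as a sum of risk-allele counts weighted by log odds ratios, together with the mean absolute error used to compare the models. This is the common textbook form of a wGRS and is assumed here because the abstract does not state the exact weighting; the function names, allele counts, and odds ratios are hypothetical.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Sum risk-allele counts (0, 1 or 2 per SNP) weighted by log odds ratios.

    Assumption: weights are natural-log odds ratios, one per SNP.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(
        count * math.log(odds)
        for count, odds in zip(risk_allele_counts, odds_ratios)
    )


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical individual: three SNPs with made-up odds ratios.
counts = [2, 0, 1]
odds = [1.5, 1.2, 0.8]
score = weighted_grs(counts, odds)
```

A regression model trained to predict such scores could then be evaluated with `mean_absolute_error` on held-out individuals.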

Hou C, Xu B, Hao Y, Yang D, Song H, Li J. Development and validation of polygenic risk scores for prediction of breast cancer and breast cancer subtypes in Chinese women. BMC Cancer 2022;22:374. [PMID: 35395775 PMCID: PMC8991589 DOI: 10.1186/s12885-022-09425-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/24/2021] [Accepted: 03/15/2022] [Indexed: 02/08/2023] Open

Abstract

Background

Studies investigating breast cancer polygenic risk score (PRS) in Chinese women are scarce. The objectives of this study were to develop and validate PRSs that could be used to stratify risk for overall and subtype-specific breast cancer in Chinese women, and to evaluate the performance of a newly proposed Artificial Neural Network (ANN) based approach for PRS construction.

Methods

The PRSs were constructed using the dataset from a genome-wide association study (GWAS) and validated in an independent case-control study. Three approaches, including repeated logistic regression (RLR), logistic ridge regression (LRR) and ANN based approach, were used to build the PRSs for overall and subtype-specific breast cancer based on 24 selected single nucleotide polymorphisms (SNPs). Predictive performance and calibration of the PRSs were evaluated unadjusted and adjusted for Gail-2 model 5-year risk or classical breast cancer risk factors.

Results

The primary PRS_ANN and PRS_LRR both showed modest predictive ability for overall breast cancer (odds ratio per interquartile range increase of the PRS in controls [IQ-OR] 1.76 vs 1.58; area under the receiver operator characteristic curve [AUC] 0.601 vs 0.598) and remained predictive after adjustment. Although estrogen receptor negative (ER⁻) breast cancer was poorly predicted by the primary PRSs, the ER⁻ PRSs trained solely on ER⁻ breast cancer cases substantially improved prediction of ER⁻ breast cancer.

Conclusions

The PRSs based on 24 SNPs can provide additional risk information to help breast cancer risk stratification in the general population of China. The newly proposed ANN approach for PRS construction has the potential to replace the traditional approaches, but more studies are needed to validate and investigate its performance.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12885-022-09425-3.
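The two metrics reported in this abstract can be sketched in a few lines. The example below computes the AUC as the probability that a randomly chosen case has a higher score than a randomly chosen control, and converts a per-unit log odds ratio into an odds ratio per interquartile range of the score in controls. The coefficient `beta` is assumed to come from a logistic regression fitted elsewhere; the function names are hypothetical and this is not the authors' code.

```python
import math
import statistics


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def iq_odds_ratio(beta, control_scores):
    """Odds ratio for an increase of one control-group interquartile range.

    beta is the log odds ratio per unit of the score (assumed to be given).
    """
    q1, _, q3 = statistics.quantiles(control_scores, n=4)
    return math.exp(beta * (q3 - q1))
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based formulation would scale better on large cohorts.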

Govender P, Fashoto SG, Maharaj L, Adeleke MA, Mbunge E, Olamijuwon J, Akinnuwesi B, Okpeku M. The application of machine learning to predict genetic relatedness using human mtDNA hypervariable region I sequences. PLoS One 2022;17:e0263790. [PMID: 35180257 PMCID: PMC8856515 DOI: 10.1371/journal.pone.0263790] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/27/2021] [Accepted: 01/26/2022] [Indexed: 11/21/2022] Open

Abstract

Human identification of unknown samples following disaster and mass casualty events is essential, especially to bring closure to family and friends of the deceased. Unfortunately, victim identification is often challenging for forensic investigators, as analysis becomes complicated when biological samples are degraded or of poor quality as a result of exposure to harsh environmental factors. Mitochondrial DNA becomes the ideal option for analysis, particularly for determining the origin of the samples. In such events, the estimation of genetic parameters plays an important role in modelling and predicting genetic relatedness and is useful in assigning unknown individuals to an ethnic group. Various techniques exist for the estimation of genetic relatedness, but the use of machine learning (ML) algorithms is novel and presently the least common in forensic genetic studies. In this study, we investigated the ability of ML algorithms to predict genetic relatedness using hypervariable region I sequences that were retrieved from the GenBank database for three race groups, namely African, Asian and Caucasian. Four ML classification algorithms, Support vector machines (SVM), Linear discriminant analysis (LDA), Quadratic discriminant analysis (QDA) and Random Forest (RF), were hybridised with one-hot encoding, Principal component analysis (PCA) and Bags of Words (BoW), and were compared for inferring genetic relatedness. The findings from this study showed that, in WEKA, genetic inferences based on PCA-SVM achieved an overall accuracy of 80–90% and consistently outperformed PCA-LDA, PCA-RF and PCA-QDA, while in Python, BoW-PCA-RF achieved 94.4% accuracy and outperformed BoW-PCA-SVM, BoW-PCA-LDA and BoW-PCA-QDA. ML results from both WEKA and Python displayed higher accuracies than the Analysis of molecular variance results. Given the results, SVM and RF algorithms are likely to also be useful in other sequence classification applications, making them promising tools in genetics and forensic science. The study provides evidence that ML can be utilized as a supplementary tool for forensic genetics casework analysis.
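The two sequence encodings named in this abstract can be illustrated with a minimal sketch. It one-hot encodes a DNA sequence over the alphabet A, C, G, T and builds a bag-of-words representation by counting overlapping k-mers. Treating k-mers as the "words" is an assumption, since the abstract does not say how words were defined; the function names and the default `k` are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Flatten a DNA sequence into a 0/1 vector, four positions per base.

    Bases outside A, C, G, T (for example N or gaps) become all zeros.
    """
    vector = []
    for base in sequence.upper():
        vector.extend(1 if base == nucleotide else 0 for nucleotide in NUCLEOTIDES)
    return vector


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers, a bag-of-words view of the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
```

Either representation yields a numeric feature vector per sample, which could then be passed to a dimensionality reduction step and a classifier of the kind the abstract compares.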

Karim MR, Cochez M, Zappa A, Sahay R, Rebholz-Schuhmann D, Beyan O, Decker S. Convolutional Embedded Networks for Population Scale Clustering and Bio-Ancestry Inferencing. IEEE/ACM Trans Comput Biol Bioinform 2022;19:369-382. [PMID: 32750845 DOI: 10.1109/tcbb.2020.2994649] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]

Abstract

The study of genetic variants (GVs) can help find correlating population groups, identify cohorts that are predisposed to common diseases, and explain differences in disease susceptibility and in how patients react to drugs. Machine learning techniques are increasingly being applied to identify interacting GVs to understand their complex phenotypic traits. Since the performance of a learning algorithm depends not only on the size and nature of the data but also on the quality of the underlying representation, deep neural networks (DNNs) can learn non-linear mappings that transform GV data into representations more amenable to clustering and classification than manual feature selection. In this paper, we propose convolutional embedded networks (CEN), in which we combine two DNN architectures, convolutional embedded clustering (CEC) and a convolutional autoencoder (CAE) classifier, for clustering individuals and predicting geographic ethnicity based on GVs, respectively. We applied CAE-based representation learning to 95 million GVs from the '1000 genomes' (covering 2,504 individuals from 26 ethnic origins) and 'Simons genome diversity' (covering 279 individuals from 130 ethnic origins) projects. Quantitative and qualitative analyses with a focus on accuracy and scalability show that our approach outperforms state-of-the-art approaches such as VariantSpark and ADMIXTURE. In particular, CEC can cluster targeted population groups in 22 hours with an adjusted Rand index (ARI) of 0.915, a normalized mutual information (NMI) of 0.92, and a clustering accuracy (ACC) of 89 percent. The CAE classifier, in turn, can predict the geographic ethnicity of unknown samples with an F1 score of 0.9004 and a Matthews correlation coefficient (MCC) of 0.8245. Further, to provide interpretations of the predictions, we identify significant biomarkers using gradient boosted trees (GBT) and SHapley Additive exPlanations (SHAP). Overall, our approach is transparent and faster than the baseline methods, and scalable for 5 to 100 percent of the full human genome.
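For readers unfamiliar with the classification metrics quoted in this abstract, the sketch below computes the F1 score and the Matthews correlation coefficient from the counts of a binary confusion matrix. The paper's classifier is multi-class, so it presumably relies on multi-class generalizations of these formulas; the binary form and the function names here are simplifying assumptions.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC: +1 is perfect, 0 is chance level, -1 is total disagreement."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0
```

Unlike F1, the MCC uses all four cells of the confusion matrix, which is why the two scores can differ noticeably on imbalanced classes.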

Shao D, Dai Y, Li N, Cao X, Zhao W, Cheng L, Rong Z, Huang L, Wang Y, Zhao J. Artificial intelligence in clinical research of cancers. Brief Bioinform 2021;23:6470966. [PMID: 34929741 PMCID: PMC8769909 DOI: 10.1093/bib/bbab523] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/11/2021] [Revised: 11/06/2021] [Accepted: 11/13/2021] [Indexed: 12/16/2022] Open

Westhues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter JC, Simianer H, Beissinger TM. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. Front Plant Sci 2021;12:699589. [PMID: 34880880 PMCID: PMC8647909 DOI: 10.3389/fpls.2021.699589] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2021] [Accepted: 10/15/2021] [Indexed: 05/26/2023]

Abstract

The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. 
An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
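The "new year" prediction problems listed in this abstract amount to different ways of splitting trial records into training and test sets. The sketch below shows one plausible reading: hold out every record from the target year, and optionally keep only test genotypes that never appear in the training years. The record schema (`genotype` and `year` keys) and the function name are hypothetical, and the authors' actual procedure may differ.

```python
def split_new_year(records, test_year, unobserved_only=False):
    """Hold out one year; optionally keep only genotypes unseen in training.

    records is a list of dicts with 'genotype' and 'year' keys (assumed schema).
    """
    train = [r for r in records if r["year"] != test_year]
    test = [r for r in records if r["year"] == test_year]
    if unobserved_only:
        seen = {r["genotype"] for r in train}
        test = [r for r in test if r["genotype"] not in seen]
    return train, test


# Hypothetical trial records.
records = [
    {"genotype": "G1", "year": 2014},
    {"genotype": "G2", "year": 2014},
    {"genotype": "G1", "year": 2015},
    {"genotype": "G3", "year": 2015},
]
train, test = split_new_year(records, 2015, unobserved_only=True)
```

The "new site" problems would follow the same pattern with a site key in place of the year.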

Structural and functional motor-network disruptions predict selective action-concept deficits: Evidence from frontal lobe epilepsy. Cortex 2021;144:43-55. [PMID: 34637999 DOI: 10.1016/j.cortex.2021.08.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/01/2020] [Revised: 07/12/2021] [Accepted: 08/05/2021] [Indexed: 12/22/2022]

Moguilner S, Birba A, Fino D, Isoardi R, Huetagoyena C, Otoya R, Tirapu V, Cremaschi F, Sedeño L, Ibáñez A, García AM. Multimodal neurocognitive markers of frontal lobe epilepsy: Insights from ecological text processing. Neuroimage 2021;235:117998. [PMID: 33789131 PMCID: PMC8272524 DOI: 10.1016/j.neuroimage.2021.117998] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/25/2021] [Revised: 03/15/2021] [Accepted: 03/24/2021] [Indexed: 01/07/2023] Open

Affiliation(s)

Sebastian Moguilner Global Brain Health Institute, UCSF, California, US, & Trinity College Dublin, Dublin, Ireland; Nuclear Medicine School Foundation (FUESMEN), National Commission of Atomic Energy (CNEA), Mendoza, Argentina
Agustina Birba University of San Andres, Buenos Aires, Argentina; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina
Daniel Fino Nuclear Medicine School Foundation (FUESMEN), National Commission of Atomic Energy (CNEA), Mendoza, Argentina; Fundación Argentina para el Desarrollo en Salud, Mendoza, Argentina
Roberto Isoardi Nuclear Medicine School Foundation (FUESMEN), National Commission of Atomic Energy (CNEA), Mendoza, Argentina
Celeste Huetagoyena Neuromed, Clinical Neuroscience, Mendoza, Argentina; Universidad Católica Argentina
Raúl Otoya Neuromed, Clinical Neuroscience, Mendoza, Argentina
Viviana Tirapu Nuclear Medicine School Foundation (FUESMEN), National Commission of Atomic Energy (CNEA), Mendoza, Argentina; Neuromed, Clinical Neuroscience, Mendoza, Argentina
Fabián Cremaschi Nuclear Medicine School Foundation (FUESMEN), National Commission of Atomic Energy (CNEA), Mendoza, Argentina; Neuroscience Department of the School of Medicine, National University of Cuyo, Mendoza, Argentina; Santa Isabel de Hungría Hospital, Mendoza, Argentina
Lucas Sedeño National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina
Agustín Ibáñez Global Brain Health Institute, UCSF, California, US, & Trinity College Dublin, Dublin, Ireland; University of San Andres, Buenos Aires, Argentina; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina; Center for Social and Cognitive Neuroscience (CSCN), School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
Adolfo M García Global Brain Health Institute, UCSF, California, US, & Trinity College Dublin, Dublin, Ireland; University of San Andres, Buenos Aires, Argentina; National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina; Faculty of Education, National University of Cuyo (UNCuyo), Mendoza, Argentina; Departamento de Lingüística y Literatura, Facultad de Humanidades, Universidad de Santiago de Chile, Santiago, Chile.

Banegas-Luna AJ, Peña-García J, Iftene A, Guadagni F, Ferroni P, Scarpato N, Zanzotto FM, Bueno-Crespo A, Pérez-Sánchez H. Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey. Int J Mol Sci 2021;22:4394. [PMID: 33922356 PMCID: PMC8122817 DOI: 10.3390/ijms22094394] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022] Open

Muneeb M, Henschel A. Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods. BMC Bioinformatics 2021;22:198. [PMID: 33874881 PMCID: PMC8056510 DOI: 10.1186/s12859-021-04077-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023] Open

Abstract

Background

Genotype–phenotype predictions are of great importance in genetics. These predictions can help to find genetic mutations causing variations in human beings. There are many approaches for finding the association, which can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are good for finding the actual SNPs causing variation, whereas machine learning techniques are good when we just want to classify people into different categories. In this article, we examined the eye-color and Type-2 diabetes phenotypes. The proposed technique is a hybrid approach that combines elements of statistical techniques with machine learning.

Results

The main dataset for the eye-color phenotype consists of 806 people: 404 people have Blue-Green eyes and 402 people have Brown eyes. After preprocessing, we generated 8 different datasets containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for machine learning algorithms. We used about 9 classifiers (RandomForest, Extreme Gradient boosting, ANN, LSTM, GRU, BILSTM, 1DCNN, ensembles of ANN, and ensembles of LSTM), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTM outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for Blue-Green eyes. The main dataset for Type-2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97.

Conclusion

Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variant associations with traits and diseases. Given more datasets, the predictive performance of machine learning models can be improved. Moreover, the non-linearity of the machine learning model and the combination of SNP mutations during training improve prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
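The three mutation types mentioned in the Results can be pictured as a simple numeric encoding of each genotype. The sketch below counts how many of the two alleles differ from a reference allele, giving 0 for no mutation, 1 for partial mutation, and 2 for full mutation. Mapping "partial" to heterozygous and "full" to homozygous non-reference is an assumption about the authors' terminology, and the function names are hypothetical.

```python
def mutation_level(genotype, reference_allele):
    """Count alleles in a two-letter genotype that differ from the reference."""
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype such as 'AG'")
    reference = reference_allele.upper()
    return sum(1 for allele in genotype.upper() if allele != reference)


def encode_individual(genotypes, reference_alleles):
    """Encode one person's genotypes as a list of 0/1/2 values, one per SNP."""
    return [
        mutation_level(genotype, reference)
        for genotype, reference in zip(genotypes, reference_alleles)
    ]
```

Applied to every individual, this produces the kind of fixed-length numeric matrix that tree ensembles and neural networks can consume directly.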

Lebrett MB, Crosbie EJ, Smith MJ, Woodward ER, Evans DG, Crosbie PAJ. Targeting lung cancer screening to individuals at greatest risk: the role of genetic factors. J Med Genet 2021;58:217-226. [PMID: 33514608 PMCID: PMC8005792 DOI: 10.1136/jmedgenet-2020-107399] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/12/2020] [Revised: 12/06/2020] [Accepted: 12/08/2020] [Indexed: 12/24/2022]

Warner E, Wang N, Lee J, Rao A. Meaningful incorporation of artificial intelligence for personalized patient management during cancer: Quantitative imaging, risk assessment, and therapeutic outcomes. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00017-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Methods for correcting inference based on outcomes predicted by machine learning. Proc Natl Acad Sci U S A 2020;117:30266-30275. [PMID: 33208538 PMCID: PMC7720220 DOI: 10.1073/pnas.2001238117] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022] Open

Seo H, Cho DH. Feature selection algorithm based on dual correlation filters for cancer-associated somatic variants. BMC Bioinformatics 2020;21:486. [PMID: 33121438 PMCID: PMC7596964 DOI: 10.1186/s12859-020-03767-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/23/2019] [Accepted: 09/18/2020] [Indexed: 12/30/2022] Open

Finkbeiner S. Functional genomics, genetic risk profiling and cell phenotypes in neurodegenerative disease. Neurobiol Dis 2020;146:105088. [PMID: 32977020 PMCID: PMC7686089 DOI: 10.1016/j.nbd.2020.105088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/31/2020] [Revised: 09/11/2020] [Accepted: 09/18/2020] [Indexed: 12/03/2022] Open

Bakhtiari S, Sulaimany S, Talebi M, Kalhor K. Computational Prediction of Probable Single Nucleotide Polymorphism-Cancer Relationships. Cancer Inform 2020;19:1176935120942216. [PMID: 32728337 PMCID: PMC7364831 DOI: 10.1177/1176935120942216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2020] [Accepted: 06/22/2020] [Indexed: 12/18/2022] Open

Behravan H, Hartikainen JM, Tengström M, Kosma VM, Mannermaa A. Predicting breast cancer risk using interacting genetic and demographic factors and machine learning. Sci Rep 2020;10:11044. [PMID: 32632202 PMCID: PMC7338351 DOI: 10.1038/s41598-020-66907-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/19/2020] [Accepted: 06/01/2020] [Indexed: 12/21/2022] Open

Machine Learning Supports Long Noncoding RNAs as Expression Markers for Endometrial Carcinoma. Biomed Res Int 2020;2020:3968279. [PMID: 32420338 PMCID: PMC7199595 DOI: 10.1155/2020/3968279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/21/2019] [Accepted: 12/17/2019] [Indexed: 12/19/2022]

Bandoy DJDR, Weimer BC. Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease. Microorganisms 2020;8:E549. [PMID: 32290186 PMCID: PMC7232492 DOI: 10.3390/microorganisms8040549] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/08/2020] [Indexed: 01/17/2023] Open

Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44-56. [PMID: 30617339 DOI: 10.1038/s41591-018-0300-7] [Citation(s) in RCA: 2155] [Impact Index Per Article: 431.0] [Received: 08/16/2018] [Accepted: 11/12/2018] [Indexed: 11/08/2022]