1
Perschinka F, Peer A, Joannidis M. [Artificial intelligence and acute kidney injury]. Med Klin Intensivmed Notfmed 2024; 119:199-207. [PMID: 38396124 PMCID: PMC10995052 DOI: 10.1007/s00063-024-01111-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 02/25/2024]
Abstract
Digitalization is increasingly finding its way into intensive care units, and with it artificial intelligence (AI) for critically ill patients. One promising field of application is acute kidney injury (AKI). AI is used primarily to predict AKI, but further approaches classify existing AKI into different phenotypes. Different AI models are used for prediction. The area under the receiver operating characteristic curve (AUROC) achieved with these models varies and is influenced by several factors, such as the prediction horizon and the definition of AKI. Most models have an AUROC between 0.650 and 0.900, with lower values for predictions further into the future and when Acute Kidney Injury Network (AKIN) criteria are applied instead of Kidney Disease: Improving Global Outcomes (KDIGO) criteria. Classification into phenotypes already makes it possible to categorize patients into groups with different risks of mortality or of requiring renal replacement therapy (RRT), but etiologies or therapeutic consequences derived from these phenotypes are still lacking. However, all the models suffer from AI-specific shortcomings. Because the models are built on large databases, recent changes in therapy and newly introduced biomarkers cannot be promptly represented in a relevant proportion of the data. For this reason, serum creatinine and urinary output, with their known limitations, dominate current AI models for prediction, which impairs their performance. In addition, the increasingly complex models no longer allow physicians to understand the basis on which the warning of an impending AKI is calculated and on which therapy should subsequently be initiated. The successful use of AI in routine clinical practice will depend heavily on physicians' trust in the systems and on overcoming the aforementioned weaknesses. However, the clinician will remain irreplaceable as the decisive authority for critically ill patients by combining measurable and nonmeasurable parameters.
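The AUROC values quoted in this abstract can be read as the probability that a randomly chosen patient who develops AKI receives a higher risk score than a randomly chosen patient who does not. The following is a minimal sketch of that rank-based definition, not the evaluation code of any model covered by the review; the labels and risk scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = patient developed AKI, score = predicted risk.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.30, 0.50, 0.90]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the reported range of 0.650 to 0.900 spans modest to good discrimination.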
Affiliation(s)
- Michael Joannidis
  - Gemeinsame Einrichtung für Internistische Notfall- und Intensivmedizin, Department Innere Medizin, Medizinische Universität Innsbruck, Anichstraße 35, 6020 Innsbruck, Austria
2
Mehmetbeyoglu E, Duman A, Taheri S, Ozkul Y, Rassoulzadegan M. From Data to Insights: Machine Learning Empowers Prognostic Biomarker Prediction in Autism. J Pers Med 2023; 13:1713. [PMID: 38138941 PMCID: PMC10744627 DOI: 10.3390/jpm13121713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] Open
Abstract
Autism Spectrum Disorder (ASD) poses significant challenges to society and science due to its impact on communication, social interaction, and repetitive behavior patterns in affected children. The Autism and Developmental Disabilities Monitoring (ADDM) Network continuously monitors ASD prevalence and characteristics. In 2020, ASD prevalence was estimated at 1 in 36 children, higher than previous estimates. This study focuses on ongoing ASD research conducted at Erciyes University. Serum samples from 45 ASD patients and 21 unrelated control participants were analyzed to assess the expression of 372 microRNAs (miRNAs). Six miRNAs (miR-19a-3p, miR-361-5p, miR-3613-3p, miR-150-5p, miR-126-3p, and miR-499a-5p) exhibited significant downregulation in all ASD patients compared to healthy controls. The current study endeavors to identify dependable diagnostic biomarkers for ASD, addressing the pressing need for non-invasive, accurate, and cost-effective diagnostic tools, as current methods are subjective and time-intensive. A pivotal discovery in this study is the potential diagnostic value of miR-126-3p, which offers the promise of earlier and more accurate ASD diagnoses, potentially leading to improved intervention outcomes. Machine learning, such as the K-nearest neighbors (KNN) model, presents a promising avenue for precise ASD diagnosis using miRNA biomarkers.
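To make the KNN idea concrete, the sketch below classifies a sample by majority vote among its nearest training samples in expression space. It is an illustration only and not the study's pipeline: the two-feature expression values, the labels, and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair every training point's Euclidean distance to the query with its label.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical normalized expression of two miRNAs per serum sample;
# the ASD samples are given lower values to mimic downregulation.
train_X = [(0.20, 0.30), (0.30, 0.20), (0.25, 0.35),
           (0.90, 1.00), (1.10, 0.95), (1.00, 1.05)]
train_y = ["ASD", "ASD", "ASD", "control", "control", "control"]

print(knn_predict(train_X, train_y, (0.28, 0.30)))
print(knn_predict(train_X, train_y, (1.00, 1.00)))
```

With an odd `k` and two classes the vote cannot tie; in practice `k` and the distance metric would be chosen by cross-validation.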
Affiliation(s)
- Ecmel Mehmetbeyoglu
  - Department of Cancer and Genetics, Cardiff University, Cardiff CF14 4XN, UK
  - Betul-Ziya Eren Genome and Stem Cell Center, Erciyes University, Kayseri 38280, Turkey
- Abdulkerim Duman
  - School of Engineering, Cardiff University, Cardiff CF24 3AA, UK
- Serpil Taheri
  - Betul-Ziya Eren Genome and Stem Cell Center, Erciyes University, Kayseri 38280, Turkey
  - Department of Medical Biology, Erciyes University, Kayseri 38280, Turkey
- Yusuf Ozkul
  - Betul-Ziya Eren Genome and Stem Cell Center, Erciyes University, Kayseri 38280, Turkey
  - Department of Medical Genetics, Erciyes University, Kayseri 38280, Turkey
- Minoo Rassoulzadegan
  - Betul-Ziya Eren Genome and Stem Cell Center, Erciyes University, Kayseri 38280, Turkey
  - Inserm-CNRS, Université Côte d’Azur, 06107 Nice, France
3
Kolb KL, Mira ALS, Auer ED, Bucco ID, de Lima e Silva CE, dos Santos PI, Hoch VBB, Oliveira LC, Hauser AB, Hundt JE, Shuldiner AR, Lopes FL, Boysen TJ, Franke A, Pinto LFR, Soares-Lima SC, Kretzschmar GC, Boldt ABW. Glucocorticoid Receptor Gene (NR3C1) Polymorphisms and Metabolic Syndrome: Insights from the Mennonite Population. Genes (Basel) 2023; 14:1805. [PMID: 37761945 PMCID: PMC10530687 DOI: 10.3390/genes14091805] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/12/2023] [Revised: 09/06/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023] Open
Abstract
The regulation of the hypothalamic-pituitary-adrenal (HPA) axis is associated with polymorphisms and the degree of methylation of the glucocorticoid receptor gene (NR3C1) and is potentially involved in the development of metabolic syndrome (MetS). To evaluate the association of MetS with the polymorphisms, methylation, and gene expression of NR3C1 in the genetically isolated Brazilian Mennonite population, we genotyped 20 NR3C1 polymorphisms in 74 affected (MetS) and 138 unaffected individuals without affected first-degree relatives (Co), using exome sequencing, as well as five variants from non-exonic regions, in 70 MetS and 166 Co, using mass spectrometry. The methylation levels of 11 1F CpG sites were quantified using pyrosequencing (66 MetS and 141 Co), and NR3C1 expression was evaluated via RT-qPCR (14 MetS and 25 Co). Age, physical activity, and family environment during childhood were associated with MetS. Susceptibility to MetS, independent of these factors, was associated with homozygosity for rs10482605*C (OR = 4.74, pcorr = 0.024) and with the TTCGTTGATT haplotype (rs3806855*T_rs3806854*T_rs10482605*C_rs10482614*G_rs6188*T_rs258813*T_rs33944801*G_rs34176759*A_rs17209258*T_rs6196*T; OR = 4.74, pcorr = 0.048), as well as with the CCT haplotype (rs41423247*C_rs6877893*C_rs258763*T; OR = 6.02, pcorr = 0.030), but not with differences in methylation or gene expression. Thus, NR3C1 polymorphisms seem to modulate the susceptibility to MetS in Mennonites, independently of lifestyle and early childhood events, and their role seems to be unrelated to DNA methylation and gene expression.
Affiliation(s)
- Kathleen Liedtke Kolb
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Postgraduate Program in Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Ana Luiza Sprotte Mira
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Postgraduate Program in Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Eduardo Delabio Auer
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Postgraduate Program in Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Isabela Dall’Oglio Bucco
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Carla Eduarda de Lima e Silva
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Priscila Ianzen dos Santos
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Postgraduate Program in Internal Medicine, Medical Clinic Department, UFPR, Rua General Carneiro, 181, 11th Floor, Alto da Glória, Curitiba 80210-170, PR, Brazil
- Valéria Bumiller-Bini Hoch
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Luana Caroline Oliveira
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
- Aline Borsato Hauser
  - Laboratory School of Clinical Analysis, Department of Pharmacy, Federal University of Paraná (UFPR), Av. Pref. Lothário Meissner, 632, Jardim Botânico, Curitiba 80210-170, PR, Brazil
- Jennifer Elisabeth Hundt
  - Lübeck Institute of Experimental Dermatology, University of Lübeck, Ratzeburger Allee, 160, Haus 32, 23562 Lübeck, Germany
- Alan R. Shuldiner
  - Regeneron Genetics Center, 777 Old Saw Mill River Road, Tarrytown, NY 10591, USA
- Fabiana Leão Lopes
  - Human Genetics Branch, National Institute of Mental Health, 35 Convent Drive, Bethesda, MD 20892, USA
  - Institute of Psychiatry, Federal University Rio de Janeiro, Av. Venceslau Brás, 71, Rio de Janeiro 22290-140, RJ, Brazil
- Teide-Jens Boysen
  - Institute of Clinical Molecular Biology (IKMB), Christian-Albrechts-University of Kiel, 24105 Kiel, Germany
- Andre Franke
  - Institute of Clinical Molecular Biology (IKMB), Christian-Albrechts-University of Kiel, 24105 Kiel, Germany
- Luis Felipe Ribeiro Pinto
  - Brazilian National Cancer Institute, Rua André Cavalcanti, 37, Rio de Janeiro 20231-050, RJ, Brazil
- Sheila Coelho Soares-Lima
  - Brazilian National Cancer Institute, Rua André Cavalcanti, 37, Rio de Janeiro 20231-050, RJ, Brazil
- Gabriela Canalli Kretzschmar
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Postgraduate Program in Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Faculdades Pequeno Príncipe, Av. Iguaçu, 333, Curitiba 80230-020, PR, Brazil
  - Instituto de Pesquisa Pelé Pequeno Príncipe, Av. Silva Jardim, 1632, Curitiba 80250-060, PR, Brazil
- Angelica Beate Winter Boldt
  - Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
  - Postgraduate Program in Genetics, Department of Genetics, Federal University of Paraná (UFPR), Centro Politécnico, Jardim das Américas, Curitiba 81531-990, PR, Brazil
4
Barbosa CFC, Asunto JC, Koh RBL, Santos DMC, Zhang D, Cao EP, Galvez LC. Genome-Wide SNP and Indel Discovery in Abaca (Musa textilis Née) and among Other Musa spp. for Abaca Genetic Resources Management. Curr Issues Mol Biol 2023; 45:5776-5797. [PMID: 37504281 PMCID: PMC10377871 DOI: 10.3390/cimb45070365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023] Open
Abstract
Abaca (Musa textilis Née) is an economically important fiber crop in the Philippines. Its economic potential, however, is hampered by biotic and abiotic stresses, which are exacerbated by insufficient genomic resources for the varietal identification that is vital for crop improvement. To address these gaps, this study aimed to discover genome-wide polymorphisms among abaca cultivars and other Musa species and to analyze their potential as genetic marker resources. This was achieved through whole-genome Illumina resequencing of abaca cultivars and variant calling using BCFtools, followed by genetic diversity and phylogenetic analyses. A total of 20,590,381 high-quality single-nucleotide polymorphisms (SNPs) and DNA insertions/deletions (InDels) were mined across 16 abaca cultivars. Filtering based on linkage disequilibrium (LD) yielded 130,768 SNPs and 13,620 InDels, accounting for gene diversity values of 0.396 ± 0.106 and 0.431 ± 0.111, respectively, across these cultivars. LD-pruned polymorphisms across abaca, M. troglodytarum, M. acuminata and M. balbisiana enabled genetic differentiation within abaca and across the four Musa spp. Phylogenetic analysis revealed that the registered varieties Abuab and Inosa have accumulated a significant number of mutations, warranting further studies linking mutations to their advantageous phenotypes. Overall, this study pioneers the production of marker resources in abaca based on genome-wide polymorphisms, which are vital for varietal authentication and comparative genotyping with the more studied Musa spp.
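Gene diversity at a locus is commonly computed as expected heterozygosity, one minus the sum of squared allele frequencies. The abstract does not say which estimator was used, so the sketch below assumes this basic form (without a small-sample correction); the allele counts are hypothetical and only sized to match 16 diploid cultivars.

```python
def gene_diversity(allele_counts):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("no alleles observed at this locus")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


# Hypothetical biallelic SNP scored in 16 diploid cultivars (32 allele copies):
# 24 copies of the reference allele and 8 copies of the alternate allele.
print(gene_diversity([24, 8]))
```

For a biallelic marker this quantity peaks at 0.5 when both alleles are equally frequent, so per-marker averages near 0.4 indicate markers with fairly balanced allele frequencies.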
Affiliation(s)
- Cris Francis C Barbosa
  - Philippine Fiber Industry Development Authority (PhilFIDA), PCAF Building, Department of Agriculture (DA) Compound, Quezon City 1101, Philippines
  - Institute of Biology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Jayson C Asunto
  - Philippine Fiber Industry Development Authority (PhilFIDA), PCAF Building, Department of Agriculture (DA) Compound, Quezon City 1101, Philippines
- Rhosener Bhea L Koh
  - National Institute of Molecular Biology and Biotechnology, University of the Philippines Diliman, Quezon City 1101, Philippines
- Daisy May C Santos
  - Institute of Biology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Dapeng Zhang
  - Sustainable Perennial Crops Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705, USA
- Ernelea P Cao
  - Institute of Biology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Leny C Galvez
  - Philippine Fiber Industry Development Authority (PhilFIDA), PCAF Building, Department of Agriculture (DA) Compound, Quezon City 1101, Philippines
5
Atkinson EG, Artomov M, Loboda AA, Rehm HL, MacArthur DG, Karczewski KJ, Neale BM, Daly MJ. Discordant calls across genotype discovery approaches elucidate variants with systematic errors. Genome Res 2023; 33:999-1005. [PMID: 37253541 PMCID: PMC10519400 DOI: 10.1101/gr.277908.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 05/19/2023] [Indexed: 06/01/2023]
Abstract
Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.
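The notion of discordant calls can be illustrated by comparing, for one variant, the genotypes assigned to the same samples by two approaches and tallying which mismatch types occur. This is a simplified sketch and not the metric or classifier from the paper; the genotype coding (0, 1, 2 alternate alleles, `None` for missing) and the example calls are assumptions.

```python
from collections import Counter


def discordance(exome_calls, genome_calls):
    """Return (discordance rate, Counter of (exome, genome) mismatch pairs).

    Only samples with a call from both approaches are compared.
    """
    shared = [(e, g) for e, g in zip(exome_calls, genome_calls)
              if e is not None and g is not None]
    if not shared:
        raise ValueError("no samples were called by both approaches")
    mismatches = Counter(pair for pair in shared if pair[0] != pair[1])
    return sum(mismatches.values()) / len(shared), mismatches


# Hypothetical calls for one variant in eight samples sequenced both ways.
exome = [0, 0, 1, 0, 2, None, 0, 0]
genome = [0, 1, 1, 1, 2, 1, 0, 1]

rate, modes = discordance(exome, genome)
print(rate, modes.most_common(1))
```

In this toy example every mismatch is homozygous reference in the exome and heterozygous in the genome, the error mode the abstract reports as the most common.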
Affiliation(s)
- Elizabeth G Atkinson
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Mykyta Artomov
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio 43215, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio 43210, USA
- Alexander A Loboda
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - ITMO University, Saint-Petersburg, 197101, Russia
  - Almazov National Medical Research Center, St. Petersburg, 197341, Russia
- Heidi L Rehm
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Daniel G MacArthur
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Darlinghurst, New South Wales 2010, Australia
- Konrad J Karczewski
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Benjamin M Neale
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Mark J Daly
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Institute for Molecular Medicine Finland, University of Helsinki, FI-00290 Helsinki, Finland
6
Vaisband M, Schubert M, Gassner FJ, Geisberger R, Greil R, Zaborsky N, Hasenauer J. Validation of genetic variants from NGS data using deep convolutional neural networks. BMC Bioinformatics 2023; 24:158. [PMID: 37081386 PMCID: PMC10116675 DOI: 10.1186/s12859-023-05255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 03/27/2023] [Indexed: 04/22/2023] Open
Abstract
Accurate somatic variant calling from next-generation sequencing data is one of the most important tasks in personalised cancer therapy. The sophistication of the available technologies is ever-increasing, yet manual candidate refinement is still a necessary step in state-of-the-art processing pipelines. This limits reproducibility and introduces a bottleneck with respect to scalability. We demonstrate that the validation of genetic variants can be improved using a machine learning approach resting on a Convolutional Neural Network, trained using existing human annotation. In contrast to existing approaches, we introduce a way in which contextual data from sequencing tracks can be included in the automated assessment. A rigorous evaluation shows that the resulting model is robust and performs on par with trained researchers following a published standard operating procedure.
Affiliation(s)
- Marc Vaisband
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Maria Schubert
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Franz Josef Gassner
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Roland Geisberger
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Richard Greil
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Nadja Zaborsky
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Jan Hasenauer
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
7
FVC as an adaptive and accurate method for filtering variants from popular NGS analysis pipelines. Commun Biol 2022; 5:975. [PMID: 36114280 PMCID: PMC9481582 DOI: 10.1038/s42003-022-03397-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 04/22/2022] [Indexed: 11/08/2022] Open
Abstract
The quality control of variants from whole-genome sequencing data is vital in clinical diagnosis and human genetics research. However, current filtering methods (Frequency, Hard-Filter, VQSR, GARFIELD, and VEF) were developed for use with particular variant callers and have certain limitations. In particular, the number of true variants they eliminate far exceeds the number of false variants they remove. Here, we present FVC, an adaptive method for quality control of genetic variants from different analysis pipelines, and validate it on the variants generated by four popular variant callers (GATK HaplotypeCaller, Mutect2, Varscan2, and DeepVariant). FVC consistently exhibited the best performance. It removed far more false variants than the current state-of-the-art filtering methods and recalled ~51-99% of the true variants filtered out by the other methods. Once trained, FVC can be conveniently integrated into a user-specific variant calling pipeline. In summary, FVC is a method for filtering variants called from whole-genome data, for potential use in clinical diagnosis and human genetics research.
8
Fernández-Orth D, Rueda M, Singh B, Moldes M, Jene A, Ferri M, Vasallo C, Fromont LA, Navarro A, Rambla J. A quality control portal for sequencing data deposited at the European genome-phenome archive. Brief Bioinform 2022; 23:6570012. [PMID: 35438138 PMCID: PMC9116225 DOI: 10.1093/bib/bbac136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 03/01/2022] [Accepted: 03/23/2022] [Indexed: 11/15/2022] Open
Abstract
Since its launch in 2008, the European Genome-Phenome Archive (EGA) has been leading the archiving and distribution of human identifiable genomic data. In this regard, one of the community's concerns is the potential usability of the stored data: as of now, data submitters are not mandated to perform any quality control (QC) before uploading their data and associated metadata. Here, we present a new File QC Portal developed at EGA, along with QC reports created for 1 694 442 files [Fastq, sequence alignment map (SAM)/binary alignment map (BAM)/CRAM and variant call format (VCF)] submitted to EGA. QC reports allow anonymous EGA users to view summary-level information regarding the files within a specific dataset, such as quality of reads, alignment quality, number and type of variants and other features. Researchers benefit from being able to assess the quality of data before deciding whether to request access, which increases the reusability of the data (https://ega-archive.org/blog/data-upcycling-powered-by-ega/).
Affiliation(s)
- Dietmar Fernández-Orth
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Manuel Rueda
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Babita Singh
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Mauricio Moldes
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Aina Jene
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Marta Ferri
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Claudia Vasallo
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Lauren A Fromont
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Arcadi Navarro
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Jordi Rambla
  - European Genome-phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
9
Artificial Intelligence and Cardiovascular Genetics. Life (Basel) 2022; 12:279. [PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/30/2021] [Revised: 01/26/2022] [Accepted: 02/09/2022] [Indexed: 12/13/2022] Open
Abstract
Polygenic diseases, which are genetic disorders caused by the combined action of multiple genes, pose unique and significant challenges for the diagnosis and management of affected patients. A major goal of cardiovascular medicine has been to understand how genetic variation leads to the clinical heterogeneity seen in polygenic cardiovascular diseases (CVDs). Recent advances and emerging technologies in artificial intelligence (AI), coupled with the ever-increasing availability of next generation sequencing (NGS) technologies, now provide researchers with unprecedented possibilities for dynamic and complex biological genomic analyses. Combining these technologies may lead to a deeper understanding of heterogeneous polygenic CVDs, better prognostic guidance, and, ultimately, more personalized medicine. Advances will likely be achieved through increasingly frequent and robust genomic characterization of patients, as well as the integration of genomic data with other clinical data, such as cardiac imaging, coronary angiography, and clinical biomarkers. This review discusses the current opportunities and limitations of genomics; provides a brief overview of AI; and identifies the current applications, limitations, and future directions of AI in genomics.
|
10
|
Musolf AM, Holzinger ER, Malley JD, Bailey-Wilson JE. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics. Hum Genet 2021; 141:1515-1528. [PMID: 34862561 PMCID: PMC9360120 DOI: 10.1007/s00439-021-02402-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/13/2021] [Accepted: 11/08/2021] [Indexed: 01/26/2023]
Abstract
Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms are described: k-nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
Affiliation(s)
- Anthony M Musolf
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
- Emily R Holzinger
- Target Sciences, Informatics and Predictive Sciences, Bristol Myers Squibb, Cambridge, MA, USA
- James D Malley
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
- Joan E Bailey-Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
|
11
|
Machine learning random forest for predicting oncosomatic variant NGS analysis. Sci Rep 2021; 11:21820. [PMID: 34750410 PMCID: PMC8575902 DOI: 10.1038/s41598-021-01253-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 04/15/2021] [Accepted: 10/21/2021] [Indexed: 12/02/2022] Open
Abstract
Since 2017, we have used the IonTorrent NGS platform in our hospital to diagnose and treat cancer. Analyzing variants at each run requires considerable time, and we are still struggling with some variants that appear correct on the metrics at first, but are found to be negative upon further investigation. Can any machine learning (ML) algorithm help us classify NGS variants? This has led us to investigate which ML algorithm fits our NGS data and to develop a tool that can be routinely implemented to help biologists. Currently, one of the greatest challenges in medicine is processing a significant quantity of data. This is particularly true in molecular biology with the advent of next-generation sequencing (NGS) for profiling and identifying molecular tumors and their treatment. In addition to bioinformatics pipelines, artificial intelligence (AI) can be valuable in helping to analyze mutation variants. Generating sequencing data from patient DNA samples has become easy to perform in clinical trials. However, analyzing the massive quantities of genomic or transcriptomic data and extracting the key biomarkers associated with a clinical response to a specific therapy requires a formidable combination of scientific expertise, biomolecular skills and a panel of bioinformatic and biostatistic tools, in which artificial intelligence is now successful in developing future routine diagnostics. However, cancer genome complexity and technical artifacts make identifying real variants challenging. We present a machine learning method for classifying pathogenic single nucleotide variants (SNVs), single nucleotide polymorphisms (SNPs), multiple nucleotide variants (MNVs), insertions, and deletions detected by NGS from different types of tumor specimens, such as colorectal, melanoma, lung and glioma cancer.
We evaluated different machine learning algorithms on our NGS data using the k-fold cross-validation method, and compared them with neural networks (deep learning), to measure the performance of the different ML algorithms and determine which one is a valid model for confirming NGS variant calls in cancer diagnosis. We trained our machine learning models with 70% of our data samples, extracted from our local database (our data structure had 7 parameters: chromosome, position, exon, variant allele frequency, minor allele frequency, coverage and protein description), and validated them with the remaining 30% of the data. The model offering the best accuracy was chosen and implemented in the NGS analysis routine. The models were developed with the R script language version 3.6.0. We trained our model on 70% of 102,011 variants. Our best error rate (0.22%) was found with random forest machine learning (ntree = 500 and mtry = 4), with an AUC of 0.99. Neural networks also achieved good scores: the final trained neural network model achieved an accuracy of 98% and an ROC-AUC of 0.99 with validation data. We tested our RF model to interpret more than 2000 variants from our NGS database: 20 variants were misclassified (error rate < 1%). The errors were nomenclature problems and false positives. After adding false positives to our training database and implementing our RF model routinely, our error rate was always < 0.5%. The RF model shows excellent results for oncosomatic NGS interpretation and can easily be implemented in other molecular biology laboratories. AI is becoming increasingly important in molecular biomedical analysis and can be very helpful in processing medical data. Neural networks show a good capacity in variant classification, and in the future, they may be useful in predicting more complex variants.
|
12
|
Survey of artificial intelligence approaches in the study of anthropogenic impacts on symbiotic organisms – a holistic view. Symbiosis 2021. [DOI: 10.1007/s13199-021-00778-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
13
|
Suehiro Y, Yoshina S, Motohashi T, Iwata S, Dejima K, Mitani S. Efficient collection of a large number of mutations by mutagenesis of DNA damage response defective animals. Sci Rep 2021; 11:7630. [PMID: 33828169 PMCID: PMC8027614 DOI: 10.1038/s41598-021-87226-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 03/24/2021] [Indexed: 02/01/2023] Open
Abstract
With the development of massively parallel sequencing technology, it has become easier to establish new model organisms that are ideally suited to the specific biological phenomena of interest. Considering the history of research using classical model organisms, we believe that the efficient construction and sharing of gene mutation libraries will facilitate the progress of studies using these new model organisms. Using C. elegans, we applied the TMP/UV mutagenesis method to animals lacking function in the DNA damage response genes atm-1 and xpc-1. This method produces genetic mutations three times more efficiently than mutagenesis of wild-type animals. Furthermore, we confirmed that the use of next-generation sequencing and the elimination of false positives through machine learning could automate the process of mutation identification with an accuracy of over 95%. Eventually, we sequenced the whole genomes of 488 strains and isolated 981 novel mutations generated by the present method; these strains have been made available to anyone who wants to use them. Since the targeted DNA damage response genes are well conserved and the mutagens used in this study are also effective in a variety of species, we believe that our method is generally applicable to a wide range of animal species.
Affiliation(s)
- Yuji Suehiro
- Department of Physiology, Tokyo Women's Medical University, Shinjuku, Tokyo, Japan
- Sawako Yoshina
- Department of Physiology, Tokyo Women's Medical University, Shinjuku, Tokyo, Japan
- Tomoko Motohashi
- Department of Physiology, Tokyo Women's Medical University, Shinjuku, Tokyo, Japan
- Satoru Iwata
- Chubu University Center for Education in Laboratory Animal Research, Kasugai, Aichi, Japan
- Katsufumi Dejima
- Department of Physiology, Tokyo Women's Medical University, Shinjuku, Tokyo, Japan
- Shohei Mitani
- Department of Physiology, Tokyo Women's Medical University, Shinjuku, Tokyo, Japan
- Tokyo Women's Medical University Institute for Integrated Medical Sciences, Shinjuku, Tokyo, Japan
|
14
|
Albrecht S, Sprang M, Andrade-Navarro MA, Fontaine JF. seqQscorer: automated quality control of next-generation sequencing data using machine learning. Genome Biol 2021; 22:75. [PMID: 33673854 PMCID: PMC7934511 DOI: 10.1186/s13059-021-02294-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/18/2020] [Accepted: 02/10/2021] [Indexed: 01/03/2023] Open
Abstract
Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer.
Affiliation(s)
- Steffen Albrecht
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Maximilian Sprang
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Jean-Fred Fontaine
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
|
15
|
Veeramachaneni V. Data Analysis in Rare Disease Diagnostics. J Indian Inst Sci 2020. [DOI: 10.1007/s41745-020-00189-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|