1
|
Ergun MA, Cinal O, Bakışlı B, Emül AA, Baysan M. COSAP: Comparative Sequencing Analysis Platform. BMC Bioinformatics 2024; 25:130. [PMID: 38532317 DOI: 10.1186/s12859-024-05756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 03/20/2024] [Indexed: 03/28/2024] Open
Abstract
BACKGROUND Recent improvements in sequencing technologies enabled detailed profiling of genomic features. These technologies mostly rely on short reads which are merged and compared to reference genome for variant identification. These operations should be done with computers due to the size and complexity of the data. The need for analysis software resulted in many programs for mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code which makes access and verification very difficult or open-access programs that are mostly based on command-line operations without user interfaces and extensive documentation. Moreover, a high level of disagreement is observed among popular mapping and variant calling algorithms in multiple studies, which makes relying on a single algorithm unreliable. User-friendly open-source software tools that offer comparative analysis are an important need considering the growth of sequencing technologies. RESULTS Here, we propose Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis and their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the frontend and backend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are packed as Docker containers as well. Pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding through modular structure. CONCLUSIONS COSAP simplifies and speeds up the process of DNA sequencing analyses providing commonly used algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis as well as their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make comparisons much easier to assess the impact of alternative pipelines which is crucial in establishing reproducibility of sequencing analyses.
Collapse
Affiliation(s)
- Mehmet Arif Ergun
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
| | - Omer Cinal
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
| | - Berkant Bakışlı
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
| | - Abdullah Asım Emül
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
| | - Mehmet Baysan
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey.
| |
Collapse
|
2
|
Wang Q, Jiang S, Li T, Qiu Z, Yan J, Fu R, Ma C, Wang X, Jiang S, Cheng Q. G2P Provides an Integrative Environment for Multi-model genomic selection analysis to improve genotype-to-phenotype prediction. FRONTIERS IN PLANT SCIENCE 2023; 14:1207139. [PMID: 37600179 PMCID: PMC10437076 DOI: 10.3389/fpls.2023.1207139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Accepted: 07/21/2023] [Indexed: 08/22/2023]
Abstract
Genotype-to-phenotype (G2P) prediction has become a mainstream paradigm to facilitate genomic selection (GS)-assisted breeding in the seed industry. Many methods have been introduced for building GS models, but their prediction precision may vary depending on species and specific traits. Therefore, evaluation of multiple models and selection of the appropriate one is crucial to effective GS analysis. Here, we present the G2P container developed for the Singularity platform, which not only contains a library of 16 state-of-the-art GS models and 13 evaluation metrics. G2P works as an integrative environment offering comprehensive, unbiased evaluation analyses of the 16 GS models, which may be run in parallel on high-performance computing clusters. Based on the evaluation outcome, G2P performs auto-ensemble algorithms that not only can automatically select the most precise models but also can integrate prediction results from multiple models. This functionality should further improve the precision of G2P prediction. Another noteworthy function is the refinement design of the training set, in which G2P optimizes the training set based on the genetic diversity analysis of a studied population. Although the training samples in the optimized set are fewer than in the original set, the prediction precision is almost equivalent to that obtained when using the whole set. This functionality is quite useful in practice, as it reduces the cost of phenotyping when constructing training population. The G2P container and source codes are freely accessible at https://g2p-env.github.io/.
Collapse
Affiliation(s)
- Qian Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Shan Jiang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Tong Li
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Zhixu Qiu
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
| | - Jun Yan
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Ran Fu
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Chuang Ma
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
| | - Xiangfeng Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Shuqin Jiang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Qian Cheng
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| |
Collapse
|
3
|
Walker K, Kalra D, Lowdon R, Chen G, Molik D, Soto DC, Dabbaghie F, Khleifat AA, Mahmoud M, Paulin LF, Raza MS, Pfeifer SP, Agustinho DP, Aliyev E, Avdeyev P, Barrozo ER, Behera S, Billingsley K, Chong LC, Choubey D, De Coster W, Fu Y, Gener AR, Hefferon T, Henke DM, Höps W, Illarionova A, Jochum MD, Jose M, Kesharwani RK, Kolora SRR, Kubica J, Lakra P, Lattimer D, Liew CS, Lo BW, Lo C, Lötter A, Majidian S, Mendem SK, Mondal R, Ohmiya H, Parvin N, Peralta C, Poon CL, Prabhakaran R, Saitou M, Sammi A, Sanio P, Sapoval N, Syed N, Treangen T, Wang G, Xu T, Yang J, Zhang S, Zhou W, Sedlazeck FJ, Busby B. The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms. F1000Res 2022; 11:530. [PMID: 36262335 PMCID: PMC9557141 DOI: 10.12688/f1000research.110194.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 05/04/2022] [Indexed: 01/25/2023] Open
Abstract
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Collapse
Affiliation(s)
- Kimberly Walker
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA,
| | - Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA,
| | | | - Guangyi Chen
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany,Center for Bioinformatics, Saarland University, Saarbrücken, Germany,
| | - David Molik
- Tropical Crop and Commodity Protection Research Unit, Pacific Basin Agricultural Research Center, Hilo, HI, 96720, USA
| | - Daniela C. Soto
- Biochemistry & Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, Davis, CA, 95616, USA
| | - Fawaz Dabbaghie
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany,Institute for Medical Biometry and Bioinformatics, University hospital Düsseldorf, Düsseldorf, Germany
| | - Ahmad Al Khleifat
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Muhammad Sohail Raza
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Beijing, China
| | - Susanne P. Pfeifer
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
| | - Daniel Paiva Agustinho
- Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, 63110, USA
| | - Elbay Aliyev
- Research Department, Sidra Medicine, Doha, Qatar
| | - Pavel Avdeyev
- Computational Biology Institute, The George Washington University, Washington, DC, 20052, USA
| | - Enrico R. Barrozo
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Kimberley Billingsley
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Li Chuin Chong
- Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
| | - Deepak Choubey
- Department of Technology, Savitribai Phule Pune University, Pune, Maharashtra, India
| | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, Antwerp, Belgium,Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Alejandro R. Gener
- Association of Public Health Labs, Centers for Disease Control and Prevention, Downey, CA, USA
| | - Timothy Hefferon
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
| | - David Morgan Henke
- Department Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Wolfram Höps
- EMBL Heidelberg, Genome Biology Unit, Heidelberg, Germany
| | | | - Michael D. Jochum
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Maria Jose
- Centre for Bioinformatics, Pondicherry University, Pondicherry, India
| | - Rupesh K. Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | | | | | - Priya Lakra
- Department of Zoology, University of Delhi, Delhi, India
| | - Damaris Lattimer
- University of Applied Sciences Upper Austria - FH Hagenberg, Mühlkreis, Austria
| | - Chia-Sin Liew
- Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588, USA
| | - Bai-Wei Lo
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Chunhsuan Lo
- Human Genetics Laboratory, National Institute of Genetics, Japan, Mishima City, Japan
| | - Anneri Lötter
- Department of Biochemistry, University of Pretoria, Pretoria, South Africa
| | - Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | | | - Rajarshi Mondal
- Department of Biotechnology, The University of Burdwan, West Bengal, India
| | - Hiroko Ohmiya
- Genetic Reagent Development Unit, Medical & Biological Laboratories Co., Ltd., Tokoyo, Japan
| | - Nasrin Parvin
- Department of Biotechnology, The University of Burdwan, West Bengal, India
| | | | | | | | - Marie Saitou
- Center of Integrative Genetics (CIGENE),Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Aditi Sammi
- School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
| | - Philippe Sanio
- University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
| | - Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Najeeb Syed
- Research Department, Sidra Medicine, Doha, Qatar
| | - Todd Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | - Tiancheng Xu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Jianzhi Yang
- Department of Quantitative and Computational Biology,, University of Southern California, Los Angeles, CA, USA
| | - Shangzhe Zhang
- School of Biology, University of St Andrews, St Andrews, UK
| | - Weiyu Zhou
- Department of Statistical Science, George Mason University, Fairfax, Virginia, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA,
| | | |
Collapse
|
4
|
VPipe: an Automated Bioinformatics Platform for Assembly and Management of Viral Next-Generation Sequencing Data. Microbiol Spectr 2022; 10:e0256421. [PMID: 35234489 PMCID: PMC8941893 DOI: 10.1128/spectrum.02564-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated from these technologies remains a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens. IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize >17,500 and 1,500 clinical specimens and isolates, respectively. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.
Collapse
|
5
|
Beltrán-Corbellini Á, Aledo-Serrano Á, Møller RS, Pérez-Palma E, García-Morales I, Toledano R, Gil-Nagel A. Epilepsy Genetics and Precision Medicine in Adults: A New Landscape for Developmental and Epileptic Encephalopathies. Front Neurol 2022; 13:777115. [PMID: 35250806 PMCID: PMC8891166 DOI: 10.3389/fneur.2022.777115] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 01/27/2022] [Indexed: 12/14/2022] Open
Abstract
This review aims to provide an updated perspective of epilepsy genetics and precision medicine in adult patients, with special focus on developmental and epileptic encephalopathies (DEEs), covering relevant and controversial issues, such as defining candidates for genetic testing, which genetic tests to request and how to interpret them. A literature review was conducted, including findings in the discussion and recommendations. DEEs are wide and phenotypically heterogeneous electroclinical syndromes. They generally have a pediatric presentation, but patients frequently reach adulthood still undiagnosed. Identifying the etiology is essential, because there lies the key for precision medicine. Phenotypes modify according to age, and although deep phenotyping has allowed to outline certain entities, genotype-phenotype correlations are still poor, commonly leading to long-lasting diagnostic odysseys and ineffective therapies. Recent adult series show that the target patients to be identified for genetic testing are those with epilepsy and different risk factors. The clinician should take active part in the assessment of the pathogenicity of the variants detected, especially concerning variants of uncertain significance. An accurate diagnosis implies precision medicine, meaning genetic counseling, prognosis, possible future therapies, and a reduction of iatrogeny. Up to date, there are a few tens of gene mutations with additional concrete treatments, including those with restrictive/substitutive therapies, those with therapies modifying signaling pathways, and channelopathies, that are worth to be assessed in adults. Further research is needed regarding phenotyping of adult syndromes, early diagnosis, and the development of targeted therapies.
Collapse
Affiliation(s)
| | - Ángel Aledo-Serrano
- Epilepsy Program, Neurology Department, Hospital Ruber Internacional, Madrid, Spain
- *Correspondence: Ángel Aledo-Serrano
| | - Rikke S. Møller
- Department of Epilepsy Genetics and Personalized Treatment, The Danish Epilepsy Centre, Dianalund, Denmark
| | - Eduardo Pérez-Palma
- Universidad del Desarrollo, Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, Santiago, Chile
| | - Irene García-Morales
- Epilepsy Program, Neurology Department, Hospital Ruber Internacional, Madrid, Spain
- Epilepsy Unit, Neurology Department, Clínico San Carlos University Hospital, Madrid, Spain
| | - Rafael Toledano
- Epilepsy Program, Neurology Department, Hospital Ruber Internacional, Madrid, Spain
- Epilepsy Unit, Neurology Department, Ramón y Cajal University Hospital, Madrid, Spain
| | - Antonio Gil-Nagel
- Epilepsy Program, Neurology Department, Hospital Ruber Internacional, Madrid, Spain
| |
Collapse
|
6
|
Bowles H, Kabiljo R, Al Khleifat A, Jones A, Quinn JP, Dobson RJB, Swanson CM, Al-Chalabi A, Iacoangeli A. An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data. FRONTIERS IN BIOINFORMATICS 2022; 2:1062328. [PMID: 36845320 PMCID: PMC9945273 DOI: 10.3389/fbinf.2022.1062328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2022] [Accepted: 12/12/2022] [Indexed: 02/10/2023] Open
Abstract
There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available.
Collapse
Affiliation(s)
- Harry Bowles
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - Renata Kabiljo
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - Ahmad Al Khleifat
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - Ashley Jones
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
| | - John P. Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Richard J. B. Dobson
- Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
- NIHR Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, London, United Kingdom
| | - Chad M. Swanson
- Department of Infectious Diseases, School of Immunology and Microbial Sciences, King’s College London, London, United Kingdom
| | - Ammar Al-Chalabi
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Department of Neurology, King’s College Hospital, London, United Kingdom
| | - Alfredo Iacoangeli
- Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom
- *Correspondence: Alfredo Iacoangeli,
| |
Collapse
|
7
|
Hu J, Lepore R, Dobson RJB, Al-Chalabi A, M. Bean D, Iacoangeli A. DGLinker: flexible knowledge-graph prediction of disease-gene associations. Nucleic Acids Res 2021; 49:W153-W161. [PMID: 34125897 PMCID: PMC8262728 DOI: 10.1093/nar/gkab449] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Revised: 04/30/2021] [Accepted: 05/17/2021] [Indexed: 11/14/2022] Open
Abstract
As a result of the advent of high-throughput technologies, there has been rapid progress in our understanding of the genetics underlying biological processes. However, despite such advances, the genetic landscape of human diseases has only marginally been disclosed. Exploiting the present availability of large amounts of biological and phenotypic data, we can use our current understanding of disease genetics to train machine learning models to predict novel genetic factors associated with the disease. To this end, we developed DGLinker, a webserver for the prediction of novel candidate genes for human diseases given a set of known disease genes. DGLinker has a user-friendly interface that allows non-expert users to exploit biomedical information from a wide range of biological and phenotypic databases, and/or to upload their own data, to generate a knowledge-graph and use machine learning to predict new disease-associated genes. The webserver includes tools to explore and interpret the results and generates publication-ready figures. DGLinker is available at https://dglinker.rosalind.kcl.ac.uk. The webserver is free and open to all users without the need for registration.
Collapse
Affiliation(s)
- Jiajing Hu
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, SE5 8AF, London, UK
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 9RT, UK
| | - Rosalba Lepore
- BSC-CNS Barcelona Supercomputing Center, Barcelona, 08034, Spain
| | - Richard J B Dobson
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, SE5 8AF, London, UK
- Health Data Research UK London, University College London, London, WC1E 6BT, UK
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
| | - Ammar Al-Chalabi
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 9RT, UK
- King′s College Hospital, Bessemer Road, Denmark Hill, London, SE5 9RS, UK
| | - Daniel M. Bean
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, SE5 8AF, London, UK
- Health Data Research UK London, University College London, London, WC1E 6BT, UK
| | - Alfredo Iacoangeli
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, SE5 8AF, London, UK
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 9RT, UK
- National Institute for Health Research Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, SE5 8AF, UK
| |
Collapse
|
8
|
Bean DM, Al-Chalabi A, Dobson RJB, Iacoangeli A. A Knowledge-Based Machine Learning Approach to Gene Prioritisation in Amyotrophic Lateral Sclerosis. Genes (Basel) 2020; 11:E668. [PMID: 32575372 PMCID: PMC7349022 DOI: 10.3390/genes11060668] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Revised: 06/13/2020] [Accepted: 06/16/2020] [Indexed: 02/07/2023] Open
Abstract
Amyotrophic lateral sclerosis is a neurodegenerative disease of the upper and lower motor neurons resulting in death from neuromuscular respiratory failure, typically within two to five years of first symptoms. Several rare disruptive gene variants have been associated with ALS and are responsible for about 15% of all cases. Although our knowledge of the genetic landscape of this disease is improving, it remains limited. Machine learning models trained on the available protein-protein interaction and phenotype-genotype association data can use our current knowledge of the disease genetics for the prediction of novel candidate genes. Here, we describe a knowledge-based machine learning method for this purpose. We trained our model on protein-protein interaction data from IntAct, gene function annotation from Gene Ontology, and known disease-gene associations from DisGeNet. Using several sets of known ALS genes from public databases and a manual review as input, we generated a list of new candidate genes for each input set. We investigated the relevance of the predicted genes in ALS by using the available summary statistics from the largest ALS genome-wide association study and by performing functional and phenotype enrichment analysis. The predicted sets were enriched for genes associated with other neurodegenerative diseases known to overlap with ALS genetically and phenotypically, as well as for biological processes associated with the disease. Moreover, using ALS genes from ClinVar and our manual review as input, the predicted sets were enriched for ALS-associated genes (ClinVar p = 0.038 and manual review p = 0.060) when used for gene prioritisation in a genome-wide association study.
Collapse
Affiliation(s)
- Daniel M. Bean
- Department of Biostatistics & Health Informatics, King′s College London, 16 De Crespigny Park, London SE5 8AF, UK;
- Health Data Research UK London, University College London, 16 De Crespigny Park, London SE5 8AF, UK
| | - Ammar Al-Chalabi
- King′s College Hospital, Bessemer Road, Denmark Hill, Brixton, London SE5 9RS, UK;
- Maurice Wohl Clinical Neuroscience Institute, Department of Basic and Clinical Neuroscience, King′s College London, London, 5 Cutcombe Rd, Brixton, London SE5 9RT, UK
| | - Richard J. B. Dobson
- Department of Biostatistics & Health Informatics, King′s College London, 16 De Crespigny Park, London SE5 8AF, UK;
- Health Data Research UK London, University College London, 16 De Crespigny Park, London SE5 8AF, UK
- Institute of Health Informatics, University College London, 222 Euston Rd, London NW1 2DA, UK
| | - Alfredo Iacoangeli
- Department of Biostatistics & Health Informatics, King′s College London, 16 De Crespigny Park, London SE5 8AF, UK;
- Maurice Wohl Clinical Neuroscience Institute, Department of Basic and Clinical Neuroscience, King′s College London, London, 5 Cutcombe Rd, Brixton, London SE5 9RT, UK
| |
Collapse
|