2
|
Hoffmann M, Poschenrieder JM, Incudini M, Baier S, Fitz A, Maier A, Hartung M, Hoffmann C, Trummer N, Adamowicz K, Picciani M, Scheibling E, Harl MV, Lesch I, Frey H, Kayser S, Wissenberg P, Schwartz L, Hafner L, Acharya A, Hackl L, Grabert G, Lee SG, Cho G, Cloward M, Jankowski J, Lee HK, Tsoy O, Wenke N, Pedersen AG, Bønnelykke K, Mandarino A, Melograna F, Schulz L, Climente-González H, Wilhelm M, Iapichino L, Wienbrandt L, Ellinghaus D, Van Steen K, Grossi M, Furth PA, Hennighausen L, Di Pierro A, Baumbach J, Kacprowski T, List M, Blumenthal DB. Network medicine-based epistasis detection in complex diseases: ready for quantum computing. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.07.23298205. [PMID: 38076997 PMCID: PMC10705612 DOI: 10.1101/2023.11.07.23298205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs)1-3. Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant than those found by existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease. Additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
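The abstract gives no algorithmic detail beyond "network-based" and "local search", so the following is only a generic sketch of how a network can restrict a local search over SNP sets, not NeEDL's actual procedure. The function names, the toy network, and the scoring function `toy_score` are all hypothetical; a real score would come from a statistical epistasis model.

```python
def toy_score(snp_set, target=frozenset({0, 1, 2})):
    """Hypothetical score: reward SNPs in a hidden target set, penalize others."""
    return len(snp_set & target) - len(snp_set - target)


def network_local_search(network, score, start, max_size=5):
    """Greedy hill climbing over connected SNP sets (SNP ids assumed to be ints).

    Only SNPs adjacent in `network` to the current set may be added, which is
    what keeps the search space far smaller than all SNP combinations.
    """
    current = frozenset([start])
    best = score(current)
    while len(current) < max_size:
        # Candidate moves: add one network neighbor of the current set.
        frontier = sorted({n for s in current for n in network[s]} - current)
        scored = [(score(current | {n}), n) for n in frontier]
        if not scored:
            break
        # Pick the best-scoring extension; ties go to the smaller SNP id.
        top_score, top_snp = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best:
            break  # local optimum reached
        current, best = current | {top_snp}, top_score
    return current, best


# Toy path network 0 - 1 - 2 - 3 - 4 given as an adjacency dict.
network = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
found, value = network_local_search(network, toy_score, start=0)
print(sorted(found), value)  # [0, 1, 2] 3
```

The sketch stops at the first local optimum; a practical search would typically restart from many seeds and also consider removing or swapping SNPs.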
Affiliation(s)
- Markus Hoffmann
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Julian M. Poschenrieder
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Massimiliano Incudini
- Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
| | - Sylvie Baier
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Amelie Fitz
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
- Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
| | - Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Christian Hoffmann
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Nico Trummer
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Mario Picciani
- Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
| | - Evelyn Scheibling
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Maximilian V. Harl
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Ingmar Lesch
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Hunor Frey
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Simon Kayser
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Paul Wissenberg
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Leon Schwartz
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Leon Hafner
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
| | - Aakriti Acharya
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
| | - Lena Hackl
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Gordon Grabert
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
| | - Sung-Gwon Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, Korea
| | - Gyuhyeok Cho
- Department of Chemistry, Gwangju Institute of Science and Technology, Gwangju, Korea
| | - Matthew Cloward
- Department of Biology, Brigham Young University, Provo, UT, USA
| | - Jakub Jankowski
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Hye Kyung Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Olga Tsoy
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Nina Wenke
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Anders Gorm Pedersen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
| | - Klaus Bønnelykke
- Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
| | - Antonio Mandarino
- International Centre for Theory of Quantum Technologies, University of Gdańsk, 80-309 Gdańsk, Poland
| | - Federico Melograna
- BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Laura Schulz
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
| | | | - Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
| | - Luigi Iapichino
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
| | - Lars Wienbrandt
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
| | - David Ellinghaus
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
| | - Kristel Van Steen
- BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Michele Grossi
- European Organization for Nuclear Research (CERN), Geneva 1211, Switzerland
| | - Priscilla A. Furth
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
| | - Lothar Hennighausen
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Alessandra Di Pierro
- Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - David B. Blumenthal
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
| |
|
3
|
Zhao X, Wang X, Jin Z, Wang R. A normalized differential sequence feature encoding method based on amino acid sequences. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:14734-14755. [PMID: 37679156 DOI: 10.3934/mbe.2023659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Protein interactions are the foundation of all metabolic activities of cells, such as apoptosis, the immune response, and metabolic pathways. In order to optimize the performance of protein interaction prediction, a coding method based on normalized difference sequence characteristics (NDSF) of amino acid sequences is proposed. NDSF jointly encodes the positional relationships between amino acids in the sequences and the correlation characteristics between sequence pairs. Sequence features are then extracted from the coded 174-dimensional human protein sequence vector using two dimensionality reduction methods, principal component analysis (PCA) and locally linear embedding (LLE). This study compares the classification performance of four ensemble learning methods (AdaBoost, Extra trees, LightGBM, XGBoost) applied to PCA and LLE features. Cross-validation and grid search methods are used to find the best combination of parameters. The results show that the accuracy of NDSF is generally higher than that of the sequence matrix-based coding method (MOS), and that the loss and coding time can be greatly reduced. The bar chart of feature extraction shows that the classification accuracy is significantly higher when using the linear dimensionality reduction method, PCA, compared to the nonlinear dimensionality reduction method, LLE. After classification with XGBoost, the model accuracy reaches 99.2%, which is the best performance among all models. This study suggests that NDSF combined with PCA and XGBoost may be an effective strategy for classifying different human protein interactions.
Affiliation(s)
- Xiaoman Zhao
- Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
| | - Xue Wang
- Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
| | - Zhou Jin
- Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
| | - Rujing Wang
- Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
| |
|
5
|
Wang X, Cao X, Feng Y, Guo M, Yu G, Wang J. ELSSI: parallel SNP-SNP interactions detection by ensemble multi-type detectors. Brief Bioinform 2022; 23:6607749. [PMID: 35696639 DOI: 10.1093/bib/bbac213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2022] [Revised: 04/18/2022] [Accepted: 05/07/2022] [Indexed: 12/11/2022] Open
Abstract
With the development of high-throughput genotyping technology, detecting single nucleotide polymorphism (SNP)-SNP interactions (SSIs) has become essential for understanding disease susceptibility. Various methods have been proposed to detect SSIs. However, given the disease complexity and bias of individual SSI detectors, these single-detector-based methods generally do not scale to real genome-wide data and yield unfavorable results. We propose a novel ensemble learning-based approach (ELSSI) that can significantly reduce the bias of individual detectors and their computational load. ELSSI randomly divides SNPs into different subsets and evaluates them by multi-type detectors in parallel. In particular, ELSSI introduces a four-stage pipeline (generate, score, switch and filter) to iteratively generate new SNP combination subsets from SNP subsets, score each combination subset by individual detectors, switch high-score combinations to other detectors for re-scoring, and then filter out combinations with low scores. This pipeline enables ELSSI to detect high-order SSIs from large genome-wide datasets. Experimental results on various simulated and real genome-wide datasets show the superior efficacy of ELSSI over state-of-the-art methods in detecting SSIs, especially high-order ones. ELSSI can run on moderately equipped PCs on the Internet and can flexibly incorporate new detectors. The code of ELSSI is available at https://www.sdu-idea.cn/codes.php?name=ELSSI.
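The four stages named in the abstract (generate, score, switch, filter) can be illustrated with a minimal single-pass sketch. This is not the ELSSI implementation: the two toy detectors, the function `toy_pipeline`, and all parameters are hypothetical, the subsets are processed sequentially rather than in parallel, and the iterative growth to higher orders is omitted. Both phenotype classes are assumed to be present in the data.

```python
import itertools
import random


def mean_sum_gap(combo, genotypes, labels):
    """Toy detector 1: gap between case and control mean genotype sums."""
    sums = {0: [], 1: []}
    for row, y in zip(genotypes, labels):
        sums[y].append(sum(row[i] for i in combo))
    return abs(sum(sums[1]) / len(sums[1]) - sum(sums[0]) / len(sums[0]))


def max_cell_gap(combo, genotypes, labels):
    """Toy detector 2: largest case/control frequency gap over joint genotype cells."""
    counts, totals = {0: {}, 1: {}}, {0: 0, 1: 0}
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in combo)
        counts[y][cell] = counts[y].get(cell, 0) + 1
        totals[y] += 1
    cells = set(counts[0]) | set(counts[1])
    return max(abs(counts[1].get(c, 0) / totals[1] - counts[0].get(c, 0) / totals[0])
               for c in cells)


DETECTORS = [mean_sum_gap, max_cell_gap]


def toy_pipeline(genotypes, labels, n_subsets=2, order=2, n_switch=4, n_keep=2, seed=0):
    rng = random.Random(seed)
    snps = list(range(len(genotypes[0])))
    rng.shuffle(snps)  # random division of SNPs into subsets
    subsets = [sorted(snps[i::n_subsets]) for i in range(n_subsets)]
    survivors = []
    for k, subset in enumerate(subsets):
        first = DETECTORS[k % len(DETECTORS)]
        other = DETECTORS[(k + 1) % len(DETECTORS)]
        # Generate: all SNP combinations of the requested order within the subset.
        combos = list(itertools.combinations(subset, order))
        # Score: rank combinations with the detector assigned to this subset.
        ranked = sorted(combos, key=lambda c: (-first(c, genotypes, labels), c))
        # Switch: hand the high-scoring combinations to a different detector.
        rescored = [(other(c, genotypes, labels), c) for c in ranked[:n_switch]]
        # Filter: drop combinations that score low under the second detector.
        rescored.sort(key=lambda t: (-t[0], t[1]))
        survivors.extend(rescored[:n_keep])
    survivors.sort(key=lambda t: (-t[0], t[1]))
    return survivors


# Toy data: 4 binary SNPs, every genotype once; cases carry both SNP 0 and SNP 1.
genotypes = [list(g) for g in itertools.product([0, 1], repeat=4)]
labels = [1 if g[0] == 1 and g[1] == 1 else 0 for g in genotypes]
result = toy_pipeline(genotypes, labels, n_subsets=1)
print(result[0])  # (1.0, (0, 1))
```

With more than one subset, an interacting pair is only found if both SNPs land in the same subset, which is one reason an iterative, repeated scheme is needed in practice.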
Affiliation(s)
- Xin Wang
- School of Software, Shandong University, Jinan 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Xia Cao
- College of Computer and Information Sciences, Southwest University, Chongqing 400715, China
| | - Yuantao Feng
- College of Computer and Information Sciences, Southwest University, Chongqing 400715, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
| | - Guoxian Yu
- School of Software, Shandong University, Jinan 250101, China
| | - Jun Wang
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| |
|
8
|
Wang X, Zhang H, Wang J, Yu G, Cui L, Guo M. EpiHNet: Detecting epistasis by heterogeneous molecule network. Methods 2021; 198:65-75. [PMID: 34555529 DOI: 10.1016/j.ymeth.2021.09.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/31/2021] [Revised: 08/16/2021] [Accepted: 09/16/2021] [Indexed: 12/22/2022] Open
Abstract
Epistasis between single nucleotide polymorphisms (SNPs) plays an important role in elucidating the missing heritability of complex diseases. Diverse approaches have been developed for detecting SNP interactions, but they typically neglect the important and useful connections between SNPs and other bio-molecules (i.e., miRNAs and lncRNAs). To comprehensively model these disease-related molecules, EpiHNet, a solution based on a heterogeneous bio-molecular network, is introduced for detecting high-order SNP interactions. EpiHNet first uses case/control data to construct an SNP statistical network, and then uses meta-path-based similarity on the heterogeneous network composed of SNPs, genes, lncRNAs, miRNAs and diseases to define a second, SNP relational network. The SNP relational network explores and exploits different associations between molecules and diseases to complement the SNP statistical network and to search for significantly associated SNPs. Next, EpiHNet integrates these two networks into a composite network and applies modularity-based clustering with a fast search strategy to divide SNP nodes into different clusters. After that, it detects SNP interactions based on SNP combinations derived from each cluster. Synthetic experiments on diverse two-locus and three-locus disease models show that EpiHNet outperforms competitive baselines, even without the heterogeneous network. On real WTCCC breast cancer data, EpiHNet also demonstrates expressive results in detecting high-order SNP interactions.
Affiliation(s)
- Xin Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Huiling Zhang
- College of Computer and Information Sciences, Southwest University, Chongqing, China.
| | - Jun Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Guoxian Yu
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Lizhen Cui
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
| | - Maozu Guo
- College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
| |
|
9
|
Blumenthal DB, Baumbach J, Hoffmann M, Kacprowski T, List M. A framework for modeling epistatic interaction. Bioinformatics 2021; 37:1708-1716. [PMID: 33252645 DOI: 10.1093/bioinformatics/btaa990] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/07/2020] [Revised: 10/21/2020] [Accepted: 11/16/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models such as the χ2-test or quadratic regression independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, for it allows one to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. RESULTS: We present a protocol for evaluating epistasis models independently of the tools they are used in and generalize existing models designed for dichotomous phenotypes to the categorical and quantitative case. In addition, we propose a new model which scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes. AVAILABILITY AND IMPLEMENTATION: The evaluation protocol and all compared models are implemented in C++ and are supported under Linux and macOS. They are available at https://github.com/baumbachlab/genepiseeker/, along with test datasets and scripts to reproduce the experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
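As a small illustration of what scoring a candidate SNP set over its penetrance table can look like, the sketch below computes a textbook Pearson χ2 statistic for a dichotomous phenotype. It is a generic example under stated assumptions, not the authors' C++ implementation: the function names and the toy data are hypothetical, and the table here simply counts samples per joint genotype cell and phenotype class.

```python
import itertools
from collections import Counter


def penetrance_table(genotypes, phenotypes, snp_set):
    """Count samples per (joint genotype of snp_set, phenotype class)."""
    table = Counter()
    for row, y in zip(genotypes, phenotypes):
        table[(tuple(row[i] for i in snp_set), y)] += 1
    return table


def chi2_statistic(table):
    """Pearson chi-squared statistic of the cells-by-classes contingency table."""
    cells = sorted({c for c, _ in table})
    classes = sorted({y for _, y in table})
    n = sum(table.values())
    row_tot = {c: sum(table[(c, y)] for y in classes) for c in cells}
    col_tot = {y: sum(table[(c, y)] for c in cells) for y in classes}
    stat = 0.0
    for c in cells:
        for y in classes:
            expected = row_tot[c] * col_tot[y] / n
            stat += (table[(c, y)] - expected) ** 2 / expected
    return stat


# Toy data: 4 binary SNPs, every genotype once; cases carry both SNP 0 and SNP 1.
genotypes = [list(g) for g in itertools.product([0, 1], repeat=4)]
phenotypes = [1 if g[0] == 1 and g[1] == 1 else 0 for g in genotypes]

interacting = chi2_statistic(penetrance_table(genotypes, phenotypes, (0, 1)))
unrelated = chi2_statistic(penetrance_table(genotypes, phenotypes, (2, 3)))
print(round(interacting, 6), round(unrelated, 6))  # 16.0 0.0
```

Because the score depends only on the table, the same function can be reused unchanged under any search strategy, which is the separation between model and optimizer that the abstract argues for.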
Affiliation(s)
- David B Blumenthal
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Markus Hoffmann
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Tim Kacprowski
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| |
|
11
|
Blumenthal DB, Viola L, List M, Baumbach J, Tieri P, Kacprowski T. EpiGEN: an epistasis simulation pipeline. Bioinformatics 2020; 36:4957-4959. [DOI: 10.1093/bioinformatics/btaa245] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/14/2019] [Revised: 04/03/2020] [Accepted: 04/08/2020] [Indexed: 02/06/2023] Open
Abstract
Summary
Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited, as they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes.
Availability and implementation
EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David B Blumenthal
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Lorenzo Viola
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Markus List
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Jan Baumbach
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Paolo Tieri
- CNR National Research Council, IAC Institute for Applied Computing, 00185 Rome, Italy
| | - Tim Kacprowski
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| |
|