1
Wei X, Dong S, Su Z, Tang L, Zhao P, Pan C, Wang F, Tang Y, Zhang W, Zhang X. NetMoST: A network-based machine learning approach for subtyping schizophrenia using polygenic SNP allele biomarkers. arXiv 2023; arXiv:2302.00104v2. [PMID: 36776814 PMCID: PMC9915719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
Subtyping neuropsychiatric disorders like schizophrenia is essential for improving the diagnosis and treatment of complex diseases. Subtyping schizophrenia is challenging because it is polygenic and genetically heterogeneous, which often renders the standard symptom-based diagnosis unreliable and unrepeatable. We developed a novel network-based machine-learning approach, netMoST, for subtyping psychiatric disorders. NetMoST identifies polygenic risk SNP-allele modules from genome-wide genotyping data as polygenic haplotype biomarkers (PHBs) for disease subtyping. We applied netMoST to subtype a cohort of schizophrenia subjects into three distinct biotypes with differentiable genetic, neuroimaging and functional characteristics. The PHBs of the first biotype (36.9% of all patients) were related to neurodevelopment and cognition, the PHBs of the second biotype (28.4%) were enriched for neuroimmune functions, and the PHBs of the third biotype (34.7%) were associated with the transport of calcium ions and neurotransmitters. Neuroimaging patterns provided additional support for the new biotypes, with unique regional homogeneity (ReHo) patterns observed in the brains of each biotype compared with healthy controls. Our findings demonstrated netMoST's capability for uncovering novel biotypes of complex diseases such as schizophrenia. The results also showed the power of exploring polygenic allelic patterns that transcend the conventional GWAS approaches.
Affiliation(s)
- Xinru Wei
  - Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, Jiangsu 210029, China
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 210001, China
- Shuai Dong
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 210001, China
- Zhao Su
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 210001, China
- Lili Tang
  - Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, Jiangsu 210029, China
  - Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, China
- Pengfei Zhao
  - Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, Jiangsu 210029, China
  - Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, China
- Chunyu Pan
  - School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Fei Wang
  - Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, Jiangsu 210029, China
  - Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, China
- Yanqing Tang
  - Department of Psychiatry, The First Affiliated Hospital of China Medical University, Shenyang, China
  - Brain Function Research Section, The First Affiliated Hospital of China Medical University, Shenyang, China
  - Department of Gerontology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Weixiong Zhang
  - Department of Health Technology and Informatics, Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
- Xizhe Zhang
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu 210001, China
2
Abstract
Network modeling transforms data into a structure of nodes and edges such that edges represent relationships between pairs of objects, then extracts clusters of densely connected nodes in order to capture high-dimensional relationships hidden in the data. This efficient and flexible strategy holds potential for unveiling complex patterns concealed within massive datasets, but standard implementations overlook several key issues that can undermine research efforts. These issues range from data imputation and discretization to correlation metrics, clustering methods, and validation of results. Here, we enumerate these pitfalls and provide practical strategies for alleviating their negative effects. These guidelines increase prospects for future research endeavors as they reduce type I and type II (false-positive and false-negative) errors and are generally applicable for network modeling applications across diverse domains.
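The general strategy described in this abstract can be illustrated with a minimal sketch. It is not the author's method: the use of Pearson correlation, the fixed cutoff, the reduction of "densely connected clusters" to connected components, and the toy feature names `f1` to `f4` are all assumptions made for illustration. The abstract itself cautions that the choice of correlation metric and clustering method matters.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def build_network(features, threshold):
    """Nodes are features; an edge joins two features when |r| >= threshold."""
    edges = {name: set() for name in features}
    for a, b in combinations(features, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def connected_components(edges):
    """Simplest possible clustering: group nodes reachable from one another."""
    seen, clusters = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(edges[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Hypothetical toy data: f1/f2 move together, f3/f4 move together.
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.1, 5.9, 8.0],
    "f3": [4.0, 1.0, 3.0, 2.0],
    "f4": [8.1, 2.0, 6.2, 3.9],
}
clusters = connected_components(build_network(toy, threshold=0.9))
```

On this toy input the two strongly correlated pairs end up in separate clusters; with real data the threshold and clustering step would need the kind of validation the abstract calls for.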
Affiliation(s)
- Sharlee Climer
  - Department of Computer Science, University of Missouri – St. Louis, St. Louis, MO, USA
3
Jones P, Weighill D, Shah M, Climer S, Schmutz J, Sreedasyam A, Tuskan G, Jacobson D. Network Modeling of Complex Data Sets. Methods Mol Biol 2020; 2096:197-215. [PMID: 32720156 PMCID: PMC7963274 DOI: 10.1007/978-1-0716-0195-2_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Abstract
We demonstrate a selection of network and machine learning techniques useful in the analysis of complex datasets, including 2-way similarity networks, Markov clustering, enrichment statistical networks, FCROS differential analysis, and random forests. We demonstrate each of these techniques on the Populus trichocarpa gene expression atlas.
Affiliation(s)
- Piet Jones
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Knoxville Tennessee, Knoxville, TN, USA
- Deborah Weighill
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Knoxville Tennessee, Knoxville, TN, USA
- Manesh Shah
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jeremy Schmutz
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Gerald Tuskan
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Knoxville Tennessee, Knoxville, TN, USA
- Daniel Jacobson
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Knoxville Tennessee, Knoxville, TN, USA
4
Weighill D, Tschaplinski TJ, Tuskan GA, Jacobson D. Data Integration in Poplar: 'Omics Layers and Integration Strategies. Front Genet 2019; 10:874. [PMID: 31608114 PMCID: PMC6773870 DOI: 10.3389/fgene.2019.00874] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/23/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022]
Abstract
Populus trichocarpa is an important biofuel feedstock that has been the target of extensive research and is emerging as a model organism for plants, especially woody perennials. This research has generated several large 'omics datasets. However, only a few studies in Populus have attempted to integrate various data types. This review will summarize various 'omics data layers, focusing on their application in Populus species. Subsequently, network and signal processing techniques for the integration and analysis of these data types will be discussed, with particular reference to examples in Populus.
Affiliation(s)
- Deborah Weighill
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Timothy J Tschaplinski
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Gerald A Tuskan
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Daniel Jacobson
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
5
Weighill D, Macaya-Sanz D, DiFazio SP, Joubert W, Shah M, Schmutz J, Sreedasyam A, Tuskan G, Jacobson D. Wavelet-Based Genomic Signal Processing for Centromere Identification and Hypothesis Generation. Front Genet 2019; 10:487. [PMID: 31214244 PMCID: PMC6554479 DOI: 10.3389/fgene.2019.00487] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/02/2018] [Accepted: 05/06/2019] [Indexed: 12/14/2022]
Abstract
Various ‘omics data types have been generated for Populus trichocarpa, each providing a layer of information which can be represented as a density signal across a chromosome. We make use of genome sequence data, variants data across a population as well as methylation data across 10 different tissues, combined with wavelet-based signal processing to perform a comprehensive analysis of the signature of the centromere in these different data signals, and successfully identify putative centromeric regions in P. trichocarpa from these signals. Furthermore, using SNP (single nucleotide polymorphism) correlations across a natural population of P. trichocarpa, we find evidence for the co-evolution of the centromeric histone CENH3 with the sequence of the newly identified centromeric regions, and identify a new CENH3 candidate in P. trichocarpa.
Affiliation(s)
- Deborah Weighill
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- David Macaya-Sanz
  - Department of Biology, West Virginia University, Morgantown, WV, United States
- Wayne Joubert
  - Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Manesh Shah
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Jeremy Schmutz
  - Department of Energy Joint Genome Institute, Walnut Creek, CA, United States
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
- Gerald Tuskan
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Daniel Jacobson
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
6
Lee KY, Leung KS, Tang NLS, Wong MH. Discovering Genetic Factors for psoriasis through exhaustively searching for significant second order SNP-SNP interactions. Sci Rep 2018; 8:15186. [PMID: 30315195 PMCID: PMC6185942 DOI: 10.1038/s41598-018-33493-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/12/2018] [Accepted: 09/28/2018] [Indexed: 12/24/2022]
Abstract
In this paper, we aim to discover genetic factors of psoriasis by searching exhaustively for statistically significant SNP-SNP interactions in two real psoriasis genome-wide association study datasets (phs000019.v1.p1 and phs000982.v1.p1) downloaded from the database of Genotypes and Phenotypes. To deal with the enormous search space, our search algorithm is accelerated with eight biologically plausible interaction patterns and a pre-computed look-up table. After our search, we have discovered several SNPs having a stronger association to psoriasis when they are in combination with another SNP, and these combinations may be non-linear interactions. Among the top 20 SNP-SNP interactions found in terms of pairwise p-value and improvement metric value, we have discovered 27 novel potential psoriasis-associated SNPs, most of which are reported to be eQTLs of a number of known psoriasis-associated genes. On the other hand, we have inferred a gene network after selecting the top 10000 SNP-SNP interactions in terms of improvement metric value, and we have discovered a novel long-distance interaction between XXbac-BPG154L12.4 and RNU6-283P which is not a long-distance haplotype and may be a new discovery. Finally, our experiments with the synthetic datasets have shown that our pre-computed look-up table technique can significantly speed up the search process.
Affiliation(s)
- Kwan-Yeung Lee
  - Department of Computer Science and Engineering, the Chinese University of Hong Kong, Hong Kong, China
- Kwong-Sak Leung
  - Department of Computer Science and Engineering, the Chinese University of Hong Kong, Hong Kong, China
- Nelson L S Tang
  - Department of Chemical Pathology, the Chinese University of Hong Kong, Hong Kong, China
- Man-Hon Wong
  - Department of Computer Science and Engineering, the Chinese University of Hong Kong, Hong Kong, China
7
Mellerup E, Andreassen OA, Bennike B, Dam H, Djurovic S, Jorgensen MB, Kessing LV, Koefoed P, Melle I, Mors O, Moeller GL. Combinations of genetic variants associated with bipolar disorder. PLoS One 2017; 12:e0189739. [PMID: 29267373 PMCID: PMC5739413 DOI: 10.1371/journal.pone.0189739] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/09/2017] [Accepted: 11/30/2017] [Indexed: 12/02/2022]
Abstract
The main objective of the study was to find genetic variants that in combination are significantly associated with bipolar disorder. In previous studies of bipolar disorder, combinations of three and four single nucleotide polymorphism (SNP) genotypes taken from 803 SNPs were analyzed, and five clusters of combinations were found to be significantly associated with bipolar disorder. In the present study, combinations of ten SNP genotypes taken from the same 803 SNPs were analyzed, and one cluster of combinations was found to be significantly associated with bipolar disorder. Combinations from the new cluster and from the five previous clusters were identified in the genomes of 266 or 44% of the 607 patients in the study, whereas none of the 1355 control participants had any of these combinations in their genome. The SNP genotypes in the smaller combinations were the normal homozygote, heterozygote or variant homozygote. In the combinations containing 10 SNP genotypes, almost all the genotypes were the normal homozygote. Such a finding may indicate that accumulation in the genome of combinations containing few SNP genotypes may be a risk factor for bipolar disorder when those combinations contain relatively many rare SNP genotypes, whereas combinations need to contain many SNP genotypes to be a risk factor when most of the SNP genotypes are the normal homozygote.
Affiliation(s)
- Erling Mellerup
  - Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Ole A. Andreassen
  - Department of Psychiatry, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Oslo, Norway
- Bente Bennike
  - Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Henrik Dam
  - Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Srdjan Djurovic
  - Department of Medical Genetics, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Oslo, Norway
- Martin Balslev Jorgensen
  - Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Lars Vedel Kessing
  - Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Pernille Koefoed
  - Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Ingrid Melle
  - Department of Psychiatry, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Oslo, Norway
- Ole Mors
  - Centre for Psychiatric Research, Aarhus University Hospital, Skovagervej 2, Risskov, Denmark
- Gert Lykke Moeller
  - Genokey ApS, ScionDTU, Technical University Denmark, Agern Allé 3, Hoersholm, Denmark
8
Mellerup E, Møller GL. Combinations of Genetic Variants Occurring Exclusively in Patients. Comput Struct Biotechnol J 2017; 15:286-289. [PMID: 28377798 PMCID: PMC5367802 DOI: 10.1016/j.csbj.2017.03.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/03/2016] [Revised: 02/26/2017] [Accepted: 03/06/2017] [Indexed: 11/30/2022]
Abstract
In studies of polygenic disorders, scanning the genetic variants can be used to identify variant combinations. Combinations that are exclusively found in patients can be separated from those combinations occurring in control persons. Statistical analyses can be performed to determine whether the combinations that occur exclusively among patients are significantly associated with the investigated disorder. This research strategy has been applied in materials from various polygenic disorders, identifying clusters of patient-specific genetic variant combinations that are significantly associated with the investigated disorders. Combinations from these clusters are found in the genomes of up to 55% of investigated patients, and are not present in the genomes of any control persons.
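The separation step described in this abstract (keeping only combinations seen in patients and in no control) can be sketched as follows. This is a hypothetical illustration, not the authors' software: the function name `exclusive_combinations`, the genotype coding 0/1/2, and the SNP labels `rs1` to `rs3` are assumptions, and the statistical test of association is deliberately left out.

```python
from itertools import combinations


def exclusive_combinations(patients, controls, size):
    """Return genotype combinations of the given size that occur in at
    least one patient and in no control.

    Each person is a dict mapping a SNP label to a genotype code.
    """
    def combos(person):
        # Sort so each combination has one canonical ordering.
        return set(combinations(sorted(person.items()), size))

    in_controls = set()
    for person in controls:
        in_controls |= combos(person)

    in_patients = set()
    for person in patients:
        in_patients |= combos(person)

    return in_patients - in_controls


# Hypothetical toy cohort (genotype codes: 0, 1, 2).
patients = [
    {"rs1": 0, "rs2": 2, "rs3": 1},
    {"rs1": 0, "rs2": 2, "rs3": 0},
]
controls = [
    {"rs1": 0, "rs2": 1, "rs3": 1},
    {"rs1": 1, "rs2": 2, "rs3": 0},
]
exclusive = exclusive_combinations(patients, controls, size=2)
```

Exhaustive enumeration grows combinatorially with the number of SNPs and the combination size, so a sketch like this only works for very small inputs; whether the surviving combinations are significantly associated with a disorder still has to be tested separately, as the abstract states.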
Affiliation(s)
- Erling Mellerup
  - Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, Faculty of Health, University of Copenhagen, Denmark
- Gert Lykke Møller
  - Genokey ApS, ScionDTU, Technical University of Denmark, Hoersholm, Denmark
9
Garvin MR, Templin WD, Gharrett AJ, DeCovich N, Kondzela CM, Guyon JR, McPhee MV. Potentially adaptive mitochondrial haplotypes as a tool to identify divergent nuclear loci. Methods Ecol Evol 2016. [DOI: 10.1111/2041-210x.12698] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Affiliation(s)
- Michael R. Garvin
  - Oregon State University Ringgold Standard Institution - Integrative Biology 3029 Cordley Hall, 2701 SW Campus Way Corvallis OR 97331-4501 USA
- William D. Templin
  - Alaska Department of Fish and Game Division of Commercial Fisheries 333 Raspberry Road Anchorage AK 99518 USA
- Anthony J. Gharrett
  - University of Alaska Fairbanks College Fisheries and Ocean Sciences Juneau AK 99821 USA
- Nick DeCovich
  - Alaska Department of Fish and Game Division of Commercial Fisheries 333 Raspberry Road Anchorage AK 99518 USA
- Christine M. Kondzela
  - Auke Bay Laboratories Alaska Fisheries Science Center National Oceanic and Atmospheric Administration National Marine Fisheries Service 17109 Point Lena Loop Road Juneau AK 99801 USA
- Jeffrey R. Guyon
  - Auke Bay Laboratories Alaska Fisheries Science Center National Oceanic and Atmospheric Administration National Marine Fisheries Service 17109 Point Lena Loop Road Juneau AK 99801 USA
- Megan V. McPhee
  - University of Alaska Fairbanks College Fisheries and Ocean Sciences Juneau AK 99821 USA
10
Abstract
The well-documented latitudinal clines of genes affecting human skin color presumably arise from the need for protection from intense ultraviolet radiation (UVR) vs. the need to use UVR for vitamin D synthesis. Sampling 751 subjects from a broad range of latitudes and skin colors, we investigated possible multilocus correlated adaptation of skin color genes with the vitamin D receptor gene (VDR), using a vector correlation metric and network method called BlocBuster. We discovered two multilocus networks involving VDR promoter and skin color genes that display strong latitudinal clines as multilocus networks, even though many of their single gene components do not. Considered one by one, the VDR components of these networks show diverse patterns: no cline, a weak declining latitudinal cline outside of Africa, and a strong in- vs. out-of-Africa frequency pattern. We confirmed these results with independent data from HapMap. Standard linkage disequilibrium analyses did not detect these networks. We applied BlocBuster across the entire genome, showing that our networks are significant outliers for interchromosomal disequilibrium that overlap with environmental variation relevant to the genes’ functions. These results suggest that these multilocus correlations most likely arose from a combination of parallel selective responses to a common environmental variable and coadaptation, given the known Mendelian epistasis among VDR and the skin color genes.
11
Climer S, Templeton AR, Zhang W. Human gephyrin is encompassed within giant functional noncoding yin-yang sequences. Nat Commun 2015; 6:6534. [PMID: 25813846 PMCID: PMC4380243 DOI: 10.1038/ncomms7534] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/09/2014] [Accepted: 02/06/2015] [Indexed: 12/31/2022]
Abstract
Gephyrin is a highly-conserved gene that is vital for the organization of proteins at inhibitory receptors, molybdenum cofactor biosynthesis, and other diverse functions. Its specific function is intricately regulated and its aberrant activities have been observed for a number of human diseases. Here we report a remarkable yin-yang haplotype pattern encompassing gephyrin. Yin-yang haplotypes arise when a stretch of DNA evolves to present two disparate forms that bear differing states for nucleotide variations along their lengths. The gephyrin yin-yang pair consists of 284 divergent nucleotide states and both variants vary drastically from their mutual ancestral haplotype, suggesting rapid evolution. Several independent lines of evidence indicate strong positive selection on the region and suggest these high-frequency haplotypes represent two distinct functional mechanisms. This discovery holds potential to deepen our understanding of variable human-specific regulation of gephyrin while providing clues for rapid evolutionary events and allelic migrations buried within human history.
Affiliation(s)
- Sharlee Climer
  - Department of Computer Science and Engineering, Washington University, St Louis, Missouri 63130, USA
- Alan R Templeton
  - Department of Biology, Washington University, St Louis, Missouri 63130, USA
  - Department of Genetics, Washington University, St Louis, Missouri 63110, USA
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa 31905, Israel
- Weixiong Zhang
  - Department of Computer Science and Engineering, Washington University, St Louis, Missouri 63130, USA
  - Department of Genetics, Washington University, St Louis, Missouri 63110, USA
  - Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China